Numeric literal tokens
The following nonterminals are common to the definitions below:
Grammar
DECIMAL_DIGITS = { ('0'..'9' | "_") * }
HEXADECIMAL_DIGITS = { ('0'..'9' | 'a' .. 'f' | 'A' .. 'F' | "_") * }
LOW_BASE_TOKEN_DIGITS = { DECIMAL_DIGITS }
DECIMAL_PART = { '0'..'9' ~ DECIMAL_DIGITS }
SUFFIX = { IDENT }
IDENT = { IDENT_START ~ XID_CONTINUE * }
IDENT_START = { XID_START | "_" }
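As a rough illustration of how the shared digit nonterminals behave, the following sketch matches DECIMAL_DIGITS and DECIMAL_PART against the start of a string. The function names decimal_digits_len and decimal_part_len are hypothetical and are not part of the grammar; the sketch assumes greedy matching.

```rust
/// Byte length of the longest prefix of `s` matching DECIMAL_DIGITS:
/// any run of ASCII digits and underscores, possibly empty.
/// (Hypothetical helper name, chosen for this sketch.)
fn decimal_digits_len(s: &str) -> usize {
    s.bytes()
        .take_while(|b| b.is_ascii_digit() || *b == b'_')
        .count()
}

/// Byte length of the longest prefix of `s` matching DECIMAL_PART
/// (one ASCII digit, then DECIMAL_DIGITS), or `None` if `s` does not
/// start with a digit.
fn decimal_part_len(s: &str) -> Option<usize> {
    match s.bytes().next() {
        // The first byte is an ASCII digit, so slicing at 1 is safe.
        Some(b) if b.is_ascii_digit() => Some(1 + decimal_digits_len(&s[1..])),
        _ => None,
    }
}
```

Under these definitions, 1__ is a complete DECIMAL_PART, while _1 is not, because DECIMAL_PART must begin with a digit even though DECIMAL_DIGITS may begin with an underscore.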
Float literal
Grammar
Float_literal = {
FLOAT_BODY_WITH_EXPONENT ~ SUFFIX ? |
FLOAT_BODY_WITHOUT_EXPONENT ~ !("e"|"E") ~ SUFFIX ? |
FLOAT_BODY_WITH_FINAL_DOT ~ !"." ~ !IDENT_START
}
FLOAT_BODY_WITH_EXPONENT = {
DECIMAL_PART ~ ("." ~ DECIMAL_PART ) ? ~
("e"|"E") ~ ("+"|"-") ? ~ EXPONENT_DIGITS
}
EXPONENT_DIGITS = { "_" * ~ '0'..'9' ~ DECIMAL_DIGITS }
FLOAT_BODY_WITHOUT_EXPONENT = {
DECIMAL_PART ~ "." ~ DECIMAL_PART
}
FLOAT_BODY_WITH_FINAL_DOT = {
DECIMAL_PART ~ "."
}
Note: The !"." subexpression makes sure that forms like 1..2 aren't treated as starting with a float. The !IDENT_START subexpression makes sure that forms like 1.some_method() aren't treated as starting with a float.
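The effect of the two negative lookaheads can be sketched as follows. This covers only the FLOAT_BODY_WITH_FINAL_DOT alternative, in isolation from the other two alternatives of Float_literal. The helper is_ident_start_ascii is an assumption made for this sketch: it approximates IDENT_START with ASCII letters and _, whereas the grammar uses Unicode XID_START.

```rust
/// Assumption for this sketch: ASCII-only stand-in for IDENT_START.
/// The grammar itself uses XID_START | "_".
fn is_ident_start_ascii(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

/// Byte length of a FLOAT_BODY_WITH_FINAL_DOT match at the start of `s`,
/// with both lookaheads applied, or `None` if this alternative fails.
/// (Hypothetical function name.)
fn float_with_final_dot_len(s: &str) -> Option<usize> {
    let bytes = s.as_bytes();
    // DECIMAL_PART: one digit, then any run of digits or underscores.
    if !bytes.first()?.is_ascii_digit() {
        return None;
    }
    let mut end = 1;
    while end < bytes.len() && (bytes[end].is_ascii_digit() || bytes[end] == b'_') {
        end += 1;
    }
    // The final dot itself.
    if bytes.get(end) != Some(&b'.') {
        return None;
    }
    end += 1;
    // The lookaheads inspect the next character without consuming it.
    match s[end..].chars().next() {
        Some('.') => None,                          // !"."  (e.g. 1..2)
        Some(c) if is_ident_start_ascii(c) => None, // !IDENT_START (e.g. 1.foo)
        _ => Some(end),
    }
}
```

With this sketch, 1. followed by end of input or whitespace matches, while 1..2 and 1.some_method() do not, so in those cases the leading 1 is left to be matched by some other rule.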
Attributes
The token's body is FLOAT_BODY_WITH_EXPONENT, FLOAT_BODY_WITHOUT_EXPONENT, or FLOAT_BODY_WITH_FINAL_DOT, whichever one participated in the match.
The token's suffix is SUFFIX, or empty if SUFFIX did not participate in the match.
Rejection
No matches are rejected.
Reserved float
Grammar
Reserved_float = {
RESERVED_FLOAT_EMPTY_EXPONENT | RESERVED_FLOAT_BASED
}
RESERVED_FLOAT_EMPTY_EXPONENT = {
DECIMAL_PART ~ ("." ~ DECIMAL_PART ) ? ~
("e"|"E") ~ ("+"|"-") ?
}
RESERVED_FLOAT_BASED = {
(
("0b" | "0o") ~ LOW_BASE_TOKEN_DIGITS |
"0x" ~ HEXADECIMAL_DIGITS
) ~ (
("e"|"E") |
"." ~ !"." ~ !IDENT_START
)
}
Rejection
All matches are rejected.
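A minimal sketch of the RESERVED_FLOAT_BASED alternative is shown below; RESERVED_FLOAT_EMPTY_EXPONENT is left out. The function name reserved_float_based_len and the ASCII-only is_ident_start_ascii helper are assumptions made for this example, and the sketch assumes greedy, non-backtracking repetition for the digit runs.

```rust
/// Assumption for this sketch: ASCII-only stand-in for IDENT_START.
fn is_ident_start_ascii(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

/// Byte length of a RESERVED_FLOAT_BASED match at the start of `s`, if any.
/// Any such match is then rejected. (Hypothetical function name.)
fn reserved_float_based_len(s: &str) -> Option<usize> {
    let hex = match s.get(..2)? {
        "0b" | "0o" => false, // LOW_BASE_TOKEN_DIGITS follow
        "0x" => true,         // HEXADECIMAL_DIGITS follow
        _ => return None,
    };
    // Greedily consume the digit run (which may be empty).
    let mut end = 2;
    for c in s[2..].chars() {
        let is_digit = c == '_'
            || if hex { c.is_ascii_hexdigit() } else { c.is_ascii_digit() };
        if !is_digit {
            break;
        }
        end += 1; // every accepted character is ASCII, so one byte each
    }
    let mut tail = s[end..].chars();
    match tail.next() {
        // In this sketch 'e' and 'E' are themselves hexadecimal digits, so
        // this branch is only reachable after a "0b" or "0o" prefix.
        Some('e') | Some('E') => Some(end + 1),
        // "." ~ !"." ~ !IDENT_START: only the dot is consumed.
        Some('.') => match tail.next() {
            Some('.') => None,
            Some(c) if is_ident_start_ascii(c) => None,
            _ => Some(end + 1),
        },
        _ => None,
    }
}
```

For example, 0b1e and 0x1. both produce a match of length 4 here, while 0x1.foo and 0x1..2 produce no match because of the lookaheads.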
Integer literal
Grammar
Integer_literal = {
( INTEGER_BINARY_LITERAL |
INTEGER_OCTAL_LITERAL |
INTEGER_HEXADECIMAL_LITERAL |
INTEGER_DECIMAL_LITERAL ) ~
SUFFIX_NO_E ?
}
INTEGER_BINARY_LITERAL = { "0b" ~ LOW_BASE_TOKEN_DIGITS }
INTEGER_OCTAL_LITERAL = { "0o" ~ LOW_BASE_TOKEN_DIGITS }
INTEGER_HEXADECIMAL_LITERAL = { "0x" ~ HEXADECIMAL_DIGITS }
INTEGER_DECIMAL_LITERAL = { DECIMAL_PART }
SUFFIX_NO_E = { !("e"|"E") ~ SUFFIX }
Note: See RFC 0879 for the reason we accept all decimal digits in binary and octal tokens; digits that are inappropriate for the base cause the token to be rejected.
Note: The INTEGER_DECIMAL_LITERAL nonterminal is listed last in the Integer_literal definition in order to resolve ambiguous cases like the following:
- 0b1e2 (which isn't 0 with suffix b1e2)
- 0b0123 (which is rejected, not accepted as 0 with suffix b0123)
- 0xy (which is rejected, not accepted as 0 with suffix xy)
- 0x· (which is rejected, not accepted as 0 with suffix x·)
Attributes
The token's base is looked up in the following table, depending on which nonterminal participated in the match:
| Nonterminal | Base |
| --- | --- |
| INTEGER_BINARY_LITERAL | binary |
| INTEGER_OCTAL_LITERAL | octal |
| INTEGER_HEXADECIMAL_LITERAL | hexadecimal |
| INTEGER_DECIMAL_LITERAL | decimal |
The token's digits are LOW_BASE_TOKEN_DIGITS, HEXADECIMAL_DIGITS, or DECIMAL_PART, whichever one participated in the match.
The token's suffix is SUFFIX, or empty if SUFFIX did not participate in the match.
Rejection
The match is rejected if:
- the token's digits would consist entirely of _ characters; or
- the token's base would be binary and its digits would contain any character other than 0, 1, or _; or
- the token's base would be octal and its digits would contain any character other than 0, 1, 2, 3, 4, 5, 6, 7, or _.
Note: In particular, a match which would make an Integer_literal with empty digits is rejected.
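The base lookup and the rejection rules above can be sketched as follows. This is an illustration under stated assumptions, not a full lexer: it handles only the base prefix and the digits, ignores the suffix, and assumes the alternatives are tried in the order listed in Integer_literal. The names Base, digits_len, match_integer_body, and is_rejected are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Base {
    Binary,
    Octal,
    Hexadecimal,
    Decimal,
}

/// Byte length of the longest run of digit characters or `_` at the start
/// of `s`. Binary and octal tokens accept all decimal digits at this stage.
fn digits_len(s: &str, hex: bool) -> usize {
    s.bytes()
        .take_while(|b| {
            *b == b'_' || if hex { b.is_ascii_hexdigit() } else { b.is_ascii_digit() }
        })
        .count()
}

/// Returns the base and the digits of an integer literal at the start of
/// `s`, or `None` if no alternative matches.
fn match_integer_body(s: &str) -> Option<(Base, &str)> {
    // Prefixed alternatives come first, so a base prefix wins over
    // reading a lone `0` as a decimal literal followed by a suffix.
    for (prefix, base) in [
        ("0b", Base::Binary),
        ("0o", Base::Octal),
        ("0x", Base::Hexadecimal),
    ] {
        if let Some(rest) = s.strip_prefix(prefix) {
            let hex = base == Base::Hexadecimal;
            return Some((base, &rest[..digits_len(rest, hex)]));
        }
    }
    // INTEGER_DECIMAL_LITERAL = DECIMAL_PART: must start with an ASCII digit.
    if s.bytes().next()?.is_ascii_digit() {
        Some((Base::Decimal, &s[..digits_len(s, false)]))
    } else {
        None
    }
}

/// Applies the three rejection rules to a matched base and digits.
fn is_rejected(base: Base, digits: &str) -> bool {
    // Vacuously true for empty digits, which are therefore rejected too.
    let only_underscores = digits.chars().all(|c| c == '_');
    // True if some character is neither `_` nor in the range '0'..=max.
    let bad_digit =
        |max: char| digits.chars().any(|c| c != '_' && !('0'..=max).contains(&c));
    only_underscores
        || (base == Base::Binary && bad_digit('1'))
        || (base == Base::Octal && bad_digit('7'))
}
```

In this sketch, 0b0123 is matched as a binary literal with digits 0123 and then rejected, and 0xy is matched as a hexadecimal literal with empty digits and then rejected; neither falls back to a decimal 0 with a suffix.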