Numeric literal tokens

Float literal
Reserved float
Integer literal

The following nonterminals are common to the definitions below:

Grammar

DECIMAL_DIGITS = { ('0'..'9' | "_") * }
HEXADECIMAL_DIGITS = { ('0'..'9' | 'a'..'f' | 'A'..'F' | "_") * }
LOW_BASE_TOKEN_DIGITS = { DECIMAL_DIGITS }
DECIMAL_PART = { '0'..'9' ~ DECIMAL_DIGITS }

SUFFIX = { IDENT }
IDENT = { IDENT_START ~ XID_CONTINUE * }
IDENT_START = { XID_START | "_" }
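As an illustration only, the digit nonterminals above can be read as greedy prefix matchers. The following is a minimal sketch; the function names are hypothetical and are not part of the writeup.

```rust
/// Length in bytes of the longest prefix matching DECIMAL_DIGITS:
/// zero or more of '0'..'9' or '_'. This never fails; it may return 0.
fn decimal_digits_len(s: &str) -> usize {
    s.bytes()
        .take_while(|b| b.is_ascii_digit() || *b == b'_')
        .count()
}

/// Length in bytes of the longest prefix matching HEXADECIMAL_DIGITS:
/// zero or more of '0'..'9', 'a'..'f', 'A'..'F', or '_'.
fn hexadecimal_digits_len(s: &str) -> usize {
    s.bytes()
        .take_while(|b| b.is_ascii_hexdigit() || *b == b'_')
        .count()
}

/// DECIMAL_PART: one ASCII digit followed by DECIMAL_DIGITS.
/// Returns None when the input does not start with a digit.
fn decimal_part_len(s: &str) -> Option<usize> {
    match s.as_bytes().first() {
        Some(b) if b.is_ascii_digit() => Some(1 + decimal_digits_len(&s[1..])),
        _ => None,
    }
}
```

Note that DECIMAL_DIGITS and HEXADECIMAL_DIGITS can match the empty string, while DECIMAL_PART always requires a leading digit, so a leading underscore never starts a decimal literal.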

Float literal

Grammar

Float_literal = {
    FLOAT_BODY_WITH_EXPONENT ~ SUFFIX ? |
    FLOAT_BODY_WITHOUT_EXPONENT ~ !("e"|"E") ~ SUFFIX ? |
    FLOAT_BODY_WITH_FINAL_DOT ~ !"." ~ !IDENT_START
}

FLOAT_BODY_WITH_EXPONENT = {
    DECIMAL_PART ~ ("." ~ DECIMAL_PART ) ? ~
    ("e"|"E") ~ ("+"|"-") ? ~ EXPONENT_DIGITS
}
EXPONENT_DIGITS = { "_" * ~ '0'..'9' ~ DECIMAL_DIGITS }

FLOAT_BODY_WITHOUT_EXPONENT = {
    DECIMAL_PART ~ "." ~ DECIMAL_PART
}

FLOAT_BODY_WITH_FINAL_DOT = {
    DECIMAL_PART ~ "."
}

Note: The !"." subexpression ensures that forms like 1..2 aren't treated as starting with a float. The !IDENT_START subexpression ensures that forms like 1.some_method() aren't treated as starting with a float.
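The ordered choice in Float_literal can be sketched as a hand-written matcher. This is a rough illustration, not the writeup's implementation: all function names are hypothetical, and because the standard library has no XID_Start or XID_Continue tables, char::is_alphabetic and char::is_alphanumeric are used as stand-ins (an assumption that differs from the real properties for some characters). The sketch looks at Float_literal in isolation and says nothing about how it competes with other rules such as Reserved_float.

```rust
// Assumption: approximations of IDENT_START and XID_CONTINUE.
fn is_ident_start(c: char) -> bool { c == '_' || c.is_alphabetic() }
fn is_ident_continue(c: char) -> bool { c == '_' || c.is_alphanumeric() }

/// DECIMAL_PART: a digit followed by digits or underscores.
fn decimal_part(s: &str) -> Option<usize> {
    let b = s.as_bytes();
    if !b.first()?.is_ascii_digit() { return None; }
    Some(b.iter().take_while(|c| c.is_ascii_digit() || **c == b'_').count())
}

/// SUFFIX?: byte length of an identifier at the start of `s`, or 0 if none.
fn optional_suffix(s: &str) -> usize {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if is_ident_start(c) => {
            let rest: usize = chars
                .take_while(|c| is_ident_continue(*c))
                .map(|c| c.len_utf8())
                .sum();
            c.len_utf8() + rest
        }
        _ => 0,
    }
}

/// ("e"|"E") ~ ("+"|"-")? ~ EXPONENT_DIGITS
fn exponent(s: &str) -> Option<usize> {
    let b = s.as_bytes();
    if !matches!(b.first().copied(), Some(b'e' | b'E')) { return None; }
    let mut i = 1;
    if matches!(b.get(i).copied(), Some(b'+' | b'-')) { i += 1; }
    // EXPONENT_DIGITS: leading underscores, then at least one digit.
    while b.get(i).copied() == Some(b'_') { i += 1; }
    if !b.get(i)?.is_ascii_digit() { return None; }
    while matches!(b.get(i).copied(), Some(c) if c.is_ascii_digit() || c == b'_') { i += 1; }
    Some(i)
}

/// Byte length of the Float_literal match at the start of `s`, if any.
fn float_literal(s: &str) -> Option<usize> {
    let int_len = decimal_part(s)?;
    // Optional "." ~ DECIMAL_PART
    let frac_len = s[int_len..].strip_prefix('.').and_then(decimal_part).map(|n| n + 1);
    let body_end = int_len + frac_len.unwrap_or(0);
    // Alternative 1: FLOAT_BODY_WITH_EXPONENT ~ SUFFIX?
    if let Some(exp) = exponent(&s[body_end..]) {
        let end = body_end + exp;
        return Some(end + optional_suffix(&s[end..]));
    }
    // Alternative 2: FLOAT_BODY_WITHOUT_EXPONENT ~ !("e"|"E") ~ SUFFIX?
    let rest = &s[body_end..];
    if frac_len.is_some() && !rest.starts_with(|c: char| c == 'e' || c == 'E') {
        return Some(body_end + optional_suffix(rest));
    }
    // Alternative 3: FLOAT_BODY_WITH_FINAL_DOT ~ !"." ~ !IDENT_START
    let after_dot = s[int_len..].strip_prefix('.')?;
    match after_dot.chars().next() {
        Some(c) if c == '.' || is_ident_start(c) => None,
        _ => Some(int_len + 1),
    }
}
```

With this sketch, 1..2 and 1.some_method() produce no float match, while 1. on its own matches with a final dot.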

Attributes

The token's body is FLOAT_BODY_WITH_EXPONENT, FLOAT_BODY_WITHOUT_EXPONENT, or FLOAT_BODY_WITH_FINAL_DOT, whichever one participated in the match.

The token's suffix is SUFFIX, or empty if SUFFIX did not participate in the match.

Rejection

No matches are rejected.

Reserved float

Grammar

Reserved_float = {
    RESERVED_FLOAT_EMPTY_EXPONENT | RESERVED_FLOAT_BASED
}
RESERVED_FLOAT_EMPTY_EXPONENT = {
    DECIMAL_PART ~ ("." ~ DECIMAL_PART ) ? ~
    ("e"|"E") ~ ("+"|"-") ?
}
RESERVED_FLOAT_BASED = {
    (
        ("0b" | "0o") ~ LOW_BASE_TOKEN_DIGITS |
        "0x" ~ HEXADECIMAL_DIGITS
    )  ~  (
        ("e"|"E") |
        "." ~ !"." ~ !IDENT_START
    )
}

Rejection

All matches are rejected.
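For illustration, the Reserved_float grammar can be sketched as a matcher that reports how many bytes the reserved form covers. The names below are hypothetical, is_ident_start only approximates IDENT_START (the standard library has no XID_Start table, so char::is_alphabetic is assumed as a stand-in), and the sketch checks this rule in isolation without modelling how it is prioritised against other token kinds.

```rust
// Assumption: approximation of IDENT_START.
fn is_ident_start(c: char) -> bool { c == '_' || c.is_alphabetic() }

/// DECIMAL_PART: a digit followed by digits or underscores.
fn decimal_part(s: &str) -> Option<usize> {
    let b = s.as_bytes();
    if !b.first()?.is_ascii_digit() { return None; }
    Some(b.iter().take_while(|c| c.is_ascii_digit() || **c == b'_').count())
}

/// RESERVED_FLOAT_EMPTY_EXPONENT: a decimal body, "e" or "E", optional sign.
fn reserved_float_empty_exponent(s: &str) -> Option<usize> {
    let mut end = decimal_part(s)?;
    if let Some(n) = s[end..].strip_prefix('.').and_then(decimal_part) {
        end += 1 + n;
    }
    let b = s.as_bytes();
    if !matches!(b.get(end).copied(), Some(b'e' | b'E')) { return None; }
    end += 1;
    if matches!(b.get(end).copied(), Some(b'+' | b'-')) { end += 1; }
    Some(end)
}

/// RESERVED_FLOAT_BASED: a based prefix and digits, followed by an exponent
/// marker or by a dot that starts neither ".." nor a method or field access.
fn reserved_float_based(s: &str) -> Option<usize> {
    let is_hex = s.starts_with("0x");
    if !(is_hex || s.starts_with("0b") || s.starts_with("0o")) { return None; }
    // Greedy digit run; in the hexadecimal case 'e' and 'E' count as digits.
    let digits = s[2..]
        .bytes()
        .take_while(|b| *b == b'_' || b.is_ascii_digit() || (is_hex && b.is_ascii_hexdigit()))
        .count();
    let end = 2 + digits;
    let mut rest = s[end..].chars();
    match rest.next() {
        Some('e' | 'E') => Some(end + 1),
        Some('.') => match rest.next() {
            Some(c) if c == '.' || is_ident_start(c) => None,
            _ => Some(end + 1),
        },
        _ => None,
    }
}

/// Reserved_float: ordered choice between the two forms.
fn reserved_float(s: &str) -> Option<usize> {
    reserved_float_empty_exponent(s).or_else(|| reserved_float_based(s))
}
```

One consequence visible in the sketch: because HEXADECIMAL_DIGITS greedily consumes e and E, a hexadecimal form such as 0x1e does not match the exponent branch, whereas 0b1e2 matches as 0b1e.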

Integer literal

Grammar

Integer_literal = {
    ( INTEGER_BINARY_LITERAL |
      INTEGER_OCTAL_LITERAL |
      INTEGER_HEXADECIMAL_LITERAL |
      INTEGER_DECIMAL_LITERAL ) ~
    SUFFIX_NO_E ?
}

INTEGER_BINARY_LITERAL = { "0b" ~ LOW_BASE_TOKEN_DIGITS }
INTEGER_OCTAL_LITERAL = { "0o" ~ LOW_BASE_TOKEN_DIGITS }
INTEGER_HEXADECIMAL_LITERAL = { "0x" ~ HEXADECIMAL_DIGITS }
INTEGER_DECIMAL_LITERAL = { DECIMAL_PART }

SUFFIX_NO_E = { !("e"|"E") ~ SUFFIX }

Note: See RFC 0879 for the reason all decimal digits are accepted in binary and octal tokens; digits that are invalid for the base cause the token to be rejected.

Note: The INTEGER_DECIMAL_LITERAL nonterminal is listed last in the Integer_literal definition in order to resolve ambiguous cases like the following:

0b1e2 (which isn't 0 with suffix b1e2)

0b0123 (which is rejected, not accepted as 0 with suffix b0123)

0xy (which is rejected, not accepted as 0 with suffix xy)

0x· (which is rejected, not accepted as 0 with suffix x·)

Attributes

The token's base is looked up in the following table, depending on which nonterminal participated in the match:

INTEGER_BINARY_LITERAL	binary
INTEGER_OCTAL_LITERAL	octal
INTEGER_HEXADECIMAL_LITERAL	hexadecimal
INTEGER_DECIMAL_LITERAL	decimal

The token's digits are LOW_BASE_TOKEN_DIGITS, HEXADECIMAL_DIGITS, or DECIMAL_PART, whichever one participated in the match.

The token's suffix is SUFFIX, or empty if SUFFIX did not participate in the match.
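The grammar and attributes above can be sketched together as a matcher that returns the base, digits, and suffix. This is a simplified illustration under stated assumptions: Base, IntegerMatch, and match_integer are hypothetical names, and is_ident_start and is_ident_continue approximate IDENT_START and XID_CONTINUE with char::is_alphabetic and char::is_alphanumeric because the standard library has no XID tables. Rejection is not applied here.

```rust
#[derive(Debug, PartialEq)]
enum Base { Binary, Octal, Hexadecimal, Decimal }

#[derive(Debug, PartialEq)]
struct IntegerMatch<'a> {
    base: Base,
    digits: &'a str,
    suffix: &'a str,
}

// Assumption: approximations of IDENT_START and XID_CONTINUE.
fn is_ident_start(c: char) -> bool { c == '_' || c.is_alphabetic() }
fn is_ident_continue(c: char) -> bool { c == '_' || c.is_alphanumeric() }

fn match_integer(s: &str) -> Option<IntegerMatch<'_>> {
    // Ordered choice: the based alternatives are tried before the decimal
    // one, so "0b", "0o", and "0x" are never read as 0 plus a suffix.
    let (base, skip) = if s.starts_with("0b") {
        (Base::Binary, 2)
    } else if s.starts_with("0o") {
        (Base::Octal, 2)
    } else if s.starts_with("0x") {
        (Base::Hexadecimal, 2)
    } else if s.starts_with(|c: char| c.is_ascii_digit()) {
        (Base::Decimal, 0)
    } else {
        return None;
    };
    let hex = base == Base::Hexadecimal;
    // LOW_BASE_TOKEN_DIGITS, HEXADECIMAL_DIGITS, or DECIMAL_PART.
    let len = s[skip..]
        .bytes()
        .take_while(|b| *b == b'_' || b.is_ascii_digit() || (hex && b.is_ascii_hexdigit()))
        .count();
    let digits_end = skip + len;
    let rest = &s[digits_end..];
    // SUFFIX_NO_E?: an identifier that does not start with 'e' or 'E'.
    let suffix_len = match rest.chars().next() {
        Some(c) if c != 'e' && c != 'E' && is_ident_start(c) => rest
            .chars()
            .take_while(|c| is_ident_continue(*c))
            .map(|c| c.len_utf8())
            .sum::<usize>(),
        _ => 0,
    };
    Some(IntegerMatch {
        base,
        digits: &s[skip..digits_end],
        suffix: &rest[..suffix_len],
    })
}
```

For example, 0b0123 yields base binary with digits 0123 rather than 0 with suffix b0123, and 0xy yields base hexadecimal with empty digits and suffix y.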

Rejection

The match is rejected if:

the token's digits would consist entirely of _ characters; or
the token's base would be binary and its digits would contain any character other than 0, 1, or _; or
the token's base would be octal and its digits would contain any character other than 0, 1, 2, 3, 4, 5, 6, 7, or _.

Note: In particular, a match which would make an Integer_literal with empty digits is rejected.
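The rejection conditions above can be expressed as a small predicate over the base and digits attributes. This is a minimal sketch; the Base enum and the is_rejected function are hypothetical names introduced for illustration.

```rust
#[derive(Clone, Copy, Debug)]
enum Base { Binary, Octal, Hexadecimal, Decimal }

/// Returns true if an Integer_literal match with these attributes
/// would be rejected.
fn is_rejected(base: Base, digits: &str) -> bool {
    // Digits consisting entirely of '_' are rejected. `all` is true for an
    // empty iterator, so this also covers empty digits.
    if digits.chars().all(|c| c == '_') {
        return true;
    }
    match base {
        // Binary: only 0, 1, and '_' are allowed.
        Base::Binary => digits.chars().any(|c| !matches!(c, '0' | '1' | '_')),
        // Octal: only 0 through 7 and '_' are allowed.
        Base::Octal => digits.chars().any(|c| !matches!(c, '0'..='7' | '_')),
        // Hexadecimal and decimal digits have no further restriction.
        Base::Hexadecimal | Base::Decimal => false,
    }
}
```

Under this sketch, binary digits 0123 and hexadecimal empty digits are both rejected, while octal digits 17_ are accepted.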

Writeup of Rust's lexer