Table of contents
Whitespace
Line comment
Block comment
Unterminated block comment
Reserved hash forms (Rust 2024)
Punctuation
Single-quoted literal
Raw lifetime or label (Rust 2021 and 2024)
Reserved lifetime or label prefix (Rust 2021 and 2024)
Non-raw lifetime or label
Double-quoted non-raw literal (Rust 2015 and 2018)
Double-quoted non-raw literal (Rust 2021 and 2024)
Double-quoted hashless raw literal (Rust 2015 and 2018)
Double-quoted hashless raw literal (Rust 2021 and 2024)
Double-quoted hashed raw literal (Rust 2015 and 2018)
Double-quoted hashed raw literal (Rust 2021 and 2024)
Float literal with exponent
Float literal without exponent
Float literal with final dot
Integer binary literal
Integer octal literal
Integer hexadecimal literal
Integer decimal literal
Raw identifier
Unterminated literal (Rust 2015 and 2018)
Reserved prefix or unterminated literal (Rust 2021 and 2024)
Non-raw identifier
The list of pretokenisation rules
The list of pretokenisation rules is given below.
Rules whose names indicate one or more editions are included in the list only when one of those editions is in effect.
Unless otherwise stated, a rule has no constraint and has an empty set of forbidden followers.
When an attribute value is given below as "captured characters", the value of that attribute is the sequence of characters captured by the capture group in the pattern whose name is the same as the attribute's name.
Whitespace
Pattern
[ \p{Pattern_White_Space} ] +
Pretoken kind
Whitespace
Attributes
(none)
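As an illustration of the Whitespace rule, the following is a minimal sketch of a matcher for it. The helper names `is_pattern_white_space` and `match_whitespace` are hypothetical and not part of this specification. The sketch lists the characters with the Unicode `Pattern_White_Space` property explicitly, because the Rust standard library's `char::is_whitespace` tests the different `White_Space` property.

```rust
/// Returns true if `c` has the Unicode `Pattern_White_Space` property.
/// The set is small, so it is listed explicitly here.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' // tab, line feed, vertical tab, form feed, carriage return
            | '\u{0020}' // space
            | '\u{0085}' // next line
            | '\u{200E}' // left-to-right mark
            | '\u{200F}' // right-to-left mark
            | '\u{2028}' // line separator
            | '\u{2029}' // paragraph separator
    )
}

/// Hypothetical helper: length in bytes of the whitespace run at the start
/// of `input`, or `None` if the pattern does not match there.
fn match_whitespace(input: &str) -> Option<usize> {
    let len: usize = input
        .chars()
        .take_while(|&c| is_pattern_white_space(c))
        .map(char::len_utf8)
        .sum();
    if len > 0 {
        Some(len)
    } else {
        None
    }
}

fn main() {
    assert_eq!(match_whitespace(" \t\nfn"), Some(3));
    assert_eq!(match_whitespace("fn"), None);
    // U+00A0 (no-break space) is White_Space but not Pattern_White_Space.
    assert_eq!(match_whitespace("\u{00A0}x"), None);
}
```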
Line comment
Pattern
/ /
(?<comment_content>
[^ \n] *
)
Pretoken kind
LineComment
Attributes
comment content | captured characters |
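The Line comment rule can be sketched as follows, reading `[^ \n] *` as "any run of characters other than a newline". The helper name `match_line_comment` is hypothetical; it returns what the rule would capture as the comment content.

```rust
/// Hypothetical helper: if `input` starts with a line comment, returns the
/// comment content (everything after `//` up to, but not including, the
/// next '\n').
fn match_line_comment(input: &str) -> Option<&str> {
    let rest = input.strip_prefix("//")?;
    let end = rest.find('\n').unwrap_or(rest.len());
    Some(&rest[..end])
}

fn main() {
    assert_eq!(match_line_comment("// hi\nlet x;"), Some(" hi"));
    assert_eq!(match_line_comment("//"), Some(""));
    assert_eq!(match_line_comment("/ not a comment"), None);
}
```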
Block comment
Pattern
/ \*
(?<comment_content>
. *
)
\* /
Constraint
The constraint is satisfied if (and only if) the following block of Rust code evaluates to true, when character_sequence represents an iterator over the sequence of characters being tested against the constraint.
{
    let mut depth = 0_isize;
    let mut after_slash = false;
    let mut after_star = false;
    for c in character_sequence {
        match c {
            '*' if after_slash => {
                depth += 1;
                after_slash = false;
            }
            '/' if after_star => {
                depth -= 1;
                after_star = false;
            }
            _ => {
                after_slash = c == '/';
                after_star = c == '*';
            }
        }
    }
    depth == 0
}
Pretoken kind
BlockComment
Attributes
comment content | captured characters |
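To show how the Block comment constraint behaves on nested comments, the sketch below wraps the constraint code in a function and applies it to a few candidate character sequences. The function name `block_comment_constraint` is an assumption made for this example; the body is the constraint code from the rule.

```rust
/// The block comment constraint, wrapped in a function for experimentation.
fn block_comment_constraint(character_sequence: impl Iterator<Item = char>) -> bool {
    let mut depth = 0_isize;
    let mut after_slash = false;
    let mut after_star = false;
    for c in character_sequence {
        match c {
            // "/*" opens a (possibly nested) comment.
            '*' if after_slash => {
                depth += 1;
                after_slash = false;
            }
            // "*/" closes the innermost open comment.
            '/' if after_star => {
                depth -= 1;
                after_star = false;
            }
            _ => {
                after_slash = c == '/';
                after_star = c == '*';
            }
        }
    }
    depth == 0
}

fn main() {
    assert!(block_comment_constraint("/* a */".chars()));
    assert!(block_comment_constraint("/* /* nested */ */".chars()));
    // The inner comment is closed but the outer one is not.
    assert!(!block_comment_constraint("/* /* */".chars()));
    // The '*' cannot serve as both the opener and the closer.
    assert!(!block_comment_constraint("/*/".chars()));
}
```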
Unterminated block comment
Pattern
/ \*
Pretoken kind
Reserved
Attributes
(none)
Reserved hash forms (Rust 2024)
Pattern
\#
( \# | " )
Pretoken kind
Reserved
Attributes
(none)
Punctuation
Pattern
[
; , \. \( \) \{ \} \[ \] @ \# ~ \? : \$ = ! < > \- & \| \+ \* / ^ %
]
Pretoken kind
Punctuation
Attributes
mark | the single character matched by the pattern |
Note: When this pattern matches, the matched character sequence is necessarily one character long.
Single-quoted literal
Pattern
(?<prefix>
b ?
)
'
(?<literal_content>
[^ \\ ' ]
|
\\ . [^']*
)
'
(?<suffix>
(?:
[ \p{XID_Start} _ ]
\p{XID_Continue} *
) ?
)
Pretoken kind
SingleQuoteLiteral
Attributes
prefix | captured characters |
literal content | captured characters |
suffix | captured characters |
Raw lifetime or label (Rust 2021 and 2024)
Pattern
' r \#
(?<name>
[ \p{XID_Start} _ ]
\p{XID_Continue} *
)
Forbidden followers:
- The character '
Pretoken kind
RawLifetimeOrLabel
Attributes
name | captured characters |
Reserved lifetime or label prefix (Rust 2021 and 2024)
Pattern
'
[ \p{XID_Start} _ ]
\p{XID_Continue} *
\#
Pretoken kind
Reserved
Attributes
(none)
Non-raw lifetime or label
Pattern
'
(?<name>
[ \p{XID_Start} _ ]
\p{XID_Continue} *
)
Forbidden followers:
- The character '
Pretoken kind
LifetimeOrLabel
Attributes
name | captured characters |
Note: the forbidden follower here makes sure that forms like 'aaa'bbb are not accepted.
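The effect of the forbidden follower on the Non-raw lifetime or label rule can be sketched as below. This is an approximation under stated assumptions: `XID_Start` and `XID_Continue` are replaced by ASCII-only stand-ins (the standard library has no XID tables), the name is matched greedily, and the helper names are hypothetical.

```rust
// ASCII-only stand-ins for [\p{XID_Start} _] and \p{XID_Continue} (assumption).
fn is_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

fn is_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Hypothetical helper: returns the captured name if `input` starts with a
/// non-raw lifetime or label whose match is not followed by `'`.
fn match_lifetime(input: &str) -> Option<&str> {
    let rest = input.strip_prefix('\'')?;
    let mut chars = rest.chars();
    if !chars.next().map_or(false, is_start) {
        return None;
    }
    // All matched characters are ASCII, so the character count is also the byte length.
    let len = 1 + chars.take_while(|&c| is_continue(c)).count();
    let (name, after) = rest.split_at(len);
    // Forbidden follower: the character '
    if after.starts_with('\'') {
        return None;
    }
    Some(name)
}

fn main() {
    assert_eq!(match_lifetime("'static str"), Some("static"));
    assert_eq!(match_lifetime("'a: loop {}"), Some("a"));
    // Rejected because the match would be followed by a single quote.
    assert_eq!(match_lifetime("'aaa'bbb"), None);
}
```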
Double-quoted non-raw literal (Rust 2015 and 2018)
Pattern
(?<prefix>
b ?
)
"
(?<literal_content>
(?:
[^ \\ " ]
|
\\ .
) *
)
"
(?<suffix>
(?:
[ \p{XID_Start} _ ]
\p{XID_Continue} *
) ?
)
Pretoken kind
DoubleQuoteLiteral
Attributes
prefix | captured characters |
literal content | captured characters |
suffix | captured characters |
Double-quoted non-raw literal (Rust 2021 and 2024)
Pattern
(?<prefix>
[bc] ?
)
"
(?<literal_content>
(?:
[^ \\ " ]
|
\\ .
) *
)
"
(?<suffix>
(?:
[ \p{XID_Start} _ ]
\p{XID_Continue} *
) ?
)
Pretoken kind
DoubleQuoteLiteral
Attributes
prefix | captured characters |
literal content | captured characters |
suffix | captured characters |
Note: the difference between the 2015/2018 and 2021/2024 patterns is that the 2021/2024 pattern allows c as a prefix.
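The literal content part of the double-quoted non-raw patterns, `(?: [^ \\ " ] | \\ . ) *`, can be sketched as a scan in which a backslash always consumes the following character. The helper name `match_double_quoted_content` is hypothetical; the sketch assumes the prefix has already been consumed, ignores the suffix, and assumes `.` matches any character.

```rust
/// Hypothetical helper: `input` is expected to start at the opening `"`.
/// Returns the literal content if a closing quote is found.
fn match_double_quoted_content(input: &str) -> Option<&str> {
    let rest = input.strip_prefix('"')?;
    let mut chars = rest.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // An unescaped quote closes the literal.
            '"' => return Some(&rest[..i]),
            // A backslash consumes the following character, whatever it is.
            '\\' => {
                chars.next()?;
            }
            _ => {}
        }
    }
    // The input ended before a closing quote was found.
    None
}

fn main() {
    // Input text: "a\"b" tail
    assert_eq!(match_double_quoted_content("\"a\\\"b\" tail"), Some("a\\\"b"));
    // Input text: "abc
    assert_eq!(match_double_quoted_content("\"abc"), None);
    // Input text: "x\
    assert_eq!(match_double_quoted_content("\"x\\"), None);
}
```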
Double-quoted hashless raw literal (Rust 2015 and 2018)
Pattern
(?<prefix>
r | br
)
"
(?<literal_content>
[^"] *
)
"
(?<suffix>
(?:
[ \p{XID_Start} _ ]
\p{XID_Continue} *
) ?
)
Pretoken kind
RawDoubleQuoteLiteral
Attributes
prefix | captured characters |
literal content | captured characters |
suffix | captured characters |
Double-quoted hashless raw literal (Rust 2021 and 2024)
Pattern
(?<prefix>
r | br | cr
)
"
(?<literal_content>
[^"] *
)
"
(?<suffix>
(?:
[ \p{XID_Start} _ ]
\p{XID_Continue} *
) ?
)
Pretoken kind
RawDoubleQuoteLiteral
Attributes
prefix | captured characters |
literal content | captured characters |
suffix | captured characters |
Note: the difference between the 2015/2018 and 2021/2024 patterns is that the 2021/2024 pattern allows cr as a prefix.
Note: we can't treat the hashless rule as a special case of the hashed one because the "shortest maximal match" rule doesn't work without hashes (consider r"x"" ).
Double-quoted hashed raw literal (Rust 2015 and 2018)
Pattern
(?<prefix>
r | br
)
(?<hashes_1>
\# {1,255}
)
"
(?<literal_content>
. *
)
"
(?<hashes_2>
\# {1,255}
)
(?<suffix>
(?:
[ \p{XID_Start} _ ]
\p{XID_Continue} *
) ?
)
Constraint
The constraint is satisfied if (and only if) the character sequence captured by the hashes_1 capture group is equal to the character sequence captured by the hashes_2 capture group.
Pretoken kind
RawDoubleQuoteLiteral
Attributes
prefix | captured characters |
literal content | captured characters |
suffix | captured characters |
Double-quoted hashed raw literal (Rust 2021 and 2024)
Pattern
(?<prefix>
r | br | cr
)
(?<hashes_1>
\# {1,255}
)
"
(?<literal_content>
. *
)
"
(?<hashes_2>
\# {1,255}
)
(?<suffix>
(?:
[ \p{XID_Start} _ ]
\p{XID_Continue} *
) ?
)
Constraint
The constraint is satisfied if (and only if) the character sequence captured by the hashes_1 capture group is equal to the character sequence captured by the hashes_2 capture group.
Pretoken kind
RawDoubleQuoteLiteral
Attributes
prefix | captured characters |
literal content | captured characters |
suffix | captured characters |
Note: the difference between the 2015/2018 and 2021/2024 patterns is that the 2021/2024 pattern allows cr as a prefix.
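The hashed raw literal pattern and its constraint can be sketched as a check on a whole candidate character sequence. This is only an illustration under stated assumptions: the helper name `hashed_raw_content` is hypothetical, the 2021/2024 prefixes are used, the suffix is taken to be empty, and `.` is assumed to match any character. It does not model how the lexer chooses between candidate matches of different lengths.

```rust
/// Hypothetical helper: returns the literal content if the whole of
/// `candidate` matches the hashed raw literal pattern (with an empty suffix)
/// and satisfies the hashes_1 == hashes_2 constraint.
fn hashed_raw_content(candidate: &str) -> Option<&str> {
    // prefix: r | br | cr
    let rest = ["br", "cr", "r"]
        .iter()
        .find_map(|p| candidate.strip_prefix(*p))?;
    // hashes_1: between 1 and 255 '#' characters.
    let n = rest.chars().take_while(|&c| c == '#').count();
    if !(1..=255).contains(&n) {
        return None;
    }
    // '#' is a single byte, so slicing at `n` is on a character boundary.
    let rest = rest[n..].strip_prefix('"')?;
    // The candidate must end with '"' followed by exactly as many hashes
    // as hashes_1 captured.
    let closing = format!("\"{}", "#".repeat(n));
    rest.strip_suffix(closing.as_str())
}

fn main() {
    assert_eq!(
        hashed_raw_content(r##"r#"a "quoted" b"#"##),
        Some(r#"a "quoted" b"#)
    );
    assert_eq!(hashed_raw_content("cr#\"x\"#"), Some("x"));
    // Two opening hashes but only one closing hash: the constraint fails.
    assert_eq!(hashed_raw_content("r##\"x\"#"), None);
    // No hashes at all: this is the hashless rule's territory.
    assert_eq!(hashed_raw_content("r\"x\""), None);
}
```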
Float literal with exponent
Pattern
(?<body>
(?:
(?<based>
(?: 0b | 0o )
[ 0-9 _ ] *
)
|
[ 0-9 ]
[ 0-9 _ ] *
)
(?:
\.
[ 0-9 ]
[ 0-9 _ ] *
) ?
[eE]
[+-] ?
(?<exponent_digits>
[ 0-9 _ ] *
)
)
(?<suffix>
(?:
[ \p{XID_Start} ]
\p{XID_Continue} *
) ?
)
Pretoken kind
FloatLiteral
Attributes
has base | true if the based capture group participates in the match, false otherwise |
body | captured characters |
exponent digits | captured characters |
suffix | captured characters |
Float literal without exponent
Pattern
(?<body>
(?:
(?<based>
(?: 0b | 0o )
[ 0-9 _ ] *
|
0x
[ 0-9 a-f A-F _ ] *
)
|
[ 0-9 ]
[ 0-9 _ ] *
)
\.
[ 0-9 ]
[ 0-9 _ ] *
)
(?<suffix>
(?:
[ \p{XID_Start} -- eE]
\p{XID_Continue} *
) ?
)
Pretoken kind
FloatLiteral
Attributes
has base | true if the based capture group participates in the match, false otherwise |
body | captured characters |
exponent digits | none |
suffix | captured characters |
Float literal with final dot
Pattern
(?:
(?<based>
(?: 0b | 0o )
[ 0-9 _ ] *
|
0x
[ 0-9 a-f A-F _ ] *
)
|
[ 0-9 ]
[ 0-9 _ ] *
)
\.
Forbidden followers:
- The character _
- The character .
- The characters with the Unicode property XID_Start
Pretoken kind
FloatLiteral
Attributes
has base | true if the based capture group participates in the match, false otherwise |
body | the entire character sequence matched by the pattern |
exponent digits | none |
suffix | empty character sequence |
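The Float literal with final dot rule, including its forbidden followers, can be sketched as below. The sketch considers this rule in isolation (it does not model priority between rules), approximates `XID_Start` with `char::is_alphabetic` because the standard library has no XID tables, and uses the hypothetical helper names `run_len` and `match_final_dot_float`.

```rust
/// Counts the leading bytes of `s` that satisfy `pred`.
fn run_len(s: &str, pred: impl Fn(u8) -> bool) -> usize {
    s.bytes().take_while(|&b| pred(b)).count()
}

/// Hypothetical helper: returns the body (digits plus the final dot) if this
/// rule matches at the start of `input`.
fn match_final_dot_float(input: &str) -> Option<&str> {
    let dec = |b: u8| b.is_ascii_digit() || b == b'_';
    let hex = |b: u8| b.is_ascii_hexdigit() || b == b'_';
    let body_len = if input.starts_with("0b") || input.starts_with("0o") {
        2 + run_len(&input[2..], dec)
    } else if input.starts_with("0x") {
        2 + run_len(&input[2..], hex)
    } else if input.starts_with(|c: char| c.is_ascii_digit()) {
        run_len(input, dec)
    } else {
        return None;
    };
    if !input[body_len..].starts_with('.') {
        return None;
    }
    let end = body_len + 1;
    match input[end..].chars().next() {
        // Forbidden followers: '_', '.', and XID_Start (approximated, see above).
        Some(c) if c == '_' || c == '.' || c.is_alphabetic() => None,
        _ => Some(&input[..end]),
    }
}

fn main() {
    assert_eq!(match_final_dot_float("1. "), Some("1."));
    assert_eq!(match_final_dot_float("0x1f.)"), Some("0x1f."));
    // A method call, a range, and an underscore follower are all rejected.
    assert_eq!(match_final_dot_float("1.foo"), None);
    assert_eq!(match_final_dot_float("1..2"), None);
    assert_eq!(match_final_dot_float("1._"), None);
}
```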
Integer binary literal
Pattern
0b
(?<digits>
[ 0-9 _ ] *
)
(?<suffix>
(?:
[ \p{XID_Start} -- eE]
\p{XID_Continue} *
) ?
)
Pretoken kind
IntegerBinaryLiteral
Attributes
digits | captured characters |
suffix | captured characters |
Integer octal literal
Pattern
0o
(?<digits>
[ 0-9 _ ] *
)
(?<suffix>
(?:
[ \p{XID_Start} -- eE]
\p{XID_Continue} *
) ?
)
Pretoken kind
IntegerOctalLiteral
Attributes
digits | captured characters |
suffix | captured characters |
Integer hexadecimal literal
Pattern
0x
(?<digits>
[ 0-9 a-f A-F _ ] *
)
(?<suffix>
(?:
[ \p{XID_Start} -- aAbBcCdDeEfF]
\p{XID_Continue} *
) ?
)
Pretoken kind
IntegerHexadecimalLiteral
Attributes
digits | captured characters |
suffix | captured characters |
Integer decimal literal
Pattern
(?<digits>
[ 0-9 ]
[ 0-9 _ ] *
)
(?<suffix>
(?:
[ \p{XID_Start} -- eE]
\p{XID_Continue} *
) ?
)
Pretoken kind
IntegerDecimalLiteral
Attributes
digits | captured characters |
suffix | captured characters |
Note: it is important that this rule has lower priority than the other numeric literal rules. See Integer literal base-vs-suffix ambiguity.
Raw identifier
Pattern
r \#
(?<identifier>
[ \p{XID_Start} _ ]
\p{XID_Continue} *
)
Pretoken kind
RawIdentifier
Attributes
identifier | captured characters |
Unterminated literal (Rust 2015 and 2018)
Pattern
( r \# | b r \# | r " | b r " | b ' )
Note: I believe the double-quoted forms here aren't strictly needed: if this rule is chosen when its pattern matched via one of those forms then the input must be rejected eventually anyway.
Pretoken kind
Reserved
Attributes
(none)
Reserved prefix or unterminated literal (Rust 2021 and 2024)
Pattern
[ \p{XID_Start} _ ]
\p{XID_Continue} *
( \# | " | ' )
Pretoken kind
Reserved
Attributes
(none)
Non-raw identifier
Pattern
(?<identifier>
[ \p{XID_Start} _ ]
\p{XID_Continue} *
)
Pretoken kind
Identifier
Attributes
identifier | captured characters |
Note: this is following the specification in Unicode Standard Annex #31 for Unicode version 16.0, with the addition of permitting underscore as the first character.