Table of contents

Whitespace
Line comment
Block comment
Unterminated block comment
Reserved hash forms (Rust 2024)
Punctuation
Single-quoted literal
Raw lifetime or label (Rust 2021 and 2024)
Reserved lifetime or label prefix (Rust 2021 and 2024)
Non-raw lifetime or label
Double-quoted non-raw literal (Rust 2015 and 2018)
Double-quoted non-raw literal (Rust 2021 and 2024)
Double-quoted hashless raw literal (Rust 2015 and 2018)
Double-quoted hashless raw literal (Rust 2021 and 2024)
Double-quoted hashed raw literal (Rust 2015 and 2018)
Double-quoted hashed raw literal (Rust 2021 and 2024)
Float literal with signed exponent
Float literal with signless exponent
Float literal without exponent
Float literal with final dot
Integer binary literal
Integer octal literal
Integer hexadecimal literal
Integer decimal literal
Raw identifier
Unterminated literal (Rust 2015 and 2018)
Reserved prefix or unterminated literal (Rust 2021 and 2024)
Non-raw identifier

The list of pretokenisation rules

The list of pretokenisation rules is given below.

Rules whose names indicate one or more editions are included in the list only when one of those editions is in effect.

Unless otherwise stated, a rule has no constraint and has an empty set of forbidden followers.

When an attribute value is given below as "captured characters", the value of that attribute is the sequence of characters captured by the capture group in the pattern whose name is the same as the attribute's name.

Whitespace

Pattern
[ \p{Pattern_White_Space} ] +
Pretoken kind

Whitespace

Attributes

(none)

Line comment

Pattern
/ /
(?<comment_content>
  [^ \n] *
)
Pretoken kind

LineComment

Attributes
comment content: captured characters

Block comment

Pattern
/ \*
(?<comment_content>
  . *
)
\* /
Constraint

The constraint is satisfied if (and only if) the following block of Rust code evaluates to true, when character_sequence represents an iterator over the sequence of characters being tested against the constraint.

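{
    // Current nesting depth of /* ... */ pairs.
    let mut depth = 0_isize;
    // True when the previous character was a '/' that could begin "/*".
    let mut after_slash = false;
    // True when the previous character was a '*' that could begin "*/".
    let mut after_star = false;
    for c in character_sequence {
        match c {
            // "/*" opens a (possibly nested) comment.
            '*' if after_slash => {
                depth += 1;
                after_slash = false;
            }
            // "*/" closes the innermost open comment.
            '/' if after_star => {
                depth -= 1;
                after_star = false;
            }
            _ => {
                after_slash = c == '/';
                after_star = c == '*';
            }
        }
    }
    // Satisfied only when every opener has been closed.
    depth == 0
}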
Pretoken kind

BlockComment

Attributes
comment content: captured characters

See also Defining the block-comment constraint
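
As an illustration, the constraint block can be wrapped in a function and tried against a few candidate matches. This is a minimal sketch: the function name block_comment_constraint is hypothetical and not part of the specification, and the body is the block above unchanged.

```rust
/// Hypothetical wrapper (the name is an assumption, not part of the
/// specification) around the constraint block given above.
fn block_comment_constraint(character_sequence: impl Iterator<Item = char>) -> bool {
    let mut depth = 0_isize;
    let mut after_slash = false;
    let mut after_star = false;
    for c in character_sequence {
        match c {
            // "/*" opens a (possibly nested) comment.
            '*' if after_slash => {
                depth += 1;
                after_slash = false;
            }
            // "*/" closes the innermost open comment.
            '/' if after_star => {
                depth -= 1;
                after_star = false;
            }
            _ => {
                after_slash = c == '/';
                after_star = c == '*';
            }
        }
    }
    depth == 0
}
```

With this wrapper, "/* a /* b */ c */" satisfies the constraint, while "/* a /* b */" does not, because the outer comment is still open. The sequence "/*/" also fails: the '*' is consumed as part of the opener and cannot also begin a closer.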

Unterminated block comment

Pattern
/ \*
Pretoken kind

Reserved

Attributes

(none)

Reserved hash forms (Rust 2024)

Pattern
\#
( \# | " )
Pretoken kind

Reserved

Attributes

(none)

Punctuation

Pattern
[
  ; , \. \( \) \{ \} \[ \] @ \# ~ \? : \$ = ! < > \- & \| \+ \* / ^ %
]
Pretoken kind

Punctuation

Attributes
mark: the single character matched by the pattern

Note: When this pattern matches, the matched character sequence is necessarily one character long.

Single-quoted literal

Pattern
(?<prefix>
  b ?
)
'
(?<literal_content>
  [^ \\ ' ]
|
  \\ . [^']*
)
'
(?<suffix>
  (?:
    [ \p{XID_Start} _ ]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

SingleQuoteLiteral

Attributes
prefix: captured characters
literal content: captured characters
suffix: captured characters

Raw lifetime or label (Rust 2021 and 2024)

Pattern
' r \#
(?<name>
  [ \p{XID_Start} _ ]
  \p{XID_Continue} *
)

Forbidden followers:

  • The character '
Pretoken kind

RawLifetimeOrLabel

Attributes
name: captured characters

Reserved lifetime or label prefix (Rust 2021 and 2024)

Pattern
'
[ \p{XID_Start} _ ]
\p{XID_Continue} *
\#
Pretoken kind

Reserved

Attributes

(none)

Non-raw lifetime or label

Pattern
'
(?<name>
  [ \p{XID_Start} _ ]
  \p{XID_Continue} *
)

Forbidden followers:

  • The character '
Pretoken kind

LifetimeOrLabel

Attributes
name: captured characters

Note: the forbidden follower here makes sure that forms like 'aaa'bbb are not accepted.
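
The effect of the forbidden follower can be sketched with a small hand-written matcher. This is only an approximation under stated assumptions: the function name match_lifetime_or_label is hypothetical, the pattern is assumed to be matched greedily before the follower is checked, and the standard library's is_alphabetic and is_alphanumeric stand in for the Unicode properties XID_Start and XID_Continue, which they do not reproduce exactly.

```rust
/// Hypothetical matcher for the "Non-raw lifetime or label" rule.
/// Returns the captured name, or None if the rule does not apply.
/// Assumption: is_alphabetic / is_alphanumeric approximate
/// XID_Start / XID_Continue; they are not exact equivalents.
fn match_lifetime_or_label(input: &str) -> Option<&str> {
    let rest = input.strip_prefix('\'')?;
    let mut chars = rest.char_indices();
    let (_, first) = chars.next()?;
    if !(first.is_alphabetic() || first == '_') {
        return None;
    }
    // The name ends at the first character that cannot continue it.
    let end = chars
        .find(|&(_, c)| !(c.is_alphanumeric() || c == '_'))
        .map_or(rest.len(), |(i, _)| i);
    // Forbidden follower: reject the match if the next character is '.
    if rest[end..].starts_with('\'') {
        return None;
    }
    Some(&rest[..end])
}
```

Under these assumptions, the input 'a: loop yields the name a, while 'aaa'bbb is rejected because the character following the name is a single quote.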

Double-quoted non-raw literal (Rust 2015 and 2018)

Pattern
(?<prefix>
  b ?
)
"
(?<literal_content>
  (?:
    [^ \\ " ]
  |
    \\ .
  ) *
)
"
(?<suffix>
  (?:
    [ \p{XID_Start} _ ]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

DoubleQuoteLiteral

Attributes
prefix: captured characters
literal content: captured characters
suffix: captured characters

Double-quoted non-raw literal (Rust 2021 and 2024)

Pattern
(?<prefix>
  [bc] ?
)
"
(?<literal_content>
  (?:
    [^ \\ " ]
  |
    \\ .
  ) *
)
"
(?<suffix>
  (?:
    [ \p{XID_Start} _ ]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

DoubleQuoteLiteral

Attributes
prefix: captured characters
literal content: captured characters
suffix: captured characters

Note: the difference between the 2015/2018 and 2021/2024 patterns is that the 2021/2024 pattern allows c as a prefix.
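
The edition difference can be illustrated with a hand-written scanner for the prefix and literal content. This is a sketch, not the specification's matching procedure: the function name scan_double_quoted and the allow_c_prefix flag are hypothetical, and the suffix part of the pattern is ignored.

```rust
/// Hypothetical scanner for a double-quoted non-raw literal.
/// Returns (prefix, literal_content), ignoring any suffix.
/// `allow_c_prefix` models the 2021/2024 variant of the rule.
fn scan_double_quoted(input: &str, allow_c_prefix: bool) -> Option<(&str, &str)> {
    let prefix = match input.chars().next()? {
        'b' => "b",
        'c' if allow_c_prefix => "c",
        _ => "",
    };
    let body = input[prefix.len()..].strip_prefix('"')?;
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // An unescaped quote ends the literal content.
            '"' => return Some((prefix, &body[..i])),
            // A backslash consumes the following character, whatever it is.
            '\\' => {
                chars.next()?;
            }
            _ => {}
        }
    }
    // No closing quote was found.
    None
}
```

With allow_c_prefix set to false, the input c"x" is not matched by this scanner; with it set to true, the result has prefix c and literal content x. An escaped quote, as in "a\"b", does not end the content.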

Double-quoted hashless raw literal (Rust 2015 and 2018)

Pattern
(?<prefix>
  r | br
)
"
(?<literal_content>
  [^"] *
)
"
(?<suffix>
  (?:
    [ \p{XID_Start} _ ]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

RawDoubleQuoteLiteral

Attributes
prefix: captured characters
literal content: captured characters
suffix: captured characters

Double-quoted hashless raw literal (Rust 2021 and 2024)

Pattern
(?<prefix>
  r | br | cr
)
"
(?<literal_content>
  [^"] *
)
"
(?<suffix>
  (?:
    [ \p{XID_Start} _ ]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

RawDoubleQuoteLiteral

Attributes
prefix: captured characters
literal content: captured characters
suffix: captured characters

Note: the difference between the 2015/2018 and 2021/2024 patterns is that the 2021/2024 pattern allows cr as a prefix.

Note: we can't treat the hashless rule as a special case of the hashed one because the "shortest maximal match" rule doesn't work without hashes (consider r"x"").

Double-quoted hashed raw literal (Rust 2015 and 2018)

Pattern
(?<prefix>
  r | br
)
(?<hashes_1>
  \# {1,255}
)
"
(?<literal_content>
  . *
)
"
(?<hashes_2>
  \# {1,255}
)
(?<suffix>
  (?:
    [ \p{XID_Start} _ ]
    \p{XID_Continue} *
  ) ?
)
Constraint

The constraint is satisfied if (and only if) the character sequence captured by the hashes_1 capture group is equal to the character sequence captured by the hashes_2 capture group.

Pretoken kind

RawDoubleQuoteLiteral

Attributes
prefix: captured characters
literal content: captured characters
suffix: captured characters

Double-quoted hashed raw literal (Rust 2021 and 2024)

Pattern
(?<prefix>
  r | br | cr
)
(?<hashes_1>
  \# {1,255}
)
"
(?<literal_content>
  . *
)
"
(?<hashes_2>
  \# {1,255}
)
(?<suffix>
  (?:
    [ \p{XID_Start} _ ]
    \p{XID_Continue} *
  ) ?
)
Constraint

The constraint is satisfied if (and only if) the character sequence captured by the hashes_1 capture group is equal to the character sequence captured by the hashes_2 capture group.

Pretoken kind

RawDoubleQuoteLiteral

Attributes
prefix: captured characters
literal content: captured characters
suffix: captured characters

Note: the difference between the 2015/2018 and 2021/2024 patterns is that the 2021/2024 pattern allows cr as a prefix.
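
One way a hand-written scanner might realise the combination of pattern and constraint is to count the opening hashes and then search for a closing quote followed by the same number of hashes. The sketch below is an assumption about how this could be done, not the matching procedure defined by this specification: the function name scan_hashed_raw is hypothetical, the cr prefix is accepted regardless of edition, and the suffix is ignored.

```rust
/// Hypothetical scanner for a double-quoted hashed raw literal.
/// Returns (prefix, literal_content), ignoring any suffix.
fn scan_hashed_raw(input: &str) -> Option<(&str, &str)> {
    let prefix = ["br", "cr", "r"]
        .iter()
        .copied()
        .find(|p| input.starts_with(*p))?;
    let after_prefix = &input[prefix.len()..];
    // hashes_1: between 1 and 255 '#' characters.
    let hash_count = after_prefix.chars().take_while(|&c| c == '#').count();
    if hash_count == 0 || hash_count > 255 {
        return None;
    }
    let body = after_prefix[hash_count..].strip_prefix('"')?;
    // The closer is a quote followed by as many hashes as the opener,
    // which is what the constraint hashes_1 == hashes_2 requires.
    let closer = format!("\"{}", "#".repeat(hash_count));
    let end = body.find(closer.as_str())?;
    Some((prefix, &body[..end]))
}
```

For the input r##"a"#b"## this returns the literal content a"#b: the inner "# has too few hashes to close the literal.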

Float literal with signed exponent

Pattern
(?<body>
  (?:
    (?<based>
      (?: 0b | 0o )
      [ 0-9 _ ] *
    )
  |
    [ 0-9 ]
    [ 0-9 _ ] *
  )
  (?:
    \.
    [ 0-9 ]
    [ 0-9 _ ] *
  ) ?
  [eE]
  [+-]
  (?<exponent_digits>
    [ 0-9 _ ] *
  )
)
(?<suffix>
  (?:
    [ \p{XID_Start} ]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

FloatLiteral

Attributes
has base: true if the based capture group participates in the match, false otherwise
body: captured characters
exponent digits: captured characters
suffix: captured characters

Float literal with signless exponent

Pattern
(?<body>
  (?:
    (?<based>
      (?: 0b | 0o )
      [ 0-9 _ ] *
    )
  |
    [ 0-9 ]
    [ 0-9 _ ] *
  )
  (?:
    \.
    [ 0-9 ]
    [ 0-9 _ ] *
  ) ?
  [eE]
  (?<exponent_digits>
    _ *
    [ 0-9 ]
    [ 0-9 _ ] *
  )
)
(?<suffix>
  (?:
    [ \p{XID_Start} ]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

FloatLiteral

Attributes
has base: true if the based capture group participates in the match, false otherwise
body: captured characters
exponent digits: captured characters
suffix: captured characters

Float literal without exponent

Pattern
(?<body>
  (?:
    (?<based>
      (?: 0b | 0o )
      [ 0-9 _ ] *
    |
      0x
      [ 0-9 a-f A-F _ ] *
    )
  |
    [ 0-9 ]
    [ 0-9 _ ] *
  )
  \.
  [ 0-9 ]
  [ 0-9 _ ] *
)
(?<suffix>
  (?:
    \p{XID_Start}
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

FloatLiteral

Attributes
has base: true if the based capture group participates in the match, false otherwise
body: captured characters
exponent digits: none
suffix: captured characters

Float literal with final dot

Pattern
(?:
  (?<based>
    (?: 0b | 0o )
    [ 0-9 _ ] *
  |
    0x
    [ 0-9 a-f A-F _ ] *
  )
|
  [ 0-9 ]
  [ 0-9 _ ] *
)
\.

Forbidden followers:

  • The character _
  • The character .
  • The characters with the Unicode property XID_Start
Pretoken kind

FloatLiteral

Attributes
has base: true if the based capture group participates in the match, false otherwise
body: the entire character sequence matched by the pattern
exponent digits: none
suffix: empty character sequence
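
The forbidden followers keep this rule from claiming the dot in inputs such as 1..2 or 1.foo(). The following sketch illustrates that for the decimal case only. It makes several assumptions: the function name match_float_final_dot is hypothetical, the based alternatives of the pattern are omitted, and is_alphabetic stands in for the Unicode property XID_Start without reproducing it exactly.

```rust
/// Hypothetical matcher for "Float literal with final dot",
/// restricted to decimal digits (the based forms are omitted).
/// Assumption: is_alphabetic approximates XID_Start.
fn match_float_final_dot(input: &str) -> Option<&str> {
    // [0-9] [0-9_]*
    if !input.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    let digits_end = input
        .find(|c: char| !(c.is_ascii_digit() || c == '_'))
        .unwrap_or(input.len());
    // The pattern then requires a dot.
    let after_dot = input[digits_end..].strip_prefix('.')?;
    // Forbidden followers: '_', '.', and characters with XID_Start.
    match after_dot.chars().next() {
        Some(c) if c == '_' || c == '.' || c.is_alphabetic() => None,
        _ => Some(&input[..digits_end + 1]),
    }
}
```

Note that this sketch, taken alone, also matches the prefix 1. of the input 1.5; in the full list that input is handled by the "Float literal without exponent" rule, which appears earlier.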

Integer binary literal

Pattern
0b
(?<digits>
  [ 0-9 _ ] *
)
(?<suffix>
  (?:
    \p{XID_Start}
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

IntegerBinaryLiteral

Attributes
digits: captured characters
suffix: captured characters

Integer octal literal

Pattern
0o
(?<digits>
  [ 0-9 _ ] *
)
(?<suffix>
  (?:
    \p{XID_Start}
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

IntegerOctalLiteral

Attributes
digits: captured characters
suffix: captured characters

Integer hexadecimal literal

Pattern
0x
(?<digits>
  [ 0-9 a-f A-F _ ] *
)
(?<suffix>
  (?:
    [ \p{XID_Start} -- aAbBcCdDeEfF]
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

IntegerHexadecimalLiteral

Attributes
digits: captured characters
suffix: captured characters

Integer decimal literal

Pattern
(?<digits>
  [ 0-9 ]
  [ 0-9 _ ] *
)
(?<suffix>
  (?:
    \p{XID_Start}
    \p{XID_Continue} *
  ) ?
)
Pretoken kind

IntegerDecimalLiteral

Attributes
digits: captured characters
suffix: captured characters

Note: it is important that this rule has lower priority than the other numeric literal rules. See Integer literal base-vs-suffix ambiguity.
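
The need for this ordering can be seen by running the decimal pattern alone on an input such as 0b1. The sketch below assumes that rules earlier in the list take priority, as the note implies. The function names are hypothetical, only the binary and decimal rules are modelled, and is_alphabetic and is_alphanumeric stand in for XID_Start and XID_Continue without reproducing them exactly.

```rust
/// Length of the optional suffix at the start of `s`.
/// Assumption: is_alphabetic / is_alphanumeric approximate
/// XID_Start / XID_Continue.
fn suffix_len(s: &str) -> usize {
    match s.chars().next() {
        Some(c) if c.is_alphabetic() => s
            .find(|c: char| !(c.is_alphanumeric() || c == '_'))
            .unwrap_or(s.len()),
        _ => 0,
    }
}

/// "Integer binary literal": 0b, [0-9_]*, optional suffix.
/// Returns (digits, suffix).
fn match_binary(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix("0b")?;
    let digits_end = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '_'))
        .unwrap_or(rest.len());
    let (digits, tail) = rest.split_at(digits_end);
    Some((digits, &tail[..suffix_len(tail)]))
}

/// "Integer decimal literal": [0-9][0-9_]*, optional suffix.
/// Returns (digits, suffix).
fn match_decimal(input: &str) -> Option<(&str, &str)> {
    if !input.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    let digits_end = input
        .find(|c: char| !(c.is_ascii_digit() || c == '_'))
        .unwrap_or(input.len());
    let (digits, tail) = input.split_at(digits_end);
    Some((digits, &tail[..suffix_len(tail)]))
}

/// Tries the binary rule before the decimal rule.
/// Returns (pretoken kind, digits, suffix).
fn match_integer(input: &str) -> Option<(&'static str, &str, &str)> {
    match_binary(input)
        .map(|(d, s)| ("IntegerBinaryLiteral", d, s))
        .or_else(|| match_decimal(input).map(|(d, s)| ("IntegerDecimalLiteral", d, s)))
}
```

Taken alone, match_decimal reads 0b1 as the digits 0 with suffix b1. Trying match_binary first gives the intended reading: digits 1 and an empty suffix.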

Raw identifier

Pattern
r \#
(?<identifier>
  [ \p{XID_Start} _ ]
  \p{XID_Continue} *
)
Pretoken kind

RawIdentifier

Attributes
identifier: captured characters

Unterminated literal (Rust 2015 and 2018)

Pattern
( r \# | b r \# | r " | b r " | b ' )

Note: I believe the double-quoted forms here aren't strictly needed: if this rule is chosen when its pattern matched via one of those forms then the input must be rejected eventually anyway.

Pretoken kind

Reserved

Attributes

(none)

Reserved prefix or unterminated literal (Rust 2021 and 2024)

Pattern
[ \p{XID_Start} _ ]
\p{XID_Continue} *
( \# | " | ' )
Pretoken kind

Reserved

Attributes

(none)

Non-raw identifier

Pattern
(?<identifier>
  [ \p{XID_Start} _ ]
  \p{XID_Continue} *
)
Pretoken kind

Identifier

Attributes
identifier: captured characters

Note: this is following the specification in Unicode Standard Annex #31 for Unicode version 16.0, with the addition of permitting underscore as the first character.