Table of contents

Reserved
Whitespace
LineComment
BlockComment
Punctuation
Identifier
RawIdentifier
LifetimeOrLabel
RawLifetimeOrLabel
SingleQuoteLiteral
DoubleQuoteLiteral
RawDoubleQuoteLiteral
IntegerDecimalLiteral
IntegerHexadecimalLiteral
IntegerOctalLiteral
IntegerBinaryLiteral
FloatLiteral

The list of reprocessing cases

The list below has an entry for each kind of pretoken, describing what kind of fine-grained token it produces, how the fine-grained token's attributes are determined, and the circumstances under which a pretoken is rejected.

When an attribute value is given below as "copied", it has the same value as the pretoken's attribute with the same name.

Reserved

A Reserved pretoken is always rejected.

Whitespace

Fine-grained token kind produced: Whitespace

A Whitespace pretoken is always accepted.

LineComment

Fine-grained token kind produced: LineComment

Attributes

style and body are determined from the pretoken's comment content as follows:

  • if the comment content begins with //:

    • style is non-doc
    • body is empty
  • otherwise, if the comment content begins with /,

    • style is outer doc
    • body is the characters from the comment content after that /
  • otherwise, if the comment content begins with !,

    • style is inner doc
    • body is the characters from the comment content after that !
  • otherwise

    • style is non-doc
    • body is empty

The pretoken is rejected if (and only if) the resulting body includes a CR character.

Note: the body of a non-doc comment is ignored by the rest of the compilation process.
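The LineComment rules above can be sketched as a small function. This is only an illustration: the names CommentStyle, LineComment and reprocess_line_comment are hypothetical, and the sketch assumes the comment content is the text following the leading // of the comment.

```rust
/// Comment styles named in the rules above.
#[derive(Debug, PartialEq, Eq)]
pub enum CommentStyle {
    NonDoc,
    OuterDoc,
    InnerDoc,
}

/// Hypothetical stand-in for the fine-grained LineComment token.
#[derive(Debug, PartialEq, Eq)]
pub struct LineComment {
    pub style: CommentStyle,
    pub body: String,
}

/// Reprocess a LineComment pretoken from its comment content
/// (assumed to be the text after the leading `//`).
/// Returns `None` when the pretoken is rejected.
pub fn reprocess_line_comment(content: &str) -> Option<LineComment> {
    let (style, body) = if content.starts_with("//") {
        // Four or more slashes in the source: not a doc-comment.
        (CommentStyle::NonDoc, "")
    } else if let Some(rest) = content.strip_prefix('/') {
        (CommentStyle::OuterDoc, rest)
    } else if let Some(rest) = content.strip_prefix('!') {
        (CommentStyle::InnerDoc, rest)
    } else {
        (CommentStyle::NonDoc, "")
    };
    // Only the resulting body is checked for CR, so a CR inside a
    // non-doc comment (whose body is empty) does not cause rejection.
    if body.contains('\r') {
        return None;
    }
    Some(LineComment { style, body: body.to_string() })
}
```

Because the CR check applies to the body rather than to the comment content, a non-doc comment containing CR is still accepted in this sketch.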

BlockComment

Fine-grained token kind produced: BlockComment

Attributes

style and body are determined from the pretoken's comment content as follows:

  • if the comment content begins with **:

    • style is non-doc
    • body is empty
  • otherwise, if the comment content begins with * and contains at least one further character,

    • style is outer doc
    • body is the characters from the comment content after that *
  • otherwise, if the comment content begins with !,

    • style is inner doc
    • body is the characters from the comment content after that !
  • otherwise

    • style is non-doc
    • body is empty

The pretoken is rejected if (and only if) the resulting body includes a CR character.

Note: it follows that /**/ and /***/ are not doc-comments.

Note: the body of a non-doc comment is ignored by the rest of the compilation process.
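A minimal sketch of the BlockComment rules follows. The names BlockStyle and reprocess_block_comment are hypothetical, and the sketch assumes the comment content is the text between the opening /* and the closing */.

```rust
/// Comment styles named in the rules above.
#[derive(Debug, PartialEq, Eq)]
pub enum BlockStyle {
    NonDoc,
    OuterDoc,
    InnerDoc,
}

/// Reprocess a BlockComment pretoken from its comment content
/// (assumed to be the text between `/*` and `*/`).
/// Returns the style and body, or `None` when the pretoken is rejected.
pub fn reprocess_block_comment(content: &str) -> Option<(BlockStyle, String)> {
    // "begins with * and contains at least one further character"
    let outer = content.strip_prefix('*').filter(|rest| !rest.is_empty());
    let (style, body) = if content.starts_with("**") {
        (BlockStyle::NonDoc, "")
    } else if let Some(rest) = outer {
        (BlockStyle::OuterDoc, rest)
    } else if let Some(rest) = content.strip_prefix('!') {
        (BlockStyle::InnerDoc, rest)
    } else {
        (BlockStyle::NonDoc, "")
    };
    // Rejection depends only on the resulting body.
    if body.contains('\r') {
        return None;
    }
    Some((style, body.to_string()))
}
```

With this sketch, the content of /**/ is empty and the content of /***/ is a single *, so both fall through to the final non-doc case, matching the note above.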

Punctuation

Fine-grained token kind produced: Punctuation

A Punctuation pretoken is always accepted.

Attributes

mark: copied

Identifier

Fine-grained token kind produced: Identifier

An Identifier pretoken is always accepted.

Attributes

represented identifier: NFC-normalised form of the pretoken's identifier

RawIdentifier

Fine-grained token kind produced: RawIdentifier

Attributes

represented identifier: NFC-normalised form of the pretoken's identifier

The pretoken is rejected if (and only if) the represented identifier is one of the following sequences of characters:

  • _
  • crate
  • self
  • super
  • Self
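The rejection rule above amounts to a membership test. The sketch below uses a hypothetical function name and assumes its argument is already NFC-normalised, since the Rust standard library does not provide NFC normalisation.

```rust
/// Returns true when a RawIdentifier pretoken must be rejected.
/// `represented` is assumed to be the already NFC-normalised
/// represented identifier (normalisation itself is not shown here).
pub fn raw_identifier_is_rejected(represented: &str) -> bool {
    matches!(represented, "_" | "crate" | "self" | "super" | "Self")
}
```

For example, the identifier part of r#self would be rejected, while that of r#fn would be accepted.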

LifetimeOrLabel

Fine-grained token kind produced: LifetimeOrLabel

A LifetimeOrLabel pretoken is always accepted.

Attributes

name: copied

Note that the name is not NFC-normalised. See NFC normalisation for lifetime/label.

RawLifetimeOrLabel

Fine-grained token kind produced: RawLifetimeOrLabel

The pretoken is rejected if (and only if) the name is one of the following sequences of characters:

  • _
  • crate
  • self
  • super
  • Self

Attributes

name: copied

Note that the name is not NFC-normalised. See NFC normalisation for lifetime/label.

SingleQuoteLiteral

The pretokeniser guarantees the pretoken's prefix attribute is one of the following:

  • empty, in which case it is reprocessed as described under Character literal
  • the single character b, in which case it is reprocessed as described under Byte literal.

In either case, the pretoken is rejected if its suffix consists of the single character _.

Character literal

Fine-grained token kind produced: CharacterLiteral

Attributes

The represented character is derived from the pretoken's literal content as follows:

  • If the literal content is one of the following forms of escape sequence, the represented character is the escape sequence's escaped value:

  • If the literal content begins with a \ character which did not introduce one of the above forms of escape, the pretoken is rejected.

  • Otherwise, if the single character that makes up the literal content is LF, CR, or TAB, the pretoken is rejected.

  • Otherwise the represented character is the single character that makes up the literal content.

suffix: copied

Note: The pretokeniser guarantees the pretoken's literal content is either a single character, or a character sequence beginning with \.

Byte literal

Fine-grained token kind produced: ByteLiteral

Attributes

Define a represented character, derived from the pretoken's literal content as follows:

  • If the literal content is one of the following forms of escape sequence, the represented character is the escape sequence's escaped value:

  • If the literal content begins with a \ character which did not introduce one of the above forms of escape, the pretoken is rejected.

  • Otherwise, if the single character that makes up the literal content is LF, CR, or TAB, the pretoken is rejected.

  • Otherwise, if the single character that makes up the literal content has a Unicode scalar value greater than 127, the pretoken is rejected.

  • Otherwise the represented character is the single character that makes up the literal content.

The represented byte is the represented character's Unicode scalar value.

suffix: copied

Note: The pretokeniser guarantees the pretoken's literal content is either a single character, or a character sequence beginning with \.
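As a rough illustration of the non-escape path for byte literals, the sketch below maps a single character to its represented byte. The function name is hypothetical, and the escape-sequence cases are deliberately not covered: the argument is assumed to be the single character making up the literal content, and not to be \.

```rust
/// Represented byte for a byte literal whose literal content is a
/// single non-escape character. Returns `None` when the pretoken
/// is rejected. Escape sequences are out of scope for this sketch.
pub fn byte_from_plain_char(c: char) -> Option<u8> {
    match c {
        // LF, CR, and TAB are rejected.
        '\n' | '\r' | '\t' => None,
        // Anything outside the 7-bit range is rejected.
        c if (c as u32) > 127 => None,
        // The represented byte is the character's Unicode scalar value.
        c => Some(c as u8),
    }
}
```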

DoubleQuoteLiteral

The pretokeniser guarantees the pretoken's prefix attribute is one of the following:

  • empty, in which case it is reprocessed as described under String literal
  • the single character b, in which case it is reprocessed as described under Byte-string literal
  • the single character c, in which case it is reprocessed as described under C-string literal

In each case, the pretoken is rejected if its suffix consists of the single character _.

String literal

Fine-grained token kind produced: StringLiteral

Attributes

The represented string is derived from the pretoken's literal content by replacing each escape sequence of any of the following forms occurring in the literal content with the escape sequence's escaped value.

These replacements take place in left-to-right order. For example, the pretoken with extent "\\x41" is converted to the characters \ x 4 1.

If a \ character appears in the literal content but is not part of one of the above forms of escape, the pretoken is rejected.

If a CR character appears in the literal content and is not part of a string continuation escape, the pretoken is rejected.

suffix: copied

See Wording for string unescaping
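The left-to-right replacement order can be illustrated with a deliberately reduced sketch. It handles only two escape forms, \\ and \xHH with a value of at most 0x7F; this subset is an assumption chosen to reproduce the "\\x41" example and is not the full list of forms. The function name is hypothetical.

```rust
/// Unescape literal content left to right, handling only two assumed
/// escape forms: `\\` and `\xHH` (value <= 0x7F). Any other use of `\`,
/// or a bare CR, causes rejection (`None`).
pub fn unescape_subset(content: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = content.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next()? {
                // `\\` is consumed as a unit, so the following
                // characters are not seen as the start of an escape.
                '\\' => out.push('\\'),
                'x' => {
                    let hi = chars.next()?.to_digit(16)?;
                    let lo = chars.next()?.to_digit(16)?;
                    let value = hi * 16 + lo;
                    if value > 0x7F {
                        return None;
                    }
                    out.push(char::from_u32(value)?);
                }
                _ => return None,
            },
            '\r' => return None,
            other => out.push(other),
        }
    }
    Some(out)
}
```

Under these assumptions, the literal content \\x41 yields the four characters \ x 4 1, whereas the literal content \x41 yields the single character A.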

Byte-string literal

Fine-grained token kind produced: ByteStringLiteral

If any character whose Unicode scalar value is greater than 127 appears in the literal content, the pretoken is rejected.

Attributes

Define a represented string (a sequence of characters) derived from the pretoken's literal content by replacing each escape sequence of any of the following forms occurring in the literal content with the escape sequence's escaped value.

These replacements take place in left-to-right order. For example, the pretoken with extent b"\\x41" is converted to the characters \ x 4 1.

If a \ character appears in the literal content but is not part of one of the above forms of escape, the pretoken is rejected.

If a CR character appears in the literal content and is not part of a string continuation escape, the pretoken is rejected.

The represented bytes are the sequence of Unicode scalar values of the characters in the represented string.

suffix: copied

See Wording for string unescaping

C-string literal

Fine-grained token kind produced: CStringLiteral

Attributes

The pretoken's literal content is treated as a sequence of items, each of which is either a single Unicode character other than \ or an escape.

The sequence of items is converted to the represented bytes as follows:

If a \ character appears in the literal content but is not part of one of the above forms of escape, the pretoken is rejected.

If a CR character appears in the literal content and is not part of a string continuation escape, the pretoken is rejected.

If any of the resulting represented bytes have value 0, the pretoken is rejected.

suffix: copied

See Wording for string unescaping

RawDoubleQuoteLiteral

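The pretokeniser guarantees the pretoken's prefix attribute is one of the following:

  • the single character r, in which case it is reprocessed as described under Raw string literal
  • the characters br, in which case it is reprocessed as described under Raw byte-string literal
  • the characters cr, in which case it is reprocessed as described under Raw C-string literal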

In each case, the pretoken is rejected if its suffix consists of the single character _.

Raw string literal

Fine-grained token kind produced: RawStringLiteral

The pretoken is rejected if (and only if) a CR character appears in the literal content.

Attributes

represented string: the pretoken's literal content

suffix: copied

Raw byte-string literal

Fine-grained token kind produced: RawByteStringLiteral

If any character whose Unicode scalar value is greater than 127 appears in the literal content, the pretoken is rejected.

If a CR character appears in the literal content, the pretoken is rejected.

Attributes

represented bytes: the sequence of Unicode scalar values of the characters in the pretoken's literal content

suffix: copied
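The raw byte-string rules can be sketched as a single pass over the literal content. The function name is hypothetical; rejection is modelled as None.

```rust
/// Represented bytes of a raw byte-string literal, or `None` when the
/// pretoken is rejected (a CR, or a character above 127, appears in
/// the literal content).
pub fn raw_byte_string_bytes(content: &str) -> Option<Vec<u8>> {
    content
        .chars()
        .map(|c| match c {
            '\r' => None,
            c if (c as u32) > 127 => None,
            // Each byte is the character's Unicode scalar value.
            c => Some(c as u8),
        })
        .collect()
}
```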

Raw C-string literal

Fine-grained token kind produced: RawCStringLiteral

If a CR character appears in the literal content, the pretoken is rejected.

Attributes

represented bytes: the UTF-8 encoding of the pretoken's literal content

suffix: copied

If any of the resulting represented bytes have value 0, the pretoken is rejected.
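A minimal sketch of the raw C-string rules, using a hypothetical function name and modelling rejection as None. Since a Rust &str is already UTF-8 encoded, the represented bytes are simply the bytes of the literal content.

```rust
/// Represented bytes of a raw C-string literal, or `None` when the
/// pretoken is rejected.
pub fn raw_c_string_bytes(content: &str) -> Option<Vec<u8>> {
    // A CR anywhere in the literal content causes rejection.
    if content.contains('\r') {
        return None;
    }
    // A &str is UTF-8, so its bytes are the UTF-8 encoding.
    let bytes = content.as_bytes().to_vec();
    // No represented byte may have value 0.
    if bytes.contains(&0) {
        return None;
    }
    Some(bytes)
}
```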

IntegerDecimalLiteral

Fine-grained token kind produced: IntegerLiteral

The pretoken is rejected if (and only if) its digits attribute consists entirely of _ characters.

Attributes

base: decimal

digits: copied

suffix: copied

Note: in particular, an IntegerDecimalLiteral whose digits is empty is rejected.
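The note about empty digits follows from how "consists entirely of _ characters" behaves on an empty sequence. A small sketch, with a hypothetical function name:

```rust
/// Returns true when an IntegerDecimalLiteral pretoken must be
/// rejected, given its digits attribute.
pub fn decimal_digits_rejected(digits: &str) -> bool {
    // `all` is vacuously true for an empty sequence, so empty digits
    // count as consisting entirely of `_` characters.
    digits.chars().all(|c| c == '_')
}
```

The same check applies unchanged to the digits of an IntegerHexadecimalLiteral.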

IntegerHexadecimalLiteral

Fine-grained token kind produced: IntegerLiteral

The pretoken is rejected if (and only if) its digits attribute consists entirely of _ characters.

Attributes

base: hexadecimal

digits: copied

suffix: copied

Note: in particular, an IntegerHexadecimalLiteral whose digits is empty is rejected.

IntegerOctalLiteral

Fine-grained token kind produced: IntegerLiteral

The pretoken is rejected if (and only if):

  • its digits attribute consists entirely of _ characters; or
  • its digits attribute contains any character other than 0, 1, 2, 3, 4, 5, 6, 7, or _.

Attributes

base: octal

digits: copied

suffix: copied

Note: in particular, an IntegerOctalLiteral whose digits is empty is rejected.

IntegerBinaryLiteral

Fine-grained token kind produced: IntegerLiteral

The pretoken is rejected if (and only if):

  • its digits attribute consists entirely of _ characters; or
  • its digits attribute contains any character other than 0, 1, or _.

Attributes

base: binary

digits: copied

suffix: copied

Note: in particular, an IntegerBinaryLiteral whose digits is empty is rejected.
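The octal and binary cases add a per-base character check to the all-underscores rule. The sketch below covers both; the names RestrictedBase and restricted_digits_rejected are hypothetical.

```rust
/// The two bases whose digits are checked at reprocessing time.
#[derive(Clone, Copy)]
pub enum RestrictedBase {
    Octal,
    Binary,
}

/// Returns true when an IntegerOctalLiteral or IntegerBinaryLiteral
/// pretoken must be rejected, given its digits attribute.
pub fn restricted_digits_rejected(base: RestrictedBase, digits: &str) -> bool {
    let allowed = |c: char| match base {
        RestrictedBase::Octal => matches!(c, '0'..='7' | '_'),
        RestrictedBase::Binary => matches!(c, '0' | '1' | '_'),
    };
    // Rejected if entirely underscores (including empty), or if any
    // character falls outside the allowed set for the base.
    digits.chars().all(|c| c == '_') || !digits.chars().all(allowed)
}
```

Under this sketch the digits 18 are rejected for an octal literal and the digits 102 are rejected for a binary literal, while 1_0 is accepted for both.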

FloatLiteral

Fine-grained token kind produced: FloatLiteral

The pretoken is rejected if (and only if):

  • its has base attribute is true; or
  • its exponent digits attribute is a character sequence which consists entirely of _ characters.

Attributes

body: copied

suffix: copied

Note: in particular, a FloatLiteral whose exponent digits is empty is rejected.
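The FloatLiteral rejection rule can be sketched as follows. The function name is hypothetical, and the sketch assumes that an absent exponent digits attribute can be modelled as None, with Some holding the character sequence when it is present.

```rust
/// Returns true when a FloatLiteral pretoken must be rejected.
/// `exponent_digits` is `None` when the attribute is absent
/// (a modelling assumption of this sketch).
pub fn float_literal_rejected(has_base: bool, exponent_digits: Option<&str>) -> bool {
    has_base
        || exponent_digits.map_or(false, |digits| digits.chars().all(|c| c == '_'))
}
```

An empty but present exponent digits sequence is rejected because it vacuously consists entirely of _ characters, which matches the note above.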