Fine-grained tokens

Reprocessing produces fine-grained tokens.

Each fine-grained token has an extent, which is a sequence of characters taken from the input.

Each fine-grained token has a kind, and possibly also some attributes, as described in the tables below.

Kind                    Attributes
Whitespace
LineComment             style, body
BlockComment            style, body
Punctuation             mark
Identifier              represented identifier
RawIdentifier           represented identifier
LifetimeOrLabel         name
RawLifetimeOrLabel      name
CharacterLiteral        represented character, suffix
ByteLiteral             represented byte, suffix
StringLiteral           represented string, suffix
RawStringLiteral        represented string, suffix
ByteStringLiteral       represented bytes, suffix
RawByteStringLiteral    represented bytes, suffix
CStringLiteral          represented bytes, suffix
RawCStringLiteral       represented bytes, suffix
IntegerLiteral          base, digits, suffix
FloatLiteral            body, suffix

These attributes have the following types:

Attribute                 Type
base                      binary / octal / decimal / hexadecimal
body                      sequence of characters
digits                    sequence of characters
mark                      single character
name                      sequence of characters
represented byte          single byte
represented bytes         sequence of bytes
represented character     single character
represented identifier    sequence of characters
represented string        sequence of characters
style                     non-doc / inner doc / outer doc
suffix                    sequence of characters
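
As a rough illustration, the two tables above could be modelled in Rust along the following lines. This is only a sketch: the type names, field names, and the suffix helper are assumptions made for the example, and the choice of String, char, u8 and Vec<u8> for "sequence of characters", "single character", "single byte" and "sequence of bytes" is one plausible mapping rather than anything this description prescribes.

```rust
// Illustrative model only: every type, field and method name here is an
// assumption made for this sketch, not part of any real lexer's API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base { Binary, Octal, Decimal, Hexadecimal }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommentStyle { NonDoc, InnerDoc, OuterDoc }

/// One variant per row of the kind table; fields mirror the attributes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FineGrainedTokenKind {
    Whitespace,
    LineComment { style: CommentStyle, body: String },
    BlockComment { style: CommentStyle, body: String },
    Punctuation { mark: char },
    Identifier { represented_identifier: String },
    RawIdentifier { represented_identifier: String },
    LifetimeOrLabel { name: String },
    RawLifetimeOrLabel { name: String },
    CharacterLiteral { represented_character: char, suffix: String },
    ByteLiteral { represented_byte: u8, suffix: String },
    StringLiteral { represented_string: String, suffix: String },
    RawStringLiteral { represented_string: String, suffix: String },
    ByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    CStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawCStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    IntegerLiteral { base: Base, digits: String, suffix: String },
    FloatLiteral { body: String, suffix: String },
}

/// A token pairs its extent (characters taken from the input) with its kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FineGrainedToken {
    pub extent: String,
    pub kind: FineGrainedTokenKind,
}

impl FineGrainedToken {
    /// Returns the suffix attribute for the literal kinds, None otherwise.
    pub fn suffix(&self) -> Option<&str> {
        use FineGrainedTokenKind::*;
        match &self.kind {
            CharacterLiteral { suffix, .. }
            | ByteLiteral { suffix, .. }
            | StringLiteral { suffix, .. }
            | RawStringLiteral { suffix, .. }
            | ByteStringLiteral { suffix, .. }
            | RawByteStringLiteral { suffix, .. }
            | CStringLiteral { suffix, .. }
            | RawCStringLiteral { suffix, .. }
            | IntegerLiteral { suffix, .. }
            | FloatLiteral { suffix, .. } => Some(suffix.as_str()),
            _ => None,
        }
    }
}
```

Using an enum with per-variant fields keeps each kind's attributes tied to that kind, so a Whitespace token cannot carry a suffix by construction.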

Notes:

At this stage:

  • Both _ and keywords are treated as instances of Identifier.
  • There are explicit tokens representing whitespace and comments.
  • Single-character tokens are used for all punctuation.
  • A lifetime (or label) is represented as a single token (which includes the leading ').
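
To make these notes concrete, the sketch below writes out by hand the fine-grained tokens one would expect for the input `if _ >>= 'a // hi`. It is not the output of a real lexer: the Kind and Token types are a deliberately simplified, hypothetical model (comment attributes are omitted), and storing the lifetime's name without the leading ' is an assumption of the example.

```rust
// Hand-written illustration; Kind, Token and the helper functions are
// hypothetical names for this sketch, not a real lexer's API.

#[derive(Debug, PartialEq)]
enum Kind {
    Whitespace,
    LineComment,
    Punctuation(char),
    Identifier(String),
    // Assumption: the name attribute omits the leading quote,
    // while the token's extent includes it.
    LifetimeOrLabel(String),
}

#[derive(Debug, PartialEq)]
struct Token {
    extent: &'static str,
    kind: Kind,
}

fn tok(extent: &'static str, kind: Kind) -> Token {
    Token { extent, kind }
}

/// Expected tokens for the input: if _ >>= 'a // hi
fn expected_tokens() -> Vec<Token> {
    vec![
        // Keywords and `_` are both plain Identifier tokens at this stage.
        tok("if", Kind::Identifier("if".to_string())),
        tok(" ", Kind::Whitespace),
        tok("_", Kind::Identifier("_".to_string())),
        tok(" ", Kind::Whitespace),
        // `>>=` is three single-character Punctuation tokens.
        tok(">", Kind::Punctuation('>')),
        tok(">", Kind::Punctuation('>')),
        tok("=", Kind::Punctuation('=')),
        tok(" ", Kind::Whitespace),
        // The lifetime is one token whose extent includes the leading quote.
        tok("'a", Kind::LifetimeOrLabel("a".to_string())),
        tok(" ", Kind::Whitespace),
        // Comments are explicit tokens rather than being discarded.
        tok("// hi", Kind::LineComment),
    ]
}

/// Joins the extents back together, in order.
fn reassemble(tokens: &[Token]) -> String {
    tokens.iter().map(|t| t.extent).collect()
}

/// Collects the marks of all Punctuation tokens, in order.
fn punctuation_marks(tokens: &[Token]) -> Vec<char> {
    tokens
        .iter()
        .filter_map(|t| match t.kind {
            Kind::Punctuation(c) => Some(c),
            _ => None,
        })
        .collect()
}
```

Because whitespace and comments are kept as tokens in this list, joining the extents with reassemble gives back the original input text.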