Fine-grained tokens
Reprocessing produces fine-grained tokens.
Each fine-grained token has an extent, which is a sequence of characters taken from the input.
Each fine-grained token has a kind, and possibly also some attributes, as described in the tables below.
Kind | Attributes |
---|---|
Whitespace | |
LineComment | style, body |
BlockComment | style, body |
Punctuation | mark |
Identifier | represented identifier |
RawIdentifier | represented identifier |
LifetimeOrLabel | name |
RawLifetimeOrLabel | name |
CharacterLiteral | represented character, suffix |
ByteLiteral | represented byte, suffix |
StringLiteral | represented string, suffix |
RawStringLiteral | represented string, suffix |
ByteStringLiteral | represented bytes, suffix |
RawByteStringLiteral | represented bytes, suffix |
CStringLiteral | represented bytes, suffix |
RawCStringLiteral | represented bytes, suffix |
IntegerLiteral | base, digits, suffix |
FloatLiteral | body, suffix |
These attributes have the following types:
Attribute | Type |
---|---|
base | binary / octal / decimal / hexadecimal |
body | sequence of characters |
digits | sequence of characters |
mark | single character |
name | sequence of characters |
represented byte | single byte |
represented bytes | sequence of bytes |
represented character | single character |
represented identifier | sequence of characters |
represented string | sequence of characters |
style | non-doc / inner doc / outer doc |
suffix | sequence of characters |
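The two tables above can be read as a tagged union. The following is a minimal sketch of one possible data model, not the representation used by any particular implementation. The type names (`FineGrainedToken`, `Kind`, `CommentStyle`, `Base`), the snake_case field names, and the choice of `String` for "sequence of characters" and `Vec<u8>` for "sequence of bytes" are all assumptions made for illustration.

```rust
// Hypothetical data model for fine-grained tokens.
// All type and field names here are illustrative assumptions.

/// The `style` attribute: non-doc / inner doc / outer doc.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommentStyle {
    NonDoc,
    InnerDoc,
    OuterDoc,
}

/// The `base` attribute: binary / octal / decimal / hexadecimal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Binary,
    Octal,
    Decimal,
    Hexadecimal,
}

/// One variant per row of the kind table, carrying that row's attributes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Kind {
    Whitespace,
    LineComment { style: CommentStyle, body: String },
    BlockComment { style: CommentStyle, body: String },
    Punctuation { mark: char },
    Identifier { represented_identifier: String },
    RawIdentifier { represented_identifier: String },
    LifetimeOrLabel { name: String },
    RawLifetimeOrLabel { name: String },
    CharacterLiteral { represented_character: char, suffix: String },
    ByteLiteral { represented_byte: u8, suffix: String },
    StringLiteral { represented_string: String, suffix: String },
    RawStringLiteral { represented_string: String, suffix: String },
    ByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    CStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawCStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    IntegerLiteral { base: Base, digits: String, suffix: String },
    FloatLiteral { body: String, suffix: String },
}

/// A fine-grained token: its extent (characters taken from the input)
/// together with its kind and attributes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FineGrainedToken {
    pub extent: String,
    pub kind: Kind,
}

/// Hand-written token for the input `42u8` (constructed directly, not lexed).
pub fn sample_integer_token() -> FineGrainedToken {
    FineGrainedToken {
        extent: "42u8".to_string(),
        kind: Kind::IntegerLiteral {
            base: Base::Decimal,
            digits: "42".to_string(),
            suffix: "u8".to_string(),
        },
    }
}
```

Modelling the kind as an enum with per-variant fields means a token cannot carry an attribute that its kind does not list in the table, which is one reason to prefer it over a struct of optional fields.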
Notes:
At this stage:
- Both _ and keywords are treated as instances of Identifier.
- There are explicit tokens representing whitespace and comments.
- Single-character tokens are used for all punctuation.
- A lifetime (or label) is represented as a single token (which includes the leading ').
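To make the notes above concrete, here is a hand-written sketch of the fine-grained tokens one might expect for the input `'a >>= _`. It is not the output of a real lexer. The `Tok` enum is a cut-down, hypothetical model covering only the four kinds that appear, and the sketch assumes that the name attribute of a LifetimeOrLabel excludes the leading `'` while the extent includes it.

```rust
// Cut-down, hypothetical token model: only the kinds needed for this example.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Whitespace,
    Punctuation { mark: char },
    Identifier { represented_identifier: String },
    LifetimeOrLabel { name: String },
}

/// Hand-written (extent, token) pairs for the input `'a >>= _`.
pub fn tokens_for_example() -> Vec<(&'static str, Tok)> {
    vec![
        // The lifetime is a single token whose extent includes the leading '.
        // Assumption: the name attribute omits the apostrophe.
        ("'a", Tok::LifetimeOrLabel { name: "a".to_string() }),
        // Whitespace is an explicit token.
        (" ", Tok::Whitespace),
        // `>>=` is three single-character punctuation tokens at this stage.
        (">", Tok::Punctuation { mark: '>' }),
        (">", Tok::Punctuation { mark: '>' }),
        ("=", Tok::Punctuation { mark: '=' }),
        (" ", Tok::Whitespace),
        // `_` is an ordinary Identifier at this stage.
        ("_", Tok::Identifier { represented_identifier: "_".to_string() }),
    ]
}
```

The compound operator `>>=` does not appear as one token here; combining single-character punctuation into multi-character operators is left to a later stage.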