Fine-grained tokens
Reprocessing produces fine-grained tokens.
Each fine-grained token has an extent, which is a sequence of characters taken from the input.
Each fine-grained token has a kind, and possibly also some attributes, as described in the tables below.
| Kind | Attributes |
|---|---|
| Whitespace | |
| LineComment | style, body |
| BlockComment | style, body |
| Punctuation | mark |
| Identifier | represented identifier |
| RawIdentifier | represented identifier |
| LifetimeOrLabel | name |
| RawLifetimeOrLabel | name |
| CharacterLiteral | represented character, suffix |
| ByteLiteral | represented byte, suffix |
| StringLiteral | represented string, suffix |
| RawStringLiteral | represented string, suffix |
| ByteStringLiteral | represented bytes, suffix |
| RawByteStringLiteral | represented bytes, suffix |
| CStringLiteral | represented bytes, suffix |
| RawCStringLiteral | represented bytes, suffix |
| IntegerLiteral | base, digits, suffix |
| FloatLiteral | body, suffix |
These attributes have the following types:
| Attribute | Type |
|---|---|
| base | binary / octal / decimal / hexadecimal |
| body | sequence of characters |
| digits | sequence of characters |
| mark | single character |
| name | sequence of characters |
| represented byte | single byte |
| represented bytes | sequence of bytes |
| represented character | single character |
| represented identifier | sequence of characters |
| represented string | sequence of characters |
| style | non-doc / inner doc / outer doc |
| suffix | sequence of characters |
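
The two tables above map fairly directly onto a Rust enum with one variant per kind. The following is only a sketch of one possible encoding, not a definition used by any particular implementation: the names `FineToken`, `FineTokenData`, `Base`, `CommentStyle` and the `suffix` helper are hypothetical, and representing a "sequence of characters" as `String` (rather than, say, `Vec<char>`) is an assumption made for brevity.

```rust
/// Hypothetical encoding of the `base` attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Binary,
    Octal,
    Decimal,
    Hexadecimal,
}

/// Hypothetical encoding of the `style` attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommentStyle {
    NonDoc,
    InnerDoc,
    OuterDoc,
}

/// One variant per kind; the fields are that kind's attributes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FineTokenData {
    Whitespace,
    LineComment { style: CommentStyle, body: String },
    BlockComment { style: CommentStyle, body: String },
    Punctuation { mark: char },
    Identifier { represented_identifier: String },
    RawIdentifier { represented_identifier: String },
    LifetimeOrLabel { name: String },
    RawLifetimeOrLabel { name: String },
    CharacterLiteral { represented_character: char, suffix: String },
    ByteLiteral { represented_byte: u8, suffix: String },
    StringLiteral { represented_string: String, suffix: String },
    RawStringLiteral { represented_string: String, suffix: String },
    ByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    CStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawCStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    IntegerLiteral { base: Base, digits: String, suffix: String },
    FloatLiteral { body: String, suffix: String },
}

/// A fine-grained token: its extent plus its kind and attributes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FineToken {
    /// The characters taken from the input.
    pub extent: String,
    pub data: FineTokenData,
}

impl FineTokenData {
    /// Returns the `suffix` attribute for kinds that have one, else `None`.
    pub fn suffix(&self) -> Option<&str> {
        use FineTokenData::*;
        match self {
            CharacterLiteral { suffix, .. }
            | ByteLiteral { suffix, .. }
            | StringLiteral { suffix, .. }
            | RawStringLiteral { suffix, .. }
            | ByteStringLiteral { suffix, .. }
            | RawByteStringLiteral { suffix, .. }
            | CStringLiteral { suffix, .. }
            | RawCStringLiteral { suffix, .. }
            | IntegerLiteral { suffix, .. }
            | FloatLiteral { suffix, .. } => Some(suffix.as_str()),
            _ => None,
        }
    }
}
```

Keeping the attributes inside the variants means a token cannot carry an attribute its kind does not have; for instance, calling `suffix()` on a `Whitespace` value yields `None`.
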
Notes:
At this stage:
- Both `_` and keywords are treated as instances of Identifier.
- There are explicit tokens representing whitespace and comments.
- Single-character tokens are used for all punctuation.
- A lifetime (or label) is represented as a single token (which includes the leading `'`).
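
To make these notes concrete, here is a hand-written illustration of how a short input might break down into kinds and extents. It is not the output of a real lexer: the `Kind` enum, the `INPUT` constant, and the `expected_tokens` function are hypothetical names introduced for this example, attributes are omitted, and the token list was written out by hand from the notes above.

```rust
// Hypothetical kind tags; only the kinds needed for this example are listed.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Whitespace,
    LineComment,
    Punctuation,
    Identifier,
    LifetimeOrLabel,
    IntegerLiteral,
}

// Example input chosen to exercise each of the notes.
#[allow(dead_code)]
pub const INPUT: &str = "let _: &'a u8 = x >> 1; // hi";

/// Hand-written (kind, extent) pairs for `INPUT`.
#[allow(dead_code)]
pub fn expected_tokens() -> Vec<(Kind, &'static str)> {
    use Kind::*;
    vec![
        (Identifier, "let"),      // a keyword is an Identifier at this stage
        (Whitespace, " "),        // whitespace is an explicit token
        (Identifier, "_"),        // `_` is an Identifier at this stage
        (Punctuation, ":"),
        (Whitespace, " "),
        (Punctuation, "&"),
        (LifetimeOrLabel, "'a"),  // one token, including the leading `'`
        (Whitespace, " "),
        (Identifier, "u8"),
        (Whitespace, " "),
        (Punctuation, "="),
        (Whitespace, " "),
        (Identifier, "x"),
        (Whitespace, " "),
        (Punctuation, ">"),       // `>>` is two single-character tokens
        (Punctuation, ">"),
        (Whitespace, " "),
        (IntegerLiteral, "1"),
        (Punctuation, ";"),
        (Whitespace, " "),
        (LineComment, "// hi"),   // comments are explicit tokens
    ]
}
```

In this hand-written list the extents, joined in order, give back the original input, and every `Punctuation` extent is exactly one character long.
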