Fine-grained tokens
Tokenising produces fine-grained tokens.
Each fine-grained token has a kind, which is the name of one of the token-kind nonterminals. Most kinds of fine-grained token also have attributes, as described in the tables below.
| Kind | Attributes |
|---|---|
| Whitespace | |
| Line_comment | style, body |
| Block_comment | style, body |
| Punctuation | mark |
| Ident | represented ident |
| Raw_ident | represented ident |
| Lifetime_or_label | name |
| Raw_lifetime_or_label | name |
| Character_literal | represented character, suffix |
| Byte_literal | represented byte, suffix |
| String_literal | represented string, suffix |
| Raw_string_literal | represented string, suffix |
| Byte_string_literal | represented bytes, suffix |
| Raw_byte_string_literal | represented bytes, suffix |
| C_string_literal | represented bytes, suffix |
| Raw_c_string_literal | represented bytes, suffix |
| Integer_literal | base, digits, suffix |
| Float_literal | body, suffix |
Note: Some token-kind nonterminals do not appear in this table. These are the reserved forms, whose matches are always rejected. The names of reserved forms begin with Reserved_ or Unterminated_.
These attributes have the following types:
| Attribute | Type |
|---|---|
| base | binary / octal / decimal / hexadecimal |
| body | sequence of characters |
| digits | sequence of characters |
| mark | single character |
| name | sequence of characters |
| represented byte | single byte |
| represented bytes | sequence of bytes |
| represented character | single character |
| represented ident | sequence of characters |
| represented string | sequence of characters |
| style | non-doc / inner doc / outer doc |
| suffix | sequence of characters |
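As an illustration only, the kinds and attribute types listed above could be modelled as a Rust enum. This is a minimal sketch, not a definition taken from this document: the type names (FineGrainedToken, CommentStyle, Base), the field names, and the kind helper are all hypothetical, and it assumes that a "character" can be represented by Rust's char and a "sequence of characters" by String.

```rust
/// Hypothetical model of the "style" attribute of comments.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommentStyle {
    NonDoc,
    InnerDoc,
    OuterDoc,
}

/// Hypothetical model of the "base" attribute of integer literals.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Binary,
    Octal,
    Decimal,
    Hexadecimal,
}

/// One variant per kind in the table; fields mirror the listed attributes.
/// Assumption: `char` stands in for "single character" and `String` for
/// "sequence of characters".
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FineGrainedToken {
    Whitespace,
    LineComment { style: CommentStyle, body: String },
    BlockComment { style: CommentStyle, body: String },
    Punctuation { mark: char },
    Ident { represented_ident: String },
    RawIdent { represented_ident: String },
    LifetimeOrLabel { name: String },
    RawLifetimeOrLabel { name: String },
    CharacterLiteral { represented_character: char, suffix: String },
    ByteLiteral { represented_byte: u8, suffix: String },
    StringLiteral { represented_string: String, suffix: String },
    RawStringLiteral { represented_string: String, suffix: String },
    ByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    CStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawCStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    IntegerLiteral { base: Base, digits: String, suffix: String },
    FloatLiteral { body: String, suffix: String },
}

impl FineGrainedToken {
    /// Returns the name of the token-kind nonterminal for this token.
    pub fn kind(&self) -> &'static str {
        use FineGrainedToken::*;
        match self {
            Whitespace => "Whitespace",
            LineComment { .. } => "Line_comment",
            BlockComment { .. } => "Block_comment",
            Punctuation { .. } => "Punctuation",
            Ident { .. } => "Ident",
            RawIdent { .. } => "Raw_ident",
            LifetimeOrLabel { .. } => "Lifetime_or_label",
            RawLifetimeOrLabel { .. } => "Raw_lifetime_or_label",
            CharacterLiteral { .. } => "Character_literal",
            ByteLiteral { .. } => "Byte_literal",
            StringLiteral { .. } => "String_literal",
            RawStringLiteral { .. } => "Raw_string_literal",
            ByteStringLiteral { .. } => "Byte_string_literal",
            RawByteStringLiteral { .. } => "Raw_byte_string_literal",
            CStringLiteral { .. } => "C_string_literal",
            RawCStringLiteral { .. } => "Raw_c_string_literal",
            IntegerLiteral { .. } => "Integer_literal",
            FloatLiteral { .. } => "Float_literal",
        }
    }
}
```

Using a sum type here means each kind carries exactly the attributes listed for it, so a Punctuation token cannot accidentally be given a suffix. The reserved forms have no variant in this sketch because their matches are always rejected.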
Note: At this stage:
- Both _ and keywords are treated as instances of Ident.
- There are explicit tokens representing whitespace and comments.
- Single-character tokens are used for all punctuation.
- A lifetime (or label) is represented as a single token (which includes the leading ').
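To make the first three points concrete, the following sketch writes out by hand the fine-grained tokens one would expect for the source text `_ += fn`. It is an illustration under stated assumptions, not output from a real tokeniser: the Tok type and the tokens_for_example function are hypothetical, and only the three kinds needed for this input are modelled.

```rust
/// Hypothetical, reduced token type covering only the kinds used below.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Tok {
    Whitespace,
    Punctuation { mark: char },
    Ident { represented_ident: String },
}

/// Hand-written token sequence for the source text `_ += fn`.
fn tokens_for_example() -> Vec<Tok> {
    vec![
        // `_` is an Ident at this stage, not punctuation.
        Tok::Ident { represented_ident: "_".to_string() },
        // Whitespace is an explicit token.
        Tok::Whitespace,
        // `+=` is two single-character Punctuation tokens.
        Tok::Punctuation { mark: '+' },
        Tok::Punctuation { mark: '=' },
        Tok::Whitespace,
        // The keyword `fn` is also an Ident at this stage.
        Tok::Ident { represented_ident: "fn".to_string() },
    ]
}
```

In this sketch nothing distinguishes `fn` from an ordinary identifier, and nothing joins `+` and `=` into a compound operator; any such distinctions would have to be made at a later stage.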