Fine-grained tokens

Tokenising produces fine-grained tokens.

Each fine-grained token has a kind, which is the name of one of the token-kind nonterminals. Most kinds of fine-grained token also have attributes, as described in the tables below.

Kind                        Attributes
--------------------------  ------------------------------
Whitespace                  (none)
Line_comment                style, body
Block_comment               style, body
Punctuation                 mark
Ident                       represented ident
Raw_ident                   represented ident
Lifetime_or_label           name
Raw_lifetime_or_label       name
Character_literal           represented character, suffix
Byte_literal                represented byte, suffix
String_literal              represented string, suffix
Raw_string_literal          represented string, suffix
Byte_string_literal         represented bytes, suffix
Raw_byte_string_literal     represented bytes, suffix
C_string_literal            represented bytes, suffix
Raw_c_string_literal        represented bytes, suffix
Integer_literal             base, digits, suffix
Float_literal               body, suffix

Note: Some token-kind nonterminals do not appear in this table. These are the reserved forms, whose matches are always rejected. The names of reserved forms begin with Reserved_ or Unterminated_.

These attributes have the following types:

Attribute                   Type
--------------------------  ----------------------------------------
base                        binary / octal / decimal / hexadecimal
body                        sequence of characters
digits                      sequence of characters
mark                        single character
name                        sequence of characters
represented byte            single byte
represented bytes           sequence of bytes
represented character       single character
represented ident           sequence of characters
represented string          sequence of characters
style                       non-doc / inner doc / outer doc
suffix                      sequence of characters
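
The two tables above can be read as the description of a data type. The following is a minimal sketch of one possible Rust encoding, not the representation used by any particular implementation. The names `FineToken`, `Base` and `CommentStyle` are hypothetical, and the choice of `String`, `char`, `u8` and `Vec<u8>` for "sequence of characters", "single character", "single byte" and "sequence of bytes" is an assumption made for illustration.

```rust
/// Hypothetical encoding of the `base` attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Binary,
    Octal,
    Decimal,
    Hexadecimal,
}

/// Hypothetical encoding of the `style` attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommentStyle {
    NonDoc,
    InnerDoc,
    OuterDoc,
}

/// One variant per row of the kind table; fields mirror the attribute column.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FineToken {
    Whitespace,
    LineComment { style: CommentStyle, body: String },
    BlockComment { style: CommentStyle, body: String },
    Punctuation { mark: char },
    Ident { represented_ident: String },
    RawIdent { represented_ident: String },
    LifetimeOrLabel { name: String },
    RawLifetimeOrLabel { name: String },
    CharacterLiteral { represented_character: char, suffix: String },
    ByteLiteral { represented_byte: u8, suffix: String },
    StringLiteral { represented_string: String, suffix: String },
    RawStringLiteral { represented_string: String, suffix: String },
    ByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawByteStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    CStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    RawCStringLiteral { represented_bytes: Vec<u8>, suffix: String },
    IntegerLiteral { base: Base, digits: String, suffix: String },
    FloatLiteral { body: String, suffix: String },
}

impl FineToken {
    /// Returns the name of the token-kind nonterminal for this token.
    pub fn kind(&self) -> &'static str {
        match self {
            FineToken::Whitespace => "Whitespace",
            FineToken::LineComment { .. } => "Line_comment",
            FineToken::BlockComment { .. } => "Block_comment",
            FineToken::Punctuation { .. } => "Punctuation",
            FineToken::Ident { .. } => "Ident",
            FineToken::RawIdent { .. } => "Raw_ident",
            FineToken::LifetimeOrLabel { .. } => "Lifetime_or_label",
            FineToken::RawLifetimeOrLabel { .. } => "Raw_lifetime_or_label",
            FineToken::CharacterLiteral { .. } => "Character_literal",
            FineToken::ByteLiteral { .. } => "Byte_literal",
            FineToken::StringLiteral { .. } => "String_literal",
            FineToken::RawStringLiteral { .. } => "Raw_string_literal",
            FineToken::ByteStringLiteral { .. } => "Byte_string_literal",
            FineToken::RawByteStringLiteral { .. } => "Raw_byte_string_literal",
            FineToken::CStringLiteral { .. } => "C_string_literal",
            FineToken::RawCStringLiteral { .. } => "Raw_c_string_literal",
            FineToken::IntegerLiteral { .. } => "Integer_literal",
            FineToken::FloatLiteral { .. } => "Float_literal",
        }
    }
}
```

Using one enum variant per kind means a token cannot carry attributes that its kind does not define. The reserved forms get no variant in this sketch, because their matches are always rejected and so never become tokens.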

Note: At this stage:

  • Both _ and keywords are treated as instances of Ident.
  • There are explicit tokens representing whitespace and comments.
  • Single-character tokens are used for all punctuation.
  • A lifetime (or label) is represented as a single token (which includes the leading ').
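
Two of the points in the note can be illustrated with a small sketch. The type `SketchToken` and both helper functions are hypothetical and cover only the `Ident` and `Punctuation` kinds. The functions do not tokenise anything; they only show what shape the resulting tokens would take, assuming the input has already been recognised as an identifier-like word or as a run of punctuation characters.

```rust
/// Hypothetical, reduced token type covering only two kinds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchToken {
    Ident { represented_ident: String },
    Punctuation { mark: char },
}

/// At this stage `_` and keywords are not distinguished from other
/// identifiers: each becomes an `Ident` token.
pub fn ident_token(text: &str) -> SketchToken {
    SketchToken::Ident { represented_ident: text.to_string() }
}

/// A multi-character operator such as `<<=` is not a single token at this
/// stage: each character becomes its own `Punctuation` token.
pub fn punctuation_tokens(run: &str) -> Vec<SketchToken> {
    run.chars()
        .map(|mark| SketchToken::Punctuation { mark })
        .collect()
}
```

Under these assumptions, `punctuation_tokens("<<=")` yields three tokens with marks `<`, `<` and `=`, while `ident_token("_")` and `ident_token("fn")` both yield tokens of kind `Ident`.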