Overview

The following processes might be considered to be part of Rust's lexer:

  • Decode: interpret UTF-8 input as a sequence of Unicode characters
  • Clean (see the sketch after this list):
    • Byte order mark removal
    • CRLF normalisation
    • Shebang removal
  • Tokenise: interpret the characters as ("fine-grained") tokens
  • Lower doc-comments: convert doc-comments into attributes (see the example at the end of this overview)
  • Build trees: organise tokens into delimited groups
  • Combine: convert fine-grained tokens to compound tokens (for declarative macros)
  • Prepare proc-macro input: convert fine-grained tokens to the form used for proc-macros
  • Remove whitespace: remove whitespace tokens
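
As a rough illustration of the first two steps, here is a minimal sketch in Rust. It is illustrative only: the function name `decode_and_clean` is invented for this example, and the shebang rule is simplified relative to the real one, which must not treat an inner attribute such as `#![feature(...)]` as a shebang.

```rust
fn decode_and_clean(input: &[u8]) -> Result<String, std::str::Utf8Error> {
    // Decode: interpret the input bytes as UTF-8.
    let decoded = std::str::from_utf8(input)?;

    // Clean: remove a leading byte order mark, if present.
    let without_bom = decoded.strip_prefix('\u{FEFF}').unwrap_or(decoded);

    // Clean: normalise CRLF line endings to LF.
    let normalised = without_bom.replace("\r\n", "\n");

    // Clean: remove a shebang line (simplified check; see the caveat above).
    let cleaned = if normalised.starts_with("#!") && !normalised.starts_with("#![") {
        match normalised.find('\n') {
            Some(newline) => normalised[newline..].to_owned(),
            None => String::new(),
        }
    } else {
        normalised
    };

    Ok(cleaned)
}
```

This is only one way to read the list above, not a description of how rustc organises the work.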

This document attempts to completely describe the "Decode", "Clean", "Tokenise", and "Lower doc-comments" processes.
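
As a small illustration of what "Lower doc-comments" means: a doc-comment is treated as sugar for a `doc` attribute, so the two functions below carry essentially the same attribute.

```rust
/// Adds one to its argument.
fn succ(x: u32) -> u32 { x + 1 }

// After the "Lower doc-comments" step, the item above carries
// an attribute much like the explicit one written here.
#[doc = " Adds one to its argument."]
fn succ_explicit(x: u32) -> u32 { x + 1 }
```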