Overview
The following processes might be considered part of Rust's lexer:
- Decode: interpret UTF-8 input as a sequence of Unicode characters
- Clean (see the sketch after this list):
  - Byte order mark removal
  - CRLF normalisation
  - Shebang removal
- Tokenise: interpret the characters as ("fine-grained") tokens
- Further processing: to fit the needs of later parts of the spec
  - For example, convert fine-grained tokens to compound tokens
    - possibly different for the grammar and the two macro implementations
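
Taken together, the Decode and Clean steps amount to a small transformation of the raw input before tokenising starts. The Rust sketch below shows one possible reading of them; the function name `decode_and_clean` and the simplified shebang rule are assumptions made here for illustration, not a description of rustc's behaviour.

```rust
/// A minimal sketch, not rustc's implementation: one way to read the
/// "Decode" and "Clean" steps listed above.
fn decode_and_clean(input: &[u8]) -> Result<String, std::str::Utf8Error> {
    // Decode: interpret the UTF-8 input as a sequence of Unicode characters.
    let text = std::str::from_utf8(input)?;

    // Clean: remove a byte order mark (U+FEFF) at the start of the input.
    let text = text.strip_prefix('\u{FEFF}').unwrap_or(text);

    // Clean: normalise CRLF line endings to LF.
    let mut text = text.replace("\r\n", "\n");

    // Clean: remove a shebang line such as "#!/usr/bin/env run", keeping its
    // trailing newline. (The real rule is more careful: an inner attribute
    // like `#![...]` at the start of the file is not treated as a shebang.)
    if text.starts_with("#!") && !text.starts_with("#![") {
        let end = text.find('\n').unwrap_or(text.len());
        text.drain(..end);
    }

    Ok(text)
}
```

Under these illustrative rules, `decode_and_clean(b"#!/usr/bin/env run\r\nfn main() {}\r\n")` would yield `"\nfn main() {}\n"`, which is what the Tokenise step would then operate on.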
 
 
This document attempts to completely describe the "Tokenise" process.