Definitions

Unicode

References to Unicode in this document refer to the Unicode standard, version 16.0.

References to the Unicode character database refer to version 16.0.0.

NFC normalisation

References to NFC-normalised strings refer to Unicode's Normalization Form C, as defined in Unicode Standard Annex #15.

Byte

For the purposes of this document, byte means the same thing as Rust's u8 (corresponding to a natural number in the range 0 to 255 inclusive).
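As a small illustration of that range, the sketch below converts a natural number to a byte only when it fits. The helper names `byte_range` and `to_byte` are hypothetical and are not part of this document's definitions.

```rust
use std::convert::TryFrom;

/// Returns the smallest and largest values a byte (`u8`) can hold.
/// Hypothetical helper, for illustration only.
fn byte_range() -> (u8, u8) {
    (u8::MIN, u8::MAX)
}

/// Converts a natural number to a byte, or returns `None` if the
/// number lies outside the range 0 to 255 inclusive.
/// Hypothetical helper, for illustration only.
fn to_byte(n: u32) -> Option<u8> {
    u8::try_from(n).ok()
}
```

Under this sketch, `to_byte(255)` yields a byte and `to_byte(256)` yields `None`.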

Character

For the purposes of this document, character means the same thing as Rust's char. That means, in particular:

  • there's exactly one character for each Unicode scalar value
  • the things that Unicode calls "noncharacters" are characters
  • there are no characters corresponding to surrogate code points
  • there is a character for each unassigned code point
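The points above can be checked against Rust's standard library, where `char::from_u32` returns `None` exactly when its argument is not a Unicode scalar value. The following is a minimal sketch. The function names are hypothetical, and the sample code points are chosen for illustration (U+0378 is assumed to be unassigned in the Unicode version referenced here).

```rust
/// Returns the character for a code point, or `None` if the code point
/// is not a Unicode scalar value (a surrogate, or above U+10FFFF).
/// Hypothetical helper, for illustration only.
fn character_for(code_point: u32) -> Option<char> {
    char::from_u32(code_point)
}

/// Counts the characters by trying every code point from U+0000 to
/// U+10FFFF and keeping those that correspond to a `char`.
/// Hypothetical helper, for illustration only.
fn count_characters() -> usize {
    (0u32..=0x10FFFF).filter_map(char::from_u32).count()
}
```

With these helpers, `character_for(0xFFFE)` (a noncharacter) and `character_for(0x0378)` (an unassigned code point) both return a character, while `character_for(0xD800)` (a surrogate) returns `None`.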

Notation for characters

This document identifies characters in the following ways:

Printable ASCII characters other than space are represented by themselves using highlighting like a. For example \ represents character U+005C (REVERSE SOLIDUS).

ASCII control characters and space are represented as follows:

U+0000  NUL
U+000A  LF
U+000D  CR
U+0009  HT
U+0020  SP

Other characters are identified by hexadecimal scalar value and name, for example U+FEFF (BYTE ORDER MARK).
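The notation rules can be sketched as a function from a character to its written form. This is an illustration only: the function name `notation` is hypothetical, and because Rust's standard library provides no Unicode character names, the sketch emits only the hexadecimal scalar value for the "other characters" case and omits the name. ASCII control characters that do not appear in the table above are also treated as "other characters" here, which is an assumption.

```rust
/// Renders a character roughly as this document would write it.
/// Hypothetical helper; character names are omitted because the
/// standard library has no name lookup.
fn notation(c: char) -> String {
    match c {
        // ASCII control characters and space listed in the table.
        '\0' => "NUL".to_string(),
        '\n' => "LF".to_string(),
        '\r' => "CR".to_string(),
        '\t' => "HT".to_string(),
        ' ' => "SP".to_string(),
        // Printable ASCII other than space represents itself.
        '!'..='~' => c.to_string(),
        // Everything else: hexadecimal scalar value, at least four digits.
        _ => format!("U+{:04X}", c as u32),
    }
}
```

For example, `notation('\\')` gives `\` and `notation('\u{FEFF}')` gives `U+FEFF`.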

Sequence

When this document refers to a sequence of items, it means a finite, but possibly empty, ordered list of those items.

The terms "character sequence" and "sequence of characters" mean the same thing.
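One possible way to model a character sequence in Rust is a `Vec<char>`, which is finite, ordered, and may be empty. The sketch below is an assumption made for illustration, not a representation this document prescribes, and the name `chars_of` is hypothetical.

```rust
/// Collects the characters of a string into a sequence of characters.
/// The result is empty when the input string is empty.
/// Hypothetical helper, for illustration only.
fn chars_of(s: &str) -> Vec<char> {
    s.chars().collect()
}
```

A sequence of characters is distinct from a sequence of bytes: `chars_of("a\u{E9}")` has two items, whereas the same string's UTF-8 encoding has three bytes.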