Definitions

Byte

For the purposes of this document, byte means the same thing as Rust's u8 (corresponding to a natural number in the range 0 to 255 inclusive).

Character

For the purposes of this document, character means the same thing as Rust's char. That means, in particular:

  • there's exactly one character for each Unicode scalar value
  • the things that Unicode calls "noncharacters" are characters
  • there are no characters corresponding to surrogate code points
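
The three points above can be checked against the standard library. This is a minimal sketch: the helper names character_for and count_characters are hypothetical, introduced only for illustration, and char::from_u32 is the only library call relied on.

```rust
/// Returns the character corresponding to the code point `cp`, if there is one.
/// (Hypothetical helper; it simply wraps `char::from_u32`.)
fn character_for(cp: u32) -> Option<char> {
    char::from_u32(cp)
}

/// Counts how many code points in 0..=0x10FFFF have a corresponding character.
/// Every code point in that range except the surrogates U+D800..=U+DFFF does.
fn count_characters() -> usize {
    (0u32..=0x10FFFF)
        .filter(|&cp| character_for(cp).is_some())
        .count()
}
```

Under these assumptions, character_for(0xFFFE) returns a character even though Unicode classes U+FFFE as a noncharacter, while character_for(0xD800) returns None because U+D800 is a surrogate code point.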

Sequence

When this document refers to a sequence of items, it means a finite, but possibly empty, ordered list of those items.

The phrases "character sequence" and "sequence of characters" mean the same thing.

Prefix of a sequence

When this document talks about a prefix of a sequence, it means the first n items of that sequence, in order, for some n from zero up to the sequence's length. For example, abc is a prefix of abcde. The prefix may be empty, or the entire sequence.
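
As an illustration of this definition, here is a minimal sketch using slices to stand in for sequences. The function names is_prefix and all_prefixes are hypothetical and not part of this document's terminology; the only library call relied on is the slice method starts_with.

```rust
/// Returns true if `candidate` is a prefix of `sequence`.
/// An empty `candidate` is a prefix of every sequence.
fn is_prefix<T: PartialEq>(candidate: &[T], sequence: &[T]) -> bool {
    sequence.starts_with(candidate)
}

/// Returns every prefix of `sequence`, shortest first:
/// the empty prefix, then each longer one, ending with the whole sequence.
fn all_prefixes<T>(sequence: &[T]) -> Vec<&[T]> {
    (0..=sequence.len()).map(|n| &sequence[..n]).collect()
}
```

A sequence of n items therefore has n + 1 prefixes, counting both the empty prefix and the sequence itself.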

NFC normalisation

References to NFC-normalised strings refer to Unicode's Normalization Form C, as defined in Unicode Standard Annex #15.