Definitions
Byte
For the purposes of this document, byte means the same thing as Rust's u8
(corresponding to a natural number in the range 0 to 255 inclusive).
Character
For the purposes of this document, character means the same thing as Rust's char
.
That means, in particular:
- there's exactly one character for each Unicode scalar value
- the things that Unicode calls "noncharacters" are characters
- there are no characters corresponding to surrogate code points
Sequence
When this document refers to a sequence of items, it means a finite, but possibly empty, ordered list of those items.
"character sequence" and "sequence of characters" are different ways of saying the same thing.
Prefix of a sequence
When this document talks about a prefix of a sequence, it means "prefix" in the way that abc
is a prefix of abcde
.
The prefix may be empty, or the entire sequence.
NFC normalisation
References to NFC-normalised strings are talking about Unicode's Normalization Form C, defined in Unicode Standard Annex #15.