Rustc oddities

NFC normalisation for lifetime/label

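Identifiers are normalised to NFC, which means that Kelvin (spelled with U+212A KELVIN SIGN as its first character) and Kelvin (spelled with the ASCII letter K) are treated as representing the same identifier. See RFC 2457.

But this doesn't happen for lifetimes or labels, so 'Kelvin and 'Kelvin, spelled in the same two ways, are different as lifetimes or labels.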

For example, this compiles without warning in Rust 1.86, while this doesn't.
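As a rough model of the difference (not rustc's code), the sketch below compares names the way identifiers are compared (after normalisation) and the way lifetime and label names are compared (as written). The function names are made up for illustration, and toy_nfc is a deliberately tiny stand-in that handles only the U+212A mapping; real NFC needs the Unicode normalisation tables.

```rust
/// Toy stand-in for NFC (an assumption for this sketch): it handles only
/// the one mapping needed here, U+212A KELVIN SIGN to U+004B.
fn toy_nfc(name: &str) -> String {
    name.chars()
        .map(|c| if c == '\u{212A}' { 'K' } else { c })
        .collect()
}

/// Identifiers: compared after normalisation.
fn same_identifier(a: &str, b: &str) -> bool {
    toy_nfc(a) == toy_nfc(b)
}

/// Lifetime and label names: compared exactly as written.
fn same_lifetime_or_label(a: &str, b: &str) -> bool {
    a == b
}

fn main() {
    let kelvin_sign = "\u{212A}elvin"; // first character is U+212A
    let ascii = "Kelvin"; // first character is U+004B
    println!("same identifier: {}", same_identifier(kelvin_sign, ascii));
    println!(
        "same lifetime or label: {}",
        same_lifetime_or_label(kelvin_sign, ascii)
    );
}
```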

In this writeup, the represented identifier attribute of Identifier and RawIdentifier fine-grained tokens is in NFC, and the name attribute of LifetimeOrLabel and RawLifetimeOrLabel tokens isn't.

I think this behaviour is a promising candidate for provoking the "Wait...that's what we currently do? We should fix that." reaction to being given a spec to review.

Filed as rustc #126759.

Nested block comments

The Reference says "Nested block comments are supported".

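Rustc implements this by counting occurrences of /* and */, matching greedily. That means it rejects forms like /* xyz /*/: the trailing /*/ is read as a /* that opens a nested comment, followed by a stray /, so the outer comment is never closed.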
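The counting approach can be sketched as follows. This is a minimal illustration of the behaviour described above, not rustc's actual code; the function name block_comment_len and its return convention are assumptions made for the example.

```rust
/// Returns the byte length of the block comment at the start of `input`,
/// or `None` if `input` doesn't start with `/*` or the comment is never
/// closed. Scanning bytes is safe here because `/` and `*` are ASCII and
/// never occur inside a multi-byte UTF-8 sequence.
fn block_comment_len(input: &str) -> Option<usize> {
    if !input.starts_with("/*") {
        return None;
    }
    let bytes = input.as_bytes();
    let mut depth = 1usize; // the opening `/*` has been consumed
    let mut i = 2;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            // Greedy: any `/*` opens a nested comment, even in `/*/`.
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            (b'*', b'/') => {
                depth -= 1;
                i += 2;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => i += 1,
        }
    }
    None // ran out of input with depth > 0
}

fn main() {
    for src in ["/* a /* b */ c */", "/* xyz /*/", "/* /* */"] {
        match block_comment_len(src) {
            Some(len) => println!("{src:?}: comment of {len} bytes"),
            None => println!("{src:?}: rejected (unterminated)"),
        }
    }
}
```

In this sketch /* xyz /*/ ends with the depth at 2, so it is reported as unterminated.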

This writeup includes a !"/*" subexpression in the BLOCK_COMMENT_CONTENT definition to match rustc's behaviour.

The grammar production in the Reference seems to be written to assume that these forms should be accepted (but I think it's garbled anyway: it accepts /* /* */).

I haven't seen any discussion of whether this rustc behaviour is considered desirable.

Restriction on e-suffixes

As of 2025-04-27, the implementation of pr131656 leaves support for numeric literal suffixes beginning with e or E incomplete: rustc still rejects some (very obscure) cases.

A numeric literal token is rejected if:

  • it doesn't have an exponent; and
  • it has a suffix of the following form:
    • begins with e or E
    • immediately followed by one or more _ characters
    • immediately followed by a character which has the XID_Continue property but not XID_Start.

For example, 123e_· is rejected: · (U+00B7 MIDDLE DOT) has the XID_Continue property but not XID_Start.
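The suffix condition above can be sketched as a small predicate. This is an illustration only: is_restricted_e_suffix is a made-up name, the caller is assumed to have already established that the literal has no exponent, and the two toy_ functions are stand-ins that cover only ASCII and U+00B7 (a real check would use the Unicode XID_Start and XID_Continue tables).

```rust
/// Returns true if `suffix` has the restricted form: `e` or `E`, then one
/// or more `_`, then a character that is XID_Continue but not XID_Start.
/// The "no exponent" condition is assumed to be checked by the caller.
fn is_restricted_e_suffix(
    suffix: &str,
    is_xid_start: impl Fn(char) -> bool,
    is_xid_continue: impl Fn(char) -> bool,
) -> bool {
    let mut chars = suffix.chars();
    // Must begin with e or E.
    if !matches!(chars.next(), Some('e' | 'E')) {
        return false;
    }
    // Then one or more underscores.
    let rest = chars.as_str();
    let after_underscores = rest.trim_start_matches('_');
    if after_underscores.len() == rest.len() {
        return false;
    }
    // Then a character that is XID_Continue but not XID_Start.
    match after_underscores.chars().next() {
        Some(c) => is_xid_continue(c) && !is_xid_start(c),
        None => false,
    }
}

/// Toy approximation (assumption): ASCII letters only.
fn toy_is_xid_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

/// Toy approximation (assumption): ASCII letters, digits, `_`, and U+00B7.
fn toy_is_xid_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '\u{B7}'
}

fn main() {
    for suffix in ["e_\u{B7}", "e_x", "e\u{B7}", "E__\u{B7}"] {
        let restricted =
            is_restricted_e_suffix(suffix, toy_is_xid_start, toy_is_xid_continue);
        println!("{suffix:?}: restricted = {restricted}");
    }
}
```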

The Reserved_float_e_suffix_restriction and Reserved_integer_e_suffix_restriction nonterminals describe this restriction in the grammar.