Encoding & Standards
BOM (BOM)
The Byte Order Mark (U+FEFF) placed at the start of a text file to indicate byte order (endianness) in UTF-16/UTF-32 …
Code Unit
The minimum bit combination used for encoding a character: 8-bit for UTF-8, 16-bit for UTF-16, and 32-bit for UTF-32.
Plane
A group of 65,536 consecutive Unicode code points. Plane 0 is the Basic Multilingual Plane (BMP); most emoji live in …
Supplementary Multilingual Plane (SMP)
Unicode Plane 1 (U+10000 to U+1FFFF), where the majority of emoji code points are allocated.
Surrogate Pair
Two UTF-16 code units (a high surrogate U+D800-U+DBFF followed by a low surrogate U+DC00-U+DFFF) that together represent a character above …
UTF-16
A variable-width Unicode encoding that uses 2 or 4 bytes per character, used internally by JavaScript, Java, and Windows.
UTF-32
A fixed-width Unicode encoding that uses exactly 4 bytes per character, providing direct code point mapping at the cost of …
UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per character, dominant on the web (used by 98%+ of …