Code Unit
Encoding & StandardsThe minimum bit combination used for encoding a character: 8-bit for UTF-8, 16-bit for UTF-16, and 32-bit for UTF-32.
A code unit is the fundamental building block of a Unicode encoding form. It's important to distinguish code units from code points — a single code point may require multiple code units depending on the encoding.In UTF-8, a code unit is 8 bits (1 byte). The emoji 😀 requires 4 code units. In UTF-16, a code unit is 16 bits (2 bytes). The same emoji requires 2 code units (a surrogate pair). In UTF-32, it's 1 code unit (4 bytes).
Many programming language string APIs operate on code units rather than code points, which is why string length calculations can be confusing with emoji.