UTF-16

Encoding & Standards

A variable-width Unicode encoding that uses 2 or 4 bytes per character, used internally by JavaScript, Java, and Windows.

UTF-16 uses 16-bit code units. Characters in the Basic Multilingual Plane (U+0000 to U+FFFF) use one code unit (2 bytes). Characters above U+FFFF โ€” including most emoji โ€” require a surrogate pair (4 bytes).

This is why JavaScript's `string.length` can be surprising with emoji: `'๐Ÿ˜€'.length` returns 2 (two UTF-16 code units), not 1. Developers must use spread syntax (`[...'๐Ÿ˜€'].length`) or `Array.from()` for correct counting.

UTF-16 exists in two byte orders: UTF-16LE (little-endian, used by Windows) and UTF-16BE (big-endian). A BOM character can indicate which is used.

Related Terms

BOM (BOM) BOM (BOM)
The Byte Order Mark (U+FEFF) placed at the start of a text file to indicate byte order (endianness) in UTF-16/UTF-32 encodings.
Code Unit Code Unit
The minimum bit combination used for encoding a character: 8-bit for UTF-8, 16-bit for UTF-16, and 32-bit for UTF-32.
Supplementary Multilingual Plane (SMP) Supplementary Multilingual Plane (SMP)
Unicode Plane 1 (U+10000 to U+1FFFF), where the majority of emoji code points are allocated.
Surrogate Pair Surrogate Pair
Two UTF-16 code units (a high surrogate U+D800-U+DBFF followed by a low surrogate U+DC00-U+DFFF) that together represent a character above U+FFFF.
UTF-32 UTF-32
A fixed-width Unicode encoding that uses exactly 4 bytes per character, providing direct code point mapping at the cost of space.
UTF-8 UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per character, dominant on the web (used by 98%+ of websites).

Related Tools

๐Ÿ”ข Unicode Lookup Unicode Lookup
Enter a codepoint like U+1F600 and get the emoji, encoding details, UTF-8/16 bytes, and HTML entities.