Surrogate Pair

Encoding & Standards

Two UTF-16 code units (a high surrogate U+D800-U+DBFF followed by a low surrogate U+DC00-U+DFFF) that together represent a character above U+FFFF.

Characters with code points above U+FFFF cannot fit in a single 16-bit value. UTF-16 solves this by encoding them as two 16-bit values called a surrogate pair. Since most emoji are above U+FFFF, surrogate pairs are extremely common with emoji.

For example, 😀 (U+1F600) becomes the surrogate pair 0xD83D 0xDE00 in UTF-16. The formula: high surrogate = 0xD800 + ((cp - 0x10000) >> 10), low surrogate = 0xDC00 + ((cp - 0x10000) & 0x3FF).

This is why JavaScript's `'😀'.charCodeAt(0)` returns 55357 (0xD83D) — it's returning the high surrogate, not the actual code point.

Related Terms

Code Unit Code Unit
The minimum bit combination used for encoding a character: 8-bit for UTF-8, 16-bit for UTF-16, and 32-bit for UTF-32.
Plane Plane
A group of 65,536 consecutive Unicode code points. Plane 0 is the Basic Multilingual Plane (BMP); most emoji live in Plane 1 (SMP).
Supplementary Multilingual Plane (SMP) Supplementary Multilingual Plane (SMP)
Unicode Plane 1 (U+10000 to U+1FFFF), where the majority of emoji code points are allocated.
UTF-16 UTF-16
A variable-width Unicode encoding that uses 2 or 4 bytes per character, used internally by JavaScript, Java, and Windows.

Related Tools

🔢 Unicode Lookup Unicode Lookup
Enter a codepoint like U+1F600 and get the emoji, encoding details, UTF-8/16 bytes, and HTML entities.
🔍 Sequence Analyzer Sequence Analyzer
Decode ZWJ sequences, skin tone modifiers, keycap sequences, and flag pairs into individual components.