Paire de substituts

Encoding & Standards

Deux unités de code UTF-16 (un substitut haut U+D800–U+DBFF suivi d'un substitut bas U+DC00–U+DFFF) qui représentent ensemble un caractère au-delà de U+FFFF.

Characters with code points above U+FFFF cannot fit in a single 16-bit value. UTF-16 solves this by encoding them as two 16-bit values called a surrogate pair. Since most emoji are above U+FFFF, surrogate pairs are extremely common with emoji.

For example, 😀 (U+1F600) becomes the surrogate pair 0xD83D 0xDE00 in UTF-16. The formula: high surrogate = 0xD800 + ((cp - 0x10000) >> 10), low surrogate = 0xDC00 + ((cp - 0x10000) & 0x3FF).

This is why JavaScript's `'😀'.charCodeAt(0)` returns 55357 (0xD83D) — it's returning the high surrogate, not the actual code point.

Termes associés

Outils associés

Articles associés

Unicode Lookup: Find Any Emoji by Code Point

Enter a Unicode code point like U+1F600 or paste an emoji to see its full encoding breakdown — UTF-8, UTF-16, HTML entities, and more.

How to Handle Emojis in JavaScript: Strings, Length, and Rendering

Learn how to work with emojis in JavaScript: string length pitfalls, Unicode code points, regex, grapheme segmentation, and React rendering tips.

Why string.length Lies: Grapheme Clusters and Emoji Length

Why string.length fails for emoji, what grapheme clusters are, and how to correctly count characters in JavaScript, Python, and more.

Handling Emojis in JSON and REST APIs

How to correctly handle emoji in JSON serialization, REST API design, database storage, and HTTP headers — including surrogate pair pitfalls.

Emoji Encoding Guide: UTF-8, UTF-16 & Surrogate Pairs

A developer's guide to emoji encoding — how UTF-8, UTF-16, and UTF-32 represent emoji, and common pitfalls with surrogate pairs.