Unicode Lookup: Find Any Emoji by Code Point

What Is the Unicode Lookup Tool?

Every character you see on screen, including all 3,900+ emoji, has a unique numeric identifier assigned by the Unicode Consortium. The Unicode Lookup tool is a quick reference utility that takes either a code point (like U+1F600) or a pasted emoji character (like 😀) and returns the complete encoding picture: UTF-8 bytes, UTF-16 code units, HTML entities, CSS content values, and language-specific escape sequences.

For developers working with emoji in strings, databases, APIs, or user interfaces, this kind of instant breakdown eliminates the guesswork of converting between representations. Instead of reaching for a Python shell or searching scattered documentation, you get the full encoding story in one place.

Understanding Code Points

The U+ Notation

A code point is the canonical identifier for a Unicode character. It is written in the form U+ followed by four to six hexadecimal digits. The U+ prefix is universal shorthand: it does not represent bytes in memory, only an abstract numeric position within the Unicode codespace.

For example:

  • U+1F600: 😀 Grinning Face
  • U+2764: ❤ Heavy Black Heart
  • U+1F1FA U+1F1F8: 🇺🇸 Flag: United States (two code points)

When you copy a code point from a specification, a font tool, or another reference site, you can paste it directly into the Unicode Lookup tool exactly as written.

Hexadecimal Code Points

Code points are expressed in hexadecimal (base 16) because the Unicode codespace runs from U+0000 to U+10FFFF — over 1.1 million possible positions. Decimal would be cumbersome for both humans and tools. Hex maps cleanly to the byte-level representations used by UTF-8 and UTF-16, making it the natural notation for encoding work.

If you are more comfortable with decimal, the tool displays the decimal equivalent alongside the hex value. U+1F600 in decimal is 128512.
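The hex-to-decimal relationship is easy to check by hand. The following is a minimal Python sketch; the variable names are illustrative only and are not part of the tool.

```python
# Convert the hex digits of a code point to decimal and back again.
hex_digits = "1F600"

decimal_value = int(hex_digits, 16)    # parse the digits as base 16
round_trip = f"U+{decimal_value:04X}"  # format back with at least four hex digits

print(decimal_value)  # 128512
print(round_trip)     # U+1F600
```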

Code Point Ranges for Emojis

Emoji code points are scattered across several Unicode blocks rather than grouped in a single contiguous range:

Range                  Content
U+2000 – U+27FF        Symbols, punctuation, and some legacy emoji (✅, ⚡)
U+1F300 – U+1F9FF      Core emoji blocks (faces, animals, food, activities)
U+1FA00 – U+1FAFF      Chess Symbols and Symbols and Pictographs Extended-A (chess pieces, newer additions, household objects)
U+1F1E6 – U+1F1FF      Regional Indicator Symbols (used in flag sequences)

Understanding where an emoji lives in the codespace matters when you encounter database errors about character encoding ranges, which often surface because MySQL's utf8 charset only covers U+0000 through U+FFFF and rejects emoji above that range.
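One way to see which side of the U+FFFF boundary a character falls on is to compute its plane number, which is the code point shifted right by 16 bits. This is a sketch under that assumption; `plane_of` and `fits_in_bmp` are hypothetical helper names, not something the tool exposes.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    return code_point >> 16  # each plane holds 0x10000 code points


def fits_in_bmp(code_point: int) -> bool:
    """True when the code point is in plane 0 (U+0000 through U+FFFF)."""
    return plane_of(code_point) == 0


print(plane_of(0x26A1), fits_in_bmp(0x26A1))    # 0 True   (⚡)
print(plane_of(0x1F600), fits_in_bmp(0x1F600))  # 1 False  (😀)
```

Characters for which `fits_in_bmp` returns False are the ones a three-byte charset cannot store.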

How to Use the Tool

Enter a Code Point (e.g., U+1F600)

Type or paste a code point into the lookup field. The tool accepts several common input formats:

  • U+1F600 — canonical form with prefix
  • 1F600 — bare hex digits
  • 0x1F600 — C-style hex prefix

After submission, the tool resolves the code point to a character, displays the rendered glyph, and expands all encoding representations below.
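The tool's actual parsing logic is not documented here, but the three accepted notations can be normalized with a small regular expression. The sketch below is one possible approach; `parse_code_point` is a hypothetical name.

```python
import re

# Optional "U+" or "0x" prefix, then one to six hex digits (case-insensitive).
_CODE_POINT_RE = re.compile(r"^(?:U\+|0x)?([0-9A-F]{1,6})$", re.IGNORECASE)


def parse_code_point(text: str) -> int:
    """Parse 'U+1F600', '1F600', or '0x1F600' into an integer code point."""
    match = _CODE_POINT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a code point notation: {text!r}")
    value = int(match.group(1), 16)
    if value > 0x10FFFF:
        raise ValueError(f"beyond the Unicode codespace: {text!r}")
    return value


for notation in ("U+1F600", "1F600", "0x1F600"):
    print(notation, "->", parse_code_point(notation))  # each prints 128512
```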

Enter an Emoji Character

If you have an emoji character and want to reverse-engineer its code point, simply paste the character directly into the input field. The tool detects whether the input is a character or a code point notation and routes accordingly. Pasting 🥑 will resolve to U+1F951 and show all encoding formats automatically.

This reverse lookup is particularly useful when you receive an emoji in a log file, a database export, or an API response and need to identify exactly what character it is.
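The same reverse lookup can be approximated locally with Python's standard library. In this sketch, `describe` is a hypothetical helper; the character name comes from the `unicodedata` module, so it depends on the Unicode version bundled with your Python build.

```python
import unicodedata


def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single code point."""
    if len(char) != 1:
        raise ValueError("expected exactly one code point")
    # Fall back to a placeholder if this Python build does not know the name.
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{ord(char):04X} {name}"


print(describe("😀"))  # U+1F600 GRINNING FACE
print(describe("🥑"))  # U+1F951 AVOCADO
```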

View Full Encoding Breakdown

Once the lookup resolves, the tool displays a structured breakdown of every major encoding format. Each value is individually copyable so you can paste the exact representation you need into your code without manual conversion.

Encoding Formats Explained

UTF-8 Bytes

UTF-8 is the dominant encoding on the web and in most modern applications. It encodes each code point as one to four bytes, with higher code points requiring more bytes.

The grinning face emoji 😀 (U+1F600) encodes to four bytes: F0 9F 98 80. You will see this representation when inspecting raw HTTP responses, binary file contents, or low-level network streams. If your application is stripping or mangling emoji, the byte sequence is the place to start diagnosing.
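To reproduce that byte sequence yourself, encode the character and print each byte in hex. This is a small Python sketch with illustrative variable names.

```python
# Encode U+1F600 as UTF-8 and show the bytes the way a hex dump would.
grinning_face = "\U0001F600"

utf8_bytes = grinning_face.encode("utf-8")
hex_view = " ".join(f"{byte:02X}" for byte in utf8_bytes)

print(len(utf8_bytes))  # 4
print(hex_view)         # F0 9F 98 80
```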

UTF-16 Surrogate Pairs

UTF-16 uses 16-bit code units. Characters in the Basic Multilingual Plane (U+0000 to U+FFFF) fit in a single 16-bit unit. Characters above U+FFFF — which includes most emoji — require two 16-bit units called a surrogate pair.

For U+1F600, the surrogate pair is D83D DE00. The first unit (D83D) is the high surrogate and the second (DE00) is the low surrogate. JavaScript strings are internally UTF-16, which is why '😀'.length evaluates to 2 rather than 1: each surrogate counts as a separate code unit. This is a common source of off-by-one errors in string manipulation code that was written without emoji in mind.

The Unicode Lookup tool shows the surrogate pair for any emoji in the supplementary planes so you can anticipate and handle these length discrepancies.
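The surrogate pair follows from simple arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The Python sketch below illustrates that calculation; `surrogate_pair` is a hypothetical helper name.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
assert "\U0001F600".encode("utf-16-be") == bytes.fromhex("D83DDE00")
```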

HTML Entities (Decimal and Hex)

HTML supports numeric character references in two forms:

  • Decimal: &#128512;
  • Hexadecimal: &#x1F600;

Both render identically in the browser. The hex form is more readable for anyone cross-referencing Unicode documentation, while the decimal form occasionally appears in older codebases and XML documents. The tool provides both so you can match whatever convention your project already uses.
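Both forms can be generated from the integer code point and verified with the standard library's `html.unescape`. This is a minimal sketch; the variable names are illustrative.

```python
import html

code_point = 0x1F600

decimal_ref = f"&#{code_point};"   # decimal numeric character reference
hex_ref = f"&#x{code_point:X};"    # hexadecimal numeric character reference

print(decimal_ref)  # &#128512;
print(hex_ref)      # &#x1F600;

# Both references decode to the same character.
assert html.unescape(decimal_ref) == html.unescape(hex_ref) == chr(code_point)
```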

CSS content Property

When rendering emoji through CSS (for decorative icons, pseudo-elements, or font-icon replacements), the content property uses a backslash-prefixed hex escape without the U+ prefix:

.emoji-icon::before {
  content: "\1F600";
}

This format strips the U+ and replaces it with \. The Unicode Lookup tool displays the ready-to-paste CSS value alongside the other representations.
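If you generate stylesheets programmatically, the escape can be built from the code point. The sketch below uses a hypothetical helper, `css_escape`. Its optional padding to six hex digits reflects how CSS escapes are parsed: an escape consumes up to six hex digits, so padding keeps a following character such as "0" or "A" from being read as part of the escape.

```python
def css_escape(code_point: int, pad: bool = False) -> str:
    """Build a CSS escape such as \\1F600 for use in a content value."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    digits = f"{code_point:06X}" if pad else f"{code_point:X}"
    return "\\" + digits


print(css_escape(0x1F600))            # \1F600
print(css_escape(0x1F600, pad=True))  # \01F600
```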

Python and JavaScript Literals

Each language has its own string escape format for Unicode characters:

Python uses \U followed by eight hex digits for supplementary plane characters:

grinning_face = "\U0001F600"
# or use chr() with the code point as an integer (0x1F600 is 128512 in decimal):
grinning_face = chr(0x1F600)

JavaScript (ES2015 and later) supports \u{...} code point escapes in string and template literals:

const grinningFace = "\u{1F600}";
// older environments using surrogate pairs explicitly:
const grinningFaceLegacy = "\uD83D\uDE00";

The tool outputs both the modern \u{...} form and the surrogate pair escape so you can target any JavaScript environment.
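These escape forms can all be derived from the integer code point. The following Python sketch shows one way to produce them; the three function names are hypothetical and are not the tool's API.

```python
def python_literal(code_point: int) -> str:
    """\\UXXXXXXXX above the BMP, \\uXXXX otherwise."""
    if code_point > 0xFFFF:
        return f"\\U{code_point:08X}"
    return f"\\u{code_point:04X}"


def js_literal(code_point: int) -> str:
    """ES2015 code point escape, e.g. \\u{1F600}."""
    return f"\\u{{{code_point:X}}}"


def js_surrogate_literal(code_point: int) -> str:
    """Escape built from UTF-16 code units, for pre-ES2015 engines."""
    data = chr(code_point).encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"\\u{unit:04X}" for unit in units)


print(python_literal(0x1F600))        # \U0001F600
print(js_literal(0x1F600))            # \u{1F600}
print(js_surrogate_literal(0x1F600))  # \uD83D\uDE00
```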

Practical Developer Use Cases

Debugging Emoji in Code

When an emoji appears as a replacement character (�), a question mark, or a sequence of garbage bytes, you need to identify the original code point and then trace which encoding layer is failing. Copy the broken output, paste it into the Unicode Lookup tool, and compare its byte sequence against what your application is producing. Mismatches in byte count or sequence often point to an encoding mismatch between the application layer and the database, or between the database and the client.
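To make the failure modes concrete, the Python sketch below reproduces two common ones. It assumes the wrong decoder is Windows-1252 (a frequent culprit, though not the only one) and uses a truncated byte sequence to trigger the replacement character.

```python
original = "\U0001F600"
utf8_bytes = original.encode("utf-8")  # F0 9F 98 80

# Failure 1: UTF-8 bytes decoded with the wrong codec produce "garbage" text.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)       # ðŸ˜€
print(len(mojibake))  # 4

# When no bytes were lost, reversing the wrong step recovers the character.
recovered = mojibake.encode("cp1252").decode("utf-8")
assert recovered == original

# Failure 2: a truncated sequence cannot be decoded and yields U+FFFD.
lossy = utf8_bytes[:3].decode("utf-8", errors="replace")
assert "\ufffd" in lossy
```

If the recovery step succeeds, the data was only mislabeled; if you see U+FFFD, bytes were lost and the original cannot be reconstructed from that output.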

Cross-Platform Encoding Issues

Different platforms store and transmit Unicode differently. A string that round-trips cleanly between two Python services may break when passed through a Java layer that uses UTF-16 internally, or through a MySQL column using the utf8 charset instead of utf8mb4. The encoding breakdown from the Unicode Lookup tool gives you the concrete byte values and code unit sequences to verify that each layer is handling the character correctly.

Database Storage Considerations

MySQL's legacy utf8 charset is limited to three bytes per character, which excludes all emoji above U+FFFF. The correct charset for emoji storage is utf8mb4, which supports the full four-byte UTF-8 range. PostgreSQL uses UTF8 natively and handles all Unicode code points without special configuration.

If you see an "Incorrect string value" error in MySQL when inserting emoji, the Unicode Lookup tool lets you confirm whether the character is in the four-byte range (above U+FFFF) and verify that your column charset is set to utf8mb4.
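Before sending text to a column you suspect is still on the three-byte charset, you can scan it for characters above U+FFFF. This is a sketch with hypothetical helper names; it inspects the string only and does not query the database.

```python
def four_byte_characters(text: str) -> list[str]:
    """List the code points in text that need four bytes in UTF-8."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0xFFFF]


def needs_utf8mb4(text: str) -> bool:
    """True when at least one character lies above U+FFFF."""
    return bool(four_byte_characters(text))


print(needs_utf8mb4("I \u2764 Unicode"))         # False (U+2764 fits in three bytes)
print(needs_utf8mb4("Hello \U0001F600"))         # True
print(four_byte_characters("Hello \U0001F600"))  # ['U+1F600']
```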

Multi-Code-Point Emojis

Many emoji that appear as a single character are actually sequences of multiple code points rendered as a single visual unit — a grapheme cluster. Common sequence types include:

  • ZWJ sequences: A Zero Width Joiner (U+200D) connects two or more emoji into a combined form. For example, 👩‍💻 (woman technologist) is U+1F469 U+200D U+1F4BB.
  • Skin tone modifiers: A Fitzpatrick modifier (U+1F3FB through U+1F3FF) is appended to a base emoji: 👍🏽 is U+1F44D U+1F3FD.
  • Flag sequences: Pairs of Regional Indicator Symbols combine into country flags: 🇯🇵 is U+1F1EF U+1F1F5.

The Unicode Lookup tool handles single code points. For sequences composed of multiple code points, use the Sequence Analyzer, which breaks a complex emoji sequence into its constituent components and explains the role of each part.

Understanding grapheme clusters is especially important when implementing string length checks or character counting in user interfaces. A skin-toned emoji or a family emoji built from ZWJ connections may consist of three, five, or more code points while visually occupying the space of one character.
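The gap between what the user sees and what the code counts can be measured directly. The Python sketch below counts code points and UTF-16 code units for the three sequences above; `count_units` is a hypothetical helper. Python's standard library does not segment grapheme clusters, so counting user-perceived characters would need a third-party library, which this sketch does not attempt.

```python
def count_units(sequence: str) -> tuple[int, int]:
    """Return (code points, UTF-16 code units) for a string."""
    code_points = len(sequence)                      # Python strings index by code point
    utf16_units = len(sequence.encode("utf-16-le")) // 2
    return code_points, utf16_units


samples = {
    "woman technologist": "\U0001F469\u200D\U0001F4BB",
    "thumbs up, medium skin tone": "\U0001F44D\U0001F3FD",
    "flag: Japan": "\U0001F1EF\U0001F1F5",
}

for label, sequence in samples.items():
    points, units = count_units(sequence)
    listing = " ".join(f"U+{ord(ch):04X}" for ch in sequence)
    print(f"{label}: {points} code points, {units} UTF-16 units ({listing})")
```

Each sample renders as one visible emoji, yet the counts are 3 and 5, 2 and 4, and 2 and 4 respectively.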

  • Unicode Lookup — Enter a code point or paste an emoji for a full encoding breakdown
  • Sequence Analyzer — Decompose multi-code-point emoji sequences (ZWJ, skin tones, flags)

Glossary reference:

  • Code point: The numeric identity of a Unicode character
  • UTF-8: Variable-width encoding used on the web
  • UTF-16: Variable-width encoding (one or two 16-bit code units per character) used internally by JavaScript and Java
  • Surrogate pair: Two UTF-16 code units representing a supplementary plane character
  • Grapheme cluster: A sequence of code points that renders as a single visible character

Related tools

🔍 Sequence Analyzer
Decode ZWJ sequences, skin tone modifiers, keycap sequences, and flag pairs into their individual components.
🔢 Unicode Lookup
Enter a code point such as U+1F600 and get the emoji, encoding details, UTF-8/16 bytes, and HTML entities.

Glossary terms

Grapheme cluster
A user-perceived character that may consist of several Unicode code points displayed as a single visual unit.
Unicode Consortium
The nonprofit organization that develops and maintains the Unicode Standard, including the process for adding new emoji.
Emoji
Japanese word (絵文字) meaning 'picture character': small graphic symbols used in digital communication to express ideas, emotions, and objects.
Regional Indicator (RI)
Unicode letters used in pairs (U+1F1E6 to U+1F1FF) that form national flag emoji according to ISO 3166-1 alpha-2 codes.
Zero Width Joiner (ZWJ)
Invisible Unicode character (U+200D) used to combine several emoji into a single composite emoji, such as joining people and objects to form profession emoji.
Surrogate pair
Two UTF-16 code units (a high surrogate U+D800–U+DBFF followed by a low surrogate U+DC00–U+DFFF) that together represent a character beyond U+FFFF.
Unicode plane
A group of 65,536 consecutive Unicode code points. Plane 0 is the Basic Multilingual Plane (BMP); most emoji are in Plane 1 …
Code point
Unique numeric value assigned to each character in the Unicode Standard, written in the form U+XXXX (for example, U+1F600 for 😀).
Emoji sequence
Ordered set of one or more Unicode code points that together represent a single emoji character.
Unicode
Universal character encoding standard that assigns a unique number to every character of all writing systems and symbol sets, including emoji.
Code unit
The minimal bit combination used to encode a character: 8 bits for UTF-8, 16 bits for UTF-16, and 32 bits for UTF-32.
UTF-16
Variable-width Unicode encoding using 2 or 4 bytes per character, used internally by JavaScript, Java, and Windows.
UTF-8
Variable-width Unicode encoding using 1 to 4 bytes per character, dominant on the web (used by more than 98% of websites).
