
Unicode Lookup: Find Any Emoji by Code Point

What Is the Unicode Lookup Tool?

Every character you see on screen, including all 3,900+ emoji, has a unique numeric identifier assigned by the Unicode Consortium. The Unicode Lookup tool is a quick reference utility that takes either a code point (like U+1F600) or a pasted emoji character (like 😀) and returns the complete encoding picture: UTF-8 bytes, UTF-16 code units, HTML entities, CSS content values, and language-specific escape sequences.

For developers working with emoji in strings, databases, APIs, or user interfaces, this kind of instant breakdown eliminates the guesswork of converting between representations. Instead of reaching for a Python shell or searching scattered documentation, you get the full encoding story in one place.

Understanding Code Points

The U+ Notation

A code point is the canonical identifier for a Unicode character. It is written in the form U+ followed by four to six hexadecimal digits. The U+ prefix is universal shorthand: it does not represent bytes in memory, only an abstract numeric position within the Unicode codespace.

For example:

  • U+1F600: 😀 Grinning Face
  • U+2764: ❤ Heavy Black Heart
  • U+1F1FA U+1F1F8: 🇺🇸 Flag: United States (two code points)

When you copy a code point from a specification, a font tool, or another reference site, you can paste it directly into the Unicode Lookup tool exactly as written.

Hexadecimal Code Points

Code points are expressed in hexadecimal (base 16) because the Unicode codespace runs from U+0000 to U+10FFFF — over 1.1 million possible positions. Decimal would be cumbersome for both humans and tools. Hex maps cleanly to the byte-level representations used by UTF-8 and UTF-16, making it the natural notation for encoding work.

If you are more comfortable with decimal, the tool displays the decimal equivalent alongside the hex value. U+1F600 in decimal is 128512.
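The hex and decimal forms are the same number written in different bases. A minimal Python sketch (variable names are illustrative) shows the conversion in both directions and the size of the codespace:

```python
# Convert between the hex and decimal forms of a code point.
code_point = int("1F600", 16)       # parse the hex digits into an integer
hex_form = f"U+{code_point:04X}"    # format the integer back as U+ notation

# The codespace spans U+0000..U+10FFFF inclusive.
CODESPACE_SIZE = 0x10FFFF + 1

print(code_point)       # 128512
print(hex_form)         # U+1F600
print(CODESPACE_SIZE)   # 1114112
```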

Code Point Ranges for Emojis

Emoji code points are scattered across several Unicode blocks rather than grouped in a single contiguous range:

Range Content
U+2000 – U+27FF Symbols, punctuation, and some legacy emoji (✅, ⚡)
U+1F300 – U+1F9FF Core emoji block (faces, animals, food, activities)
U+1FA00 – U+1FAFF Extended-A (newer additions, chess pieces, household objects)
U+1F1E0 – U+1F1FF Regional Indicator Symbols (used in flag sequences)

Understanding where an emoji lives in the codespace matters when you encounter database errors about character encoding ranges, which often surface because MySQL's utf8 charset only covers U+0000 through U+FFFF and rejects emoji above that range.
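As a rough illustration, the ranges in the table can be turned into a small classifier. This is only a sketch: the ranges are copied from the table as approximate groupings, not an official definition of what counts as an emoji, and the names `EMOJI_RANGES` and `describe_range` are hypothetical.

```python
# Ranges copied from the table above (approximate groupings only).
EMOJI_RANGES = [
    (0x2000, 0x27FF, "symbols and legacy emoji"),
    (0x1F1E0, 0x1F1FF, "regional indicator symbols"),
    (0x1F300, 0x1F9FF, "core emoji"),
    (0x1FA00, 0x1FAFF, "extended-A"),
]


def describe_range(code_point: int) -> str:
    """Return the table label for a code point, or 'other' if none matches."""
    for low, high, label in EMOJI_RANGES:
        if low <= code_point <= high:
            return label
    return "other"


print(describe_range(0x1F600))  # core emoji
print(describe_range(0x26A1))   # symbols and legacy emoji
```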

How to Use the Tool

Enter a Code Point (e.g., U+1F600)

Type or paste a code point into the lookup field. The tool accepts several common input formats:

  • U+1F600 — canonical form with prefix
  • 1F600 — bare hex digits
  • 0x1F600 — C-style hex prefix

After submission, the tool resolves the code point to a character, displays the rendered glyph, and expands all encoding representations below.
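How the tool parses its input is not documented here, but the three accepted formats can be normalized with one regular expression. The sketch below is an assumption about one reasonable approach; `parse_code_point` is a hypothetical helper, not part of the tool.

```python
import re

# Optional "U+" or "0x" prefix, then one to six hex digits (case-insensitive).
_CODE_POINT_RE = re.compile(r"(?:U\+|0x)?([0-9A-F]{1,6})", re.IGNORECASE)


def parse_code_point(text: str) -> int:
    """Parse 'U+1F600', '1F600', or '0x1F600' into an integer code point."""
    match = _CODE_POINT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a code point notation: {text!r}")
    value = int(match.group(1), 16)
    if value > 0x10FFFF:
        raise ValueError(f"outside the Unicode codespace: {text!r}")
    return value


for form in ("U+1F600", "1F600", "0x1F600"):
    print(form, "->", chr(parse_code_point(form)))
```

Rejecting values above U+10FFFF up front keeps a later `chr()` call from raising on out-of-range input.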

Enter an Emoji Character

If you have an emoji character and want to reverse-engineer its code point, simply paste the character directly into the input field. The tool detects whether the input is a character or a code point notation and routes accordingly. Pasting 🥑 will resolve to U+1F951 and show all encoding formats automatically.

This reverse lookup is particularly useful when you receive an emoji in a log file, a database export, or an API response and need to identify exactly what character it is.
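The same reverse lookup can be approximated in Python with the standard library. This is a sketch rather than the tool's actual logic, and `identify_character` is a hypothetical name.

```python
import unicodedata
from typing import Tuple


def identify_character(char: str) -> Tuple[str, str]:
    """Return ('U+XXXX', name) for a string holding exactly one code point."""
    if len(char) != 1:
        raise ValueError("expected exactly one code point")
    code_point = ord(char)
    # Names come from the Unicode data bundled with the Python build,
    # so very recent emoji may come back as '<unknown>' on older versions.
    name = unicodedata.name(char, "<unknown>")
    return f"U+{code_point:04X}", name


print(identify_character("\U0001F600"))  # ('U+1F600', 'GRINNING FACE')
```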

View Full Encoding Breakdown

Once the lookup resolves, the tool displays a structured breakdown of every major encoding format. Each value is individually copyable so you can paste the exact representation you need into your code without manual conversion.

Encoding Formats Explained

UTF-8 Bytes

UTF-8 is the dominant encoding on the web and in most modern applications. It encodes each code point as one to four bytes, with higher code points requiring more bytes.

The grinning face emoji 😀 (U+1F600) encodes to four bytes: F0 9F 98 80. You will see this representation when inspecting raw HTTP responses, binary file contents, or low-level network streams. If your application is stripping or mangling emoji, the byte sequence is the place to start diagnosing.
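To reproduce the byte sequence yourself, Python's built-in `str.encode` is enough. The helper name `utf8_hex` is illustrative.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex byte pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


print(utf8_hex("\U0001F600"))  # F0 9F 98 80
print(utf8_hex("\u2764"))      # E2 9D A4
print(utf8_hex("A"))           # 41
```

The three outputs show the variable width in practice: one byte for ASCII, three for a BMP symbol, and four for a supplementary-plane emoji.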

UTF-16 Surrogate Pairs

UTF-16 uses 16-bit code units. Characters in the Basic Multilingual Plane (U+0000 to U+FFFF) fit in a single 16-bit unit. Characters above U+FFFF, which include most emoji, require two 16-bit units called a surrogate pair.

For U+1F600, the surrogate pair is D83D DE00. The first unit (D83D) is the high surrogate and the second (DE00) is the low surrogate. JavaScript strings are internally UTF-16, which is why '😀'.length evaluates to 2 rather than 1: each surrogate counts as a separate code unit. This is a common source of off-by-one errors in string manipulation code that was written without emoji in mind.

The Unicode Lookup tool shows the surrogate pair for any emoji in the supplementary planes so you can anticipate and handle these length discrepancies.
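The surrogate pair follows from simple arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A minimal Python sketch, with illustrative function names:

```python
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_length(text: str) -> int:
    """Count UTF-16 code units, matching what JavaScript's .length reports."""
    return len(text.encode("utf-16-le")) // 2


high, low = surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")        # D83D DE00
print(utf16_length("\U0001F600"))     # 2
```

Python's own `len()` counts code points, so it returns 1 for the same string; `utf16_length` is useful when a limit has to agree with a JavaScript or Java layer.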

HTML Entities (Decimal and Hex)

HTML supports numeric character references in two forms:

  • Decimal: &#128512;
  • Hexadecimal: &#x1F600;

Both render identically in the browser. The hex form is more readable for anyone cross-referencing Unicode documentation, while the decimal form occasionally appears in older codebases and XML documents. The tool provides both so you can match whatever convention your project already uses.
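Both reference forms can be generated from the code point and checked with Python's standard `html` module. The helper name `html_references` is illustrative.

```python
import html
from typing import Tuple


def html_references(char: str) -> Tuple[str, str]:
    """Return the (decimal, hexadecimal) numeric character references."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = html_references("\U0001F600")
print(decimal_ref)  # &#128512;
print(hex_ref)      # &#x1F600;

# html.unescape decodes both forms back to the same character.
print(html.unescape(decimal_ref) == html.unescape(hex_ref))  # True
```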

CSS content Property

When rendering emoji through CSS (for decorative icons, pseudo-elements, or font-icon replacements), the content property uses a backslash-prefixed hex escape without the U+ prefix:

.emoji-icon::before {
  content: "\1F600";
}

This format strips the U+ and replaces it with \. The Unicode Lookup tool displays the ready-to-paste CSS value alongside the other representations.
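If you generate stylesheets programmatically, the escape is easy to build. One caveat worth knowing: a CSS escape may contain up to six hex digits, so a short escape followed directly by a character that happens to be a hex digit would be misread. Padding to six digits avoids that. The sketch below uses a hypothetical helper name, `css_escape`.

```python
def css_escape(char: str, pad: bool = False) -> str:
    """Return the CSS escape for a single code point.

    With pad=True the value is zero-padded to six hex digits, which is
    safe even when the next character in the string is a hex digit.
    """
    code_point = ord(char)
    return f"\\{code_point:06X}" if pad else f"\\{code_point:X}"


print(css_escape("\U0001F600"))            # \1F600
print(css_escape("\U0001F600", pad=True))  # \01F600
```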

Python and JavaScript Literals

Each language has its own string escape format for Unicode characters:

Python uses \U followed by eight hex digits for supplementary plane characters:

grinning_face = "\U0001F600"
# or use chr() with the numeric code point (chr(128512) is equivalent):
grinning_face = chr(0x1F600)

JavaScript (ES2015 and later) supports the \u{...} code point escape in string and template literals:

const grinningFace = "\u{1F600}";
// older environments need the surrogate pair written out explicitly:
const grinningFaceLegacy = "\uD83D\uDE00";

The tool outputs both the modern \u{...} form and the surrogate pair escape so you can target any JavaScript environment.
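Generating these escapes from a code point is mostly a matter of zero-padding. The following Python sketch is an assumption about how such output could be produced, not the tool's implementation; both function names are hypothetical.

```python
def python_literal(code_point: int) -> str:
    """Return the Python escape: \\uXXXX in the BMP, \\UXXXXXXXX above it."""
    if code_point > 0xFFFF:
        return f"\\U{code_point:08X}"
    return f"\\u{code_point:04X}"


def js_literal(code_point: int) -> str:
    """Return the ES2015+ JavaScript code point escape, \\u{...}."""
    return f"\\u{{{code_point:X}}}"


print(python_literal(0x1F600))  # \U0001F600
print(python_literal(0x2764))   # \u2764
print(js_literal(0x1F600))      # \u{1F600}
```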

Practical Developer Use Cases

Debugging Emoji in Code

When an emoji appears as a replacement character (�), a question mark, or a sequence of garbage bytes, you need to identify the original code point and then trace which encoding layer is failing. Copy the broken output, paste it into the Unicode Lookup tool, and compare its byte sequence against what your application is producing. Mismatches in byte count or sequence often point to an encoding mismatch between the application layer and the database, or between the database and the client.
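A short experiment makes the two failure modes concrete. The sketch below simulates a layer that decodes UTF-8 bytes with the wrong charset (Latin-1 is chosen here purely as an example), and a layer that truncates the byte sequence.

```python
original = "\U0001F600"
utf8_bytes = original.encode("utf-8")          # four bytes: F0 9F 98 80

# Wrong-charset decoding: each byte becomes its own character ("garbage").
mojibake = utf8_bytes.decode("latin-1")
print(len(mojibake))                           # 4

# This kind of damage is reversible as long as no bytes were dropped.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == original)                    # True

# Truncation is not reversible: the decoder substitutes U+FFFD.
truncated = utf8_bytes[:3].decode("utf-8", errors="replace")
print("\ufffd" in truncated)                   # True
```

Seeing four garbage characters where one emoji was expected usually suggests a charset mismatch, while a replacement character suggests bytes were lost or rejected somewhere along the path.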

Cross-Platform Encoding Issues

Different platforms store and transmit Unicode differently. A string that round-trips cleanly between two Python services may break when passed through a Java layer that uses UTF-16 internally, or through a MySQL column using the utf8 charset instead of utf8mb4. The encoding breakdown from the Unicode Lookup tool gives you the concrete byte values and code unit sequences to verify that each layer is handling the character correctly.

Database Storage Considerations

MySQL's legacy utf8 charset is limited to three bytes per character, which excludes all emoji above U+FFFF. The correct charset for emoji storage is utf8mb4, which supports the full four-byte UTF-8 range. PostgreSQL uses UTF8 natively and handles all Unicode code points without special configuration.

If you see an "Incorrect string value" error in MySQL when inserting emoji, the Unicode Lookup tool lets you confirm whether the character is in the four-byte range (above U+FFFF) and verify that your column charset is set to utf8mb4.
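Before a value reaches the database, an application can check whether it contains any character that the three-byte charset cannot hold. A minimal sketch with hypothetical helper names:

```python
from typing import List


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above U+FFFF (four bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def four_byte_characters(text: str) -> List[str]:
    """List the code points in text that require four UTF-8 bytes."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0xFFFF]


print(needs_utf8mb4("hi \U0001F600"))                 # True
print(needs_utf8mb4("\u2764 ok"))                     # False
print(four_byte_characters("a\U0001F600b\U0001F951")) # ['U+1F600', 'U+1F951']
```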

Multi-Code-Point Emojis

Many emoji that appear as a single character are actually sequences of multiple code points rendered as a single visual unit — a grapheme cluster. Common sequence types include:

  • ZWJ sequences: A Zero Width Joiner (U+200D) connects two or more emoji into a combined form. For example, 👩‍💻 (woman technologist) is U+1F469 U+200D U+1F4BB.
  • Skin tone modifiers: A Fitzpatrick modifier (U+1F3FB through U+1F3FF) is appended to a base emoji: 👍🏽 is U+1F44D U+1F3FD.
  • Flag sequences: Pairs of Regional Indicator Symbols combine into country flags: 🇯🇵 is U+1F1EF U+1F1F5.

The Unicode Lookup tool handles single code points. For sequences composed of multiple code points, use the Sequence Analyzer, which breaks a complex emoji sequence into its constituent components and explains the role of each part.

Understanding grapheme clusters is especially important when implementing string length checks or character counting in user interfaces. A skin-toned emoji or a family emoji built from ZWJ connections may consist of three, five, or more code points while visually occupying the space of one character.
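The three sequence types described above can be inspected in Python by iterating over the string, since Python strings are sequences of code points. The helper names below are illustrative, and `flag_from_iso` assumes a two-letter ASCII country code.

```python
from typing import List

REGIONAL_INDICATOR_A = 0x1F1E6  # Regional Indicator Symbol Letter A


def code_points(text: str) -> List[str]:
    """List every code point in text using U+ notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def flag_from_iso(country_code: str) -> str:
    """Build a flag sequence from a two-letter code such as 'JP'."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A"))
        for letter in country_code.upper()
    )


woman_technologist = "\U0001F469\u200D\U0001F4BB"   # ZWJ sequence
thumbs_up_medium = "\U0001F44D\U0001F3FD"           # base + skin tone modifier

print(code_points(woman_technologist))  # ['U+1F469', 'U+200D', 'U+1F4BB']
print(code_points(thumbs_up_medium))    # ['U+1F44D', 'U+1F3FD']
print(code_points(flag_from_iso("JP"))) # ['U+1F1EF', 'U+1F1F5']
```

Note that `len()` on these strings returns 3, 2, and 2 respectively. It counts code points, not grapheme clusters, and the Python standard library does not include a grapheme segmenter.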

  • Unicode Lookup — Enter a code point or paste an emoji for a full encoding breakdown
  • Sequence Analyzer — Decompose multi-code-point emoji sequences (ZWJ, skin tones, flags)

Glossary reference:

  • Code point: The numeric identity of a Unicode character
  • UTF-8: Variable-width encoding used on the web
  • UTF-16: Variable-width encoding (one or two 16-bit code units per character) used internally by JavaScript and Java
  • Surrogate pair: Two UTF-16 code units representing a supplementary plane character
  • Grapheme cluster: A sequence of code points that renders as a single visible character

Related Tools

🔍 Sequence Analyzer
Decode ZWJ sequences, skin tone modifiers, keycap sequences, and flag pairs into individual components.
🔢 Unicode Lookup
Enter a code point like U+1F600 and get the emoji, encoding details, UTF-8/16 bytes, and HTML entities.

Glossary Terms

Code Point
A unique numerical value assigned to each character in the Unicode standard, written in the format U+XXXX (e.g., U+1F600 for 😀).
Code Unit
The minimum bit combination used for encoding a character: 8-bit for UTF-8, 16-bit for UTF-16, and 32-bit for UTF-32.
Emoji
A Japanese word (絵文字) meaning 'picture character': small graphical symbols used in digital communication to express ideas, emotions, and objects.
Emoji Sequence
An ordered set of one or more Unicode code points that together represent a single emoji character.
Grapheme Cluster
A user-perceived character that may be composed of multiple Unicode code points displayed as a single visual unit.
Plane
A group of 65,536 consecutive Unicode code points. Plane 0 is the Basic Multilingual Plane (BMP); most emoji live in Plane 1 (SMP).
Regional Indicator (RI)
Paired Unicode letters (U+1F1E6 to U+1F1FF) that form country flag emoji when combined according to ISO 3166-1 alpha-2 codes.
Surrogate Pair
Two UTF-16 code units (a high surrogate U+D800-U+DBFF followed by a low surrogate U+DC00-U+DFFF) that together represent a character above U+FFFF.
UTF-16
A variable-width Unicode encoding that uses 2 or 4 bytes per character, used internally by JavaScript, Java, and Windows.
UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per character, dominant on the web (used by 98%+ of websites).
Unicode
Universal character encoding standard that assigns a unique number to every character across all writing systems and symbol sets, including emoji.
Unicode Consortium
The non-profit organization that develops and maintains the Unicode Standard, including the process for adding new emoji.
Zero Width Joiner (ZWJ)
An invisible Unicode character (U+200D) used to join multiple emoji into a single composite emoji, such as combining people and objects into profession emoji.
