Unicode Lookup
Enter a codepoint like U+1F600 and get the emoji, encoding details, UTF-8/16 bytes, and HTML entities.
Converter
How to Use
1. Enter a codepoint or emoji
Type a Unicode codepoint in U+XXXX format (e.g., U+1F600), paste an emoji character directly, or enter a descriptive name to search. The tool accepts hex values with or without the U+ prefix and handles multi-codepoint sequences.
2. Read the full Unicode metadata
Review the official Unicode character name, the block name, the Unicode version in which the character was introduced, the General Category, and all relevant emoji properties (Emoji, Emoji_Presentation, Emoji_Modifier_Base, etc.) from Unicode's emoji-data.txt.
3. Copy any of the 8 encoding formats
Select and copy the encoding you need — UTF-8 bytes, UTF-16 code units, HTML decimal or hex entity, CSS escape, JavaScript string escape, Python escape, Java escape, or URL percent-encoding — each displayed in ready-to-use format.
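The three steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not the tool's actual implementation: the function names (parse_input, describe, encodings) and the accepted input grammar (4 to 6 hex digits, optional U+ prefix) are assumptions made for the example. Name search, block names, and emoji properties are left out because the standard unicodedata module does not provide them.

```python
import re
import unicodedata
from urllib.parse import quote

# Assumed input grammar: U+XXXX or bare hex, 4 to 6 digits.
_HEX = re.compile(r"(?:[Uu]\+)?([0-9A-Fa-f]{4,6})")


def parse_input(text: str) -> list[int]:
    """Step 1: turn user input into a list of codepoints (hypothetical helper)."""
    tokens = text.split()
    matches = [_HEX.fullmatch(t) for t in tokens]
    if tokens and all(matches):
        cps = [int(m.group(1), 16) for m in matches]
        if any(cp > 0x10FFFF for cp in cps):
            raise ValueError("codepoint outside U+0000..U+10FFFF")
        return cps
    # Anything else is treated as literal characters, one codepoint each.
    return [ord(ch) for ch in text.strip()]


def describe(cp: int) -> dict[str, str]:
    """Step 2: the metadata that the stdlib can supply for one codepoint."""
    ch = chr(cp)
    return {
        "codepoint": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
    }


def encodings(cp: int) -> dict[str, str]:
    """Step 3: common escapes for one codepoint.

    Lone surrogates (U+D800..U+DFFF) cannot be encoded and raise
    UnicodeEncodeError here.
    """
    ch = chr(cp)
    utf16 = ch.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        "utf16": " ".join(f"{u:04X}" for u in units),
        "html_decimal": f"&#{cp};",
        "html_hex": f"&#x{cp:X};",
        # A CSS escape needs a trailing space if a hex digit follows it.
        "css": f"\\{cp:X}",
        "javascript": f"\\u{{{cp:X}}}",
        "python": f"\\U{cp:08X}" if cp > 0xFFFF else f"\\u{cp:04X}",
        # Java string escapes are written per UTF-16 code unit.
        "java": "".join(f"\\u{u:04X}" for u in units),
        "url": quote(ch, safe=""),
    }


if __name__ == "__main__":
    for cp in parse_input("U+1F600"):
        print(describe(cp))
        print(encodings(cp))
```

One design consequence of accepting bare hex: a four-letter word made only of hex digits, such as "face", is read as the codepoint U+FACE rather than as four letters. A real tool would need some rule to resolve that ambiguity.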
About
At the most fundamental level, every emoji is one or more Unicode codepoints, each a number in the Unicode codespace (U+0000 to U+10FFFF) assigned and maintained by the Unicode Consortium. The mapping from a codepoint to a displayed glyph passes through several layers: the encoding form (UTF-8, UTF-16, or UTF-32) used to store and transmit the character as bytes, the rendering engine that looks up the glyph in a font, and the font itself (which may be an OS system font or a custom emoji font). Understanding codepoints and encodings is foundational to building applications that handle emoji correctly.
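To make the encoding layer concrete, the short sketch below measures how many bytes a single character occupies in each encoding form. The helper name encoded_sizes is hypothetical; the codec names are Python's standard ones, and the big-endian variants are used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Byte length of one character in each Unicode encoding form."""
    forms = ("utf-8", "utf-16-be", "utf-32-be")
    return {form: len(ch.encode(form)) for form in forms}


if __name__ == "__main__":
    # ASCII letter, a BMP symbol (U+2764), and a supplementary-plane emoji (U+1F600).
    for ch in ("A", "\u2764", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 always uses four bytes per codepoint, while UTF-8 and UTF-16 vary with the codepoint's value.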
Emoji codepoints are spread across more than a dozen Unicode blocks, reflecting the history of their incorporation into Unicode. Many early emoji were originally proprietary characters from Japanese mobile carriers (NTT DoCoMo, au, SoftBank) that were harmonized into Unicode 6.0 in 2010 using codepoints in previously unassigned ranges. Most modern emoji reside in the Supplementary Multilingual Plane (SMP, U+10000–U+1FFFF). Like every codepoint above U+FFFF, they require a surrogate pair in UTF-16, a detail that causes persistent bugs in JavaScript (which uses UTF-16 internally) and other languages with similar string models.
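The surrogate pair arithmetic is small enough to show directly. The sketch below follows the standard UTF-16 rule: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The function names are hypothetical.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary codepoint into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints above U+FFFF use surrogate pairs")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original codepoint."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"{high:04X} {low:04X}")  # D83D DE00
```

Because U+1F600 occupies two UTF-16 code units, a string model that counts code units, such as JavaScript's length property, reports 2 for this single emoji.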
Beyond the codepoint itself, each emoji carries a set of Unicode character properties defined in emoji-data.txt, UnicodeData.txt, and related data files. The emoji-specific properties (Emoji, Emoji_Presentation, Emoji_Modifier, Emoji_Modifier_Base, Emoji_Component, Extended_Pictographic) are what text processing and shaping engines use to determine how to process sequences, apply modifiers, and segment grapheme clusters. The Unicode Character Database (UCD) is the authoritative source for this metadata, available for download at unicode.org and updated with each Unicode release, which typically ships about once a year.
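As a rough sketch of how these properties can be read, the code below parses text in the emoji-data.txt layout: a codepoint or a range written as first..last, a semicolon, the property name, and an optional comment after "#". The SAMPLE string is a handful of illustrative lines written in that layout, not an excerpt of the full file, and parse_properties is a hypothetical helper.

```python
# Illustrative lines in the emoji-data.txt layout (not the full file).
SAMPLE = """\
# codepoint(s) ; property # comment
1F600         ; Emoji                # grinning face
1F600         ; Emoji_Presentation   # grinning face
1F3FB..1F3FF  ; Emoji_Modifier       # skin tone modifiers
1F44D         ; Emoji_Modifier_Base  # thumbs up
"""


def parse_properties(text: str) -> dict[int, set[str]]:
    """Map each codepoint to the set of property names listed for it."""
    props: dict[int, set[str]] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cp_field, prop = (part.strip() for part in data.split(";"))
        start, _, end = cp_field.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        for cp in range(first, last + 1):
            props.setdefault(cp, set()).add(prop)
    return props


if __name__ == "__main__":
    table = parse_properties(SAMPLE)
    print(sorted(table[0x1F600]))
```

Running the same function over the real file downloaded from unicode.org should work the same way, assuming its layout matches the one described here.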