Unicode Emoji Properties
Unicode assigns named properties to every code point, and emoji have a dedicated set defined in Unicode Technical Standard #51 (UTS #51). These properties are the authoritative source for deciding whether a character is an emoji, how it should be displayed, and how it interacts with other characters.
Understanding these properties is essential for building correct emoji parsers, validators, and renderers.
The Six Core Emoji Properties
1. Emoji
The broadest property. A code point has Emoji=Yes if it can be used as an emoji. This includes characters that are also used as ordinary text symbols.
U+0023 # NUMBER SIGN Emoji=Yes (part of # keycap)
U+00A9 © COPYRIGHT SIGN Emoji=Yes
U+1F600 😀 GRINNING FACE Emoji=Yes
More than 1,400 code points have Emoji=Yes. The Emoji property alone is too broad for most detection tasks because it includes the ASCII digits, # and *, and symbols that usually appear as plain text.
2. Emoji_Presentation
A code point has Emoji_Presentation=Yes if it is displayed as a color emoji by default, without requiring a variation selector. This is the property most commonly used to answer "is this character normally shown as an emoji?"
U+1F600 😀 Emoji_Presentation=Yes (emoji by default)
U+0023 # Emoji_Presentation=No (text by default, needs U+FE0F for emoji)
U+00A9 © Emoji_Presentation=No (text by default)
Approximately 1,200 code points have Emoji_Presentation=Yes.
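For text-default characters, the presentation can be requested explicitly by appending a variation selector. The following is a minimal sketch of that idea; the helper name `request_presentation` is hypothetical, it does not check that the code point actually has Emoji=Yes, and whether the glyph really changes depends on the font and renderer.

```python
VS15 = "\uFE0E"  # variation selector-15: request text presentation
VS16 = "\uFE0F"  # variation selector-16: request emoji presentation


def request_presentation(char: str, emoji: bool = True) -> str:
    """Append VS16 (emoji) or VS15 (text) to a single code point.

    Hypothetical helper for illustration only.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one code point")
    return char + (VS16 if emoji else VS15)


copyright_emoji = request_presentation("\u00A9")              # U+00A9 U+FE0F
copyright_text = request_presentation("\u00A9", emoji=False)  # U+00A9 U+FE0E

# Show the code points that make up the emoji-style sequence.
print([f"U+{ord(c):04X}" for c in copyright_emoji])
```

The result is still two code points, so anything that counts or slices characters needs to treat the pair as one unit.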
3. Emoji_Modifier
Marks the five skin tone modifier characters, which are based on the Fitzpatrick scale:
| Code Point | Character | Tone |
|---|---|---|
| U+1F3FB | 🏻 | Light |
| U+1F3FC | 🏼 | Medium-Light |
| U+1F3FD | 🏽 | Medium |
| U+1F3FE | 🏾 | Medium-Dark |
| U+1F3FF | 🏿 | Dark |
When used alone, a modifier is displayed as a color swatch; it only changes skin tone when it immediately follows an Emoji_Modifier_Base character.
4. Emoji_Modifier_Base
A code point has Emoji_Modifier_Base=Yes if it can be followed by a skin tone modifier to create a modified sequence. Examples include 👋 (waving hand), 🖐 (hand with fingers splayed), and 🧑 (person).
# Checking modifier base + modifier sequence
base = "\U0001F44B"  # 👋
modifier = "\U0001F3FD"  # 🏽 Medium skin tone
combined = base + modifier
print(combined)  # 👋🏽
Not all person-like emoji are modifier bases. 👻 (ghost) and 🤖 (robot) are not modifier bases because they are not human-like enough.
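Because the five modifiers occupy one contiguous range, they can be recognized with a plain range check. The sketch below uses only the standard library; the function names are hypothetical, and it does not verify that the preceding character is a real modifier base.

```python
MODIFIER_FIRST = 0x1F3FB  # light skin tone
MODIFIER_LAST = 0x1F3FF   # dark skin tone


def is_skin_tone_modifier(char: str) -> bool:
    """True if char is one of the five Emoji_Modifier code points."""
    return len(char) == 1 and MODIFIER_FIRST <= ord(char) <= MODIFIER_LAST


def strip_skin_tones(text: str) -> str:
    """Remove every skin tone modifier, leaving the base characters."""
    return "".join(ch for ch in text if not is_skin_tone_modifier(ch))


waving_medium = "\U0001F44B\U0001F3FD"  # waving hand + medium skin tone
neutral = strip_skin_tones(waving_medium)
print(neutral == "\U0001F44B")
```

Stripping modifiers this way is useful for normalizing text before search or deduplication, where tone variants should compare equal.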
5. Emoji_Component
A code point that is used in emoji sequences but does not normally appear as a separate choice on emoji keyboards. This includes:
- Skin tone modifiers (U+1F3FB–U+1F3FF)
- ZWJ (U+200D)
- Variation Selector-16 (U+FE0F)
- Combining Enclosing Keycap (U+20E3) and the keycap bases (0–9, # and *)
- Tag characters (U+E0020–U+E007F) used in subdivision flags
- Regional Indicator letters (U+1F1E6–U+1F1FF)
Emoji_Component=Yes does not imply that the character is itself an emoji: ZWJ is a joining control character, not an emoji.
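To make the role of component characters concrete, the sketch below assembles two kinds of sequence from them: a flag from a pair of regional indicators and a keycap from a base character, U+FE0F, and U+20E3. The function names are hypothetical, and the flag helper only does the letter arithmetic; it does not check that the two letters correspond to a region that actually has a flag.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
VS16 = "\uFE0F"                 # variation selector-16
KEYCAP = "\u20E3"               # COMBINING ENCLOSING KEYCAP


def flag_sequence(region_code: str) -> str:
    """Map a two-letter code such as 'US' to two regional indicators."""
    code = region_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


def keycap_sequence(base: str) -> str:
    """Build a keycap sequence: base character, VS16, enclosing keycap."""
    if len(base) != 1 or base not in "0123456789#*":
        raise ValueError("keycap base must be a digit, '#' or '*'")
    return base + VS16 + KEYCAP


print([f"U+{ord(c):04X}" for c in flag_sequence("US")])
print([f"U+{ord(c):04X}" for c in keycap_sequence("#")])
```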
6. Extended_Pictographic
The most useful property for comprehensive emoji detection. It covers:
- Nearly all code points with Emoji=Yes; the exceptions are some Emoji_Component characters, such as keycap bases, regional indicators, and skin tone modifiers
- Reserved code points in emoji blocks (for future emoji)
- Additional pictographic symbols
Extended_Pictographic ⊇ Emoji − Emoji_Component
Using Extended_Pictographic in a regex means your code keeps working when new emoji are assigned in existing Unicode blocks, because the unassigned code points in those blocks already carry the property.
Accessing Property Data
From the Unicode Character Database
The official source is emoji-data.txt in the Unicode Character Database (UCD):
# Download the latest data file
curl -O https://unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt
# View Extended_Pictographic ranges
grep "Extended_Pictographic" emoji-data.txt | head -20
Sample lines from the file (a general excerpt, not limited to the Extended_Pictographic entries that the grep selects):
00A9 ; Emoji # 1.1 [1] (©️)
00AE ; Emoji # 1.1 [1] (®️)
203C ; Emoji # 1.1 [1] (‼️)
...
1F600 ; Emoji # 6.1 [1] (😀)
1F600 ; Emoji_Presentation # 6.1 [1] (😀)
...
1F3FB..1F3FF ; Emoji_Modifier # 8.0 [5] (🏻..🏿)
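Lines in this format are straightforward to parse without third-party packages. The sketch below is one possible approach, not an official parser: it assumes each data line has the shape `code point or range ; property # comment`, and the `SAMPLE` text is a simplified, hand-written stand-in for the real file. The names `parse_emoji_data` and `has_property` are hypothetical.

```python
from collections import defaultdict

# Simplified stand-in for a few lines of emoji-data.txt (comments shortened).
SAMPLE = """\
# comment-only lines are skipped
00A9          ; Emoji                # copyright
1F600         ; Emoji                # grinning face
1F600         ; Emoji_Presentation   # grinning face
1F3FB..1F3FF  ; Emoji_Modifier       # skin tone modifiers
"""


def parse_emoji_data(text):
    """Return {property_name: [(first, last), ...]} for each data line."""
    ranges = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code_points, prop = (field.strip() for field in line.split(";"))
        first, _, last = code_points.partition("..")
        # A single code point is stored as a one-element range.
        ranges[prop].append((int(first, 16), int(last or first, 16)))
    return dict(ranges)


def has_property(ranges, prop, char):
    """True if char falls inside any range recorded for prop."""
    cp = ord(char)
    return any(lo <= cp <= hi for lo, hi in ranges.get(prop, []))


data = parse_emoji_data(SAMPLE)
print(has_property(data, "Emoji_Modifier", "\U0001F3FD"))
```

For the full file, a sorted list of ranges with binary search would scale better than the linear scan used here.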
Using Python's unicodedata Module
Python's standard library unicodedata module does not expose the emoji properties. The third-party regex module supports them through \p{...} escapes:
import regex


def get_emoji_properties(char: str) -> dict:
    """Return relevant Unicode emoji properties for a character."""
    cp = ord(char)
    return {
        "code_point": f"U+{cp:04X}",
        "character": char,
        "is_emoji": bool(regex.match(r'\p{Emoji}', char)),
        "is_emoji_presentation": bool(regex.match(r'\p{Emoji_Presentation}', char)),
        "is_emoji_modifier": bool(regex.match(r'\p{Emoji_Modifier}', char)),
        "is_emoji_modifier_base": bool(regex.match(r'\p{Emoji_Modifier_Base}', char)),
        "is_emoji_component": bool(regex.match(r'\p{Emoji_Component}', char)),
        "is_extended_pictographic": bool(regex.match(r'\p{Extended_Pictographic}', char)),
    }


print(get_emoji_properties("👋"))
# {
#   'code_point': 'U+1F44B',
#   'character': '👋',
#   'is_emoji': True,
#   'is_emoji_presentation': True,
#   'is_emoji_modifier': False,
#   'is_emoji_modifier_base': True,
#   'is_emoji_component': False,
#   'is_extended_pictographic': True
# }

print(get_emoji_properties("🏽"))
# {
#   'code_point': 'U+1F3FD',
#   'is_emoji': True,
#   'is_emoji_modifier': True,
#   'is_emoji_component': True,
#   ...
# }
JavaScript with Unicode Property Escapes
ES2018 introduced \p{} escapes in regex (requires the u flag):
const tests = {
  emoji: /^\p{Emoji}$/u,
  emojiPresentation: /^\p{Emoji_Presentation}$/u,
  emojiModifier: /^\p{Emoji_Modifier}$/u,
  emojiModifierBase: /^\p{Emoji_Modifier_Base}$/u,
  emojiComponent: /^\p{Emoji_Component}$/u,
  extendedPictographic: /^\p{Extended_Pictographic}$/u,
};

function getProperties(char) {
  return Object.fromEntries(
    Object.entries(tests).map(([k, rx]) => [k, rx.test(char)])
  );
}

console.log(getProperties('🤚')); // U+1F91A RAISED BACK OF HAND
// { emoji: true, emojiPresentation: true, emojiModifierBase: true,
//   emojiComponent: false, extendedPictographic: true, ... }

// Note: engines without Unicode property escape support throw a SyntaxError
// when they parse these patterns. Use a library if you need to support them.
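Unicode property escapes are supported in Chrome 64+, Firefox 78+, and Safari 11.1+. Whether a particular property such as \p{Extended_Pictographic} is recognized also depends on the Unicode version the engine ships with.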
Practical Decision Guide
| Use case | Property to use |
|---|---|
| "Is this character typically shown as emoji?" | Emoji_Presentation |
| "Should I include reserved future emoji ranges?" | Extended_Pictographic |
| "Is this a skin tone modifier?" | Emoji_Modifier |
| "Can I apply a skin tone to this?" | Emoji_Modifier_Base |
| "Is this character part of a multi-code-point sequence?" | Emoji_Component |
| "Is this character in the Unicode emoji set at all?" | Emoji |
Property Relationships
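The properties overlap rather than nest cleanly:
Emoji (every code point that can be used as an emoji, including text-default symbols)
├── Emoji_Presentation (subset of Emoji; most "real" emoji)
├── Emoji_Modifier_Base (subset of Emoji)
└── Emoji_Modifier (subset of Emoji and of Emoji_Component)
Extended_Pictographic (Emoji minus some Emoji_Component characters, plus reserved code points)
Emoji_Component cuts across the others: ZWJ has Emoji_Component=Yes but Emoji=No, while the skin tone modifiers have both Emoji_Component=Yes and Emoji=Yes.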
Validating Emoji Sequences
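The sets of valid emoji sequences are listed in emoji-sequences.txt and emoji-zwj-sequences.txt, which are published with UTS #51 under Public/emoji/ on unicode.org rather than in the UCD directory. A code point having Emoji=Yes does not mean that any combination of emoji code points is valid:
import regex

# A minimal structural checker: it approximates the shape of valid sequences
# and does not consult the official sequence lists.
VALID_EMOJI = regex.compile(
    r'\p{Extended_Pictographic}\p{Emoji_Modifier}?\uFE0F?'  # base + optional modifier or VS16
    r'(?:\u200D\p{Extended_Pictographic}\p{Emoji_Modifier}?\uFE0F?)*'  # ZWJ chain
    r'|[0-9#*]\uFE0F?\u20E3'  # keycap sequence
    r'|[\U0001F1E6-\U0001F1FF]{2}'  # flag pair
)

# The second entry is 👨 + ZWJ + 💻, written with escapes so the ZWJ is visible.
# The last entry puts the modifier before the base, which is an invalid order.
for seq in ["👋🏽", "\U0001F468\u200D\U0001F4BB", "🇺🇸", "🏻👋"]:
    match = VALID_EMOJI.fullmatch(seq)
    print(f"{seq}: {'valid' if match else 'invalid'}")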
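A pattern like this checks only the shape of a sequence. For an exact answer, one option is to load the sequence files into a set and test membership. The sketch below assumes the file layout `code points ; type ; description # comment`, where the first field is either a `first..last` range of single code points or a space-separated sequence. `SAMPLE_SEQUENCES` is a hand-written excerpt in that layout, and `load_sequences` is a hypothetical name.

```python
# Hand-written excerpt in the layout of the sequence data files.
SAMPLE_SEQUENCES = """\
231A..231B        ; Basic_Emoji                 ; watch..hourglass done
1F44B 1F3FD       ; RGI_Emoji_Modifier_Sequence ; waving hand: medium skin tone
1F1FA 1F1F8       ; RGI_Emoji_Flag_Sequence     ; flag: United States
1F468 200D 1F4BB  ; RGI_Emoji_ZWJ_Sequence      ; man technologist
"""


def load_sequences(text):
    """Return the set of strings listed in sequence-file-formatted text."""
    valid = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code_points = line.split(";")[0].strip()
        if ".." in code_points:
            # A range of single-code-point emoji.
            first, last = (int(cp, 16) for cp in code_points.split(".."))
            valid.update(chr(cp) for cp in range(first, last + 1))
        else:
            # One sequence, code points separated by spaces.
            valid.add("".join(chr(int(cp, 16)) for cp in code_points.split()))
    return valid


valid = load_sequences(SAMPLE_SEQUENCES)
print("\U0001F44B\U0001F3FD" in valid)  # base followed by modifier
print("\U0001F3FB\U0001F44B" in valid)  # modifier before base
```

A set lookup rejects anything not explicitly listed, so it is stricter than the pattern-based check and needs refreshing whenever a new emoji version is released.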