Emoji Regex Patterns: Matching Emojis in JavaScript and Python

Why Emoji Regex Is Hard

Writing a regex that correctly matches emoji is surprisingly difficult. A single visible emoji like 👨‍👩‍👧‍👦 (family) is composed of 7 Unicode code points: four person emoji joined by three invisible zero width joiners. A regex that matches a single character will match fragments of emoji, or miss them entirely.

The root problems are:

Variable length: emoji range from 1 code point (😀) to 10+ code points (complex ZWJ sequences)
Surrogate pairs: in UTF-16 environments (JavaScript), each emoji above U+FFFF is two code units
Combining characters: variation selectors, skin tone modifiers, and ZWJ do not show up as separate glyphs inside a sequence, but they are part of the emoji
Evolving standard: new emoji are added each Unicode release, so hardcoded ranges go stale
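
To make the first two points concrete, here is a small Python sketch (standard library only) that lists the code points of the family emoji and compares the code point count with the number of UTF-16 code units, which is what JavaScript's .length would report. The helper name describe is made up for this illustration.

```python
import unicodedata

# Family emoji written with explicit escapes so the joiners are visible:
# man, ZWJ, woman, ZWJ, girl, ZWJ, boy
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# Python strings are sequences of code points.
code_points = len(family)

# Each UTF-16 code unit is 2 bytes, so bytes // 2 gives the code unit count.
utf16_units = len(family.encode("utf-16-le")) // 2


def describe(text):
    """Return (hex code point, Unicode name) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


for cp, name in describe(family):
    print(cp, name)
print(code_points, utf16_units)  # 7 11
```

The four people each take two UTF-16 code units and the three joiners take one each, which is where 11 comes from.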

JavaScript: Using the Unicode Flag

The u flag enables Unicode mode in JavaScript regex, making . match a full code point rather than a single UTF-16 code unit.

// Without u flag: . matches one code unit (breaks emoji)
/^.$/.test('😀')   // false — emoji is 2 code units
/^.$/u.test('😀')  // true — u flag treats it as one code point

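// Match any single emoji code point (basic, not sequences)
// Caveat: \p{Emoji} also matches the digits 0-9, '#', and '*'
const basicEmoji = /\p{Emoji}/u;
basicEmoji.test('Hello 😀')  // true
basicEmoji.test('Room 101')  // true: digits have the Emoji property

// The v flag (ES2024) adds set operations and is stricter
// Subtracting \p{Number} removes the digits ('#' and '*' still match)
const emojiV = /[\p{Emoji}--\p{Number}]/v;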

Matching Full Emoji Grapheme Clusters

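To match complete emoji including ZWJ sequences and skin tones, the pattern has to cover each component: an optional skin tone modifier, an optional U+FE0F variation selector, and U+200D joiners between parts:

// Broader emoji regex (covers most cases; flags and keycaps need their own patterns)
// One unit: base + skin tone modifier, a default-emoji character, or a character + VS16
const unit = String.raw`(?:\p{Emoji_Modifier_Base}\p{Emoji_Modifier}|\p{Emoji_Presentation}|\p{Emoji}\uFE0F)`;
// A full emoji: one unit, then any number of ZWJ-joined units
const emojiRegex = new RegExp(`${unit}(?:\\u200D${unit})*`, 'gu');

// Even better: use the emoji-regex npm package
// import emojiRegex from 'emoji-regex';
// const re = emojiRegex();

// Example usage
const text = 'Hello 👋 World 🌍 from 👨‍💻';
const matches = text.match(emojiRegex);
// ['👋', '🌍', '👨‍💻']  ← note: ZWJ sequence captured as one match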

The `emoji-regex` Package

For production use, the emoji-regex npm package by Mathias Bynens provides a regex generated from the Unicode emoji data, so it tracks the standard instead of relying on hand-written ranges:

import emojiRegex from 'emoji-regex';

const re = emojiRegex();
const str = '💃🏽 dancing and 🚀 launching';

let match;
while ((match = re.exec(str)) !== null) {
  console.log(`Found: ${match[0]} at index ${match.index}`);
}
// Found: 💃🏽 at index 0
// Found: 🚀 at index 17  (indices count UTF-16 code units; 💃🏽 occupies 4)

Python: The `emoji` Library and Regex

Python 3 handles code points natively — '😀' has length 1. But matching emoji sequences still requires care.

Using Unicode Property Escapes with `regex`

The built-in re module does not support Unicode property escapes. Install the regex module instead:

import regex

# Match emoji with Unicode property escapes
pattern = regex.compile(r'\p{Emoji}', regex.UNICODE)
pattern.findall('Hello 😀 World 🌍')
# ['😀', '🌍']

# Match grapheme clusters (handles ZWJ sequences)
grapheme_pattern = regex.compile(r'\X', regex.UNICODE)
grapheme_pattern.findall('👩‍💻 coding')
# ['👩‍💻', ' ', 'c', 'o', 'd', 'i', 'n', 'g']

The \X pattern matches a full Unicode grapheme cluster — the correct unit for "one visible character."

Using the `emoji` Library

For higher-level emoji operations, the emoji library is excellent:

import emoji

# Find all emoji in text
text = 'I love 🐍 Python and ☕ coffee'
emoji.emoji_list(text)
# [{'match_start': 7, 'match_end': 8, 'emoji': '🐍'},
#  {'match_start': 20, 'match_end': 21, 'emoji': '☕'}]

# Check whether a string is exactly one emoji
emoji.is_emoji('😀')   # True
emoji.is_emoji('hello') # False

# Count emoji; unique=True counts distinct emoji only
emoji.emoji_count('🐍🐍🐍')  # 3
emoji.emoji_count('🐍🐍🐍', unique=True)  # 1

Matching Specific Emoji Subsets

Flags Only

Country flags are pairs of Regional Indicator Symbols (U+1F1E6–U+1F1FF):

// Match flag emoji (two regional indicator letters)
const flagRegex = /[\u{1F1E6}-\u{1F1FF}]{2}/gu;
'I am from 🇩🇪 and you from 🇺🇸'.match(flagRegex);
// ['🇩🇪', '🇺🇸']

# Python equivalent
import regex

flag_pattern = regex.compile(r'[\U0001F1E6-\U0001F1FF]{2}')
flag_pattern.findall('Visiting 🇯🇵 and 🇰🇷')
# ['🇯🇵', '🇰🇷']
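
Because each Regional Indicator Symbol corresponds to a letter A to Z, a matched flag can be converted back to its two-letter code with simple arithmetic. The sketch below uses only the standard library; flag_to_code and code_to_flag are hypothetical helper names, and the pattern accepts any pair of indicators, so checking the result against a real ISO 3166-1 list is left out.

```python
import re

# Two consecutive Regional Indicator Symbols (U+1F1E6 to U+1F1FF).
FLAG_RE = re.compile("[\U0001F1E6-\U0001F1FF]{2}")
RI_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_to_code(flag):
    """Map a two-symbol flag sequence to its two-letter code, e.g. 'JP'."""
    return "".join(chr(ord(ch) - RI_BASE + ord("A")) for ch in flag)


def code_to_flag(code):
    """Map a two-letter code such as 'JP' to the matching indicator pair."""
    return "".join(chr(ord(ch) - ord("A") + RI_BASE) for ch in code.upper())


# Same sample as above, written with escapes: JP flag and KR flag.
text = "Visiting \U0001F1EF\U0001F1F5 and \U0001F1F0\U0001F1F7"
codes = [flag_to_code(m) for m in FLAG_RE.findall(text)]
print(codes)  # ['JP', 'KR']
```

Note that {2} pairs indicators greedily from left to right, so a stray unpaired indicator before a flag would shift every following pair.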

Keycap Sequences

Keycaps like 0️⃣ through 9️⃣ follow the pattern: base character (a digit, # or *) + U+FE0F + U+20E3:

const keycapRegex = /[0-9#*]\uFE0F\u20E3/gu;
'Press 1️⃣ or 2️⃣'.match(keycapRegex);
// ['1️⃣', '2️⃣']
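
The same keycap pattern carries over to Python's built-in re module, since it only needs literal code points. This is a minimal sketch; keycap_bases is a hypothetical helper that returns the base character of each keycap it finds.

```python
import re

# Base character (digit, '#' or '*'), then VS16 (U+FE0F), then the
# combining enclosing keycap (U+20E3). The group captures the base only.
KEYCAP_RE = re.compile("([0-9#*])\uFE0F\u20E3")


def keycap_bases(text):
    """Return the base character of every keycap sequence in text."""
    return KEYCAP_RE.findall(text)


sample = "Press 1\uFE0F\u20E3 or 2\uFE0F\u20E3, then #\uFE0F\u20E3"
print(keycap_bases(sample))        # ['1', '2', '#']
print(keycap_bases("Room 101"))    # [] because plain digits are not keycaps
```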

Common Mistakes

Mistake 1: Using . without the u flag in JavaScript. It matches one code unit, splitting emoji.

Mistake 2: Using str.length to count emoji or to check for a single emoji. .length counts UTF-16 code units, so a string holding one visible emoji can have a .length of 8 or more (the family emoji above is 11).

Mistake 3: Using character class ranges like [\u0080-\uFFFF] — this misses most modern emoji above U+FFFF and produces false positives for non-emoji Unicode characters.

Mistake 4: Forgetting variation selector U+FE0F. The character ❤ (U+2764) without VS16 is a text symbol; ❤️ with U+FE0F is the emoji presentation.
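
Mistakes 3 and 4 are easy to reproduce. The following Python sketch (standard library only, where strings are sequences of code points) shows a BMP-only range missing an emoji while matching an accented letter, and a pattern that requires U+FE0F failing on the bare heart. The variable and function names are chosen for this example.

```python
import re

# Mistake 3: a range that stops at U+FFFF.
bmp_range = re.compile("[\u0080-\uFFFF]")
misses_emoji = bmp_range.search("\U0001F600") is None      # grinning face, U+1F600
false_positive = bmp_range.search("caf\u00E9") is not None  # matches the e-acute

# Mistake 4: the heart with and without the variation selector.
heart_text = "\u2764"         # text presentation by default
heart_emoji = "\u2764\uFE0F"  # emoji presentation requested via VS16
strict_heart = re.compile("\u2764\uFE0F")   # requires VS16
lenient_heart = re.compile("\u2764\uFE0F?")  # VS16 optional


def matches_whole(pattern, text):
    """True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


print(misses_emoji, false_positive)                # True True
print(matches_whole(strict_heart, heart_text))     # False
print(matches_whole(lenient_heart, heart_text))    # True
print(matches_whole(lenient_heart, heart_emoji))   # True
```

Whether the selector should be optional depends on the goal: making it optional also accepts the text-style heart, which may or may not count as an emoji for a given application.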

Testing Your Pattern

Use our Sequence Analyzer to inspect any emoji's code points, then test your regex against it to verify full matches. Always test against ZWJ sequences, skin tone variants, and flag emoji before shipping emoji-handling code.
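
One possible way to automate that checklist is sketched below in Python with the standard library. The case names, the check_full_matches helper, and the deliberately naive U+1F300 to U+1FAFF range are all assumptions made for this example and do not come from any library.

```python
import re

# One representative per category worth testing before shipping.
CASES = {
    "zwj": "\U0001F468\u200D\U0001F4BB",   # man technologist
    "skin_tone": "\U0001F44B\U0001F3FD",   # waving hand, medium skin tone
    "flag": "\U0001F1E9\U0001F1EA",        # regional indicators D + E
    "keycap": "1\uFE0F\u20E3",             # keycap digit one
}


def check_full_matches(pattern, cases):
    """Return the names of cases the pattern does not match as a whole."""
    return [name for name, s in cases.items() if pattern.fullmatch(s) is None]


# A naive single-code-point range fails every multi-code-point case.
naive = re.compile("[\U0001F300-\U0001FAFF]")
print(check_full_matches(naive, CASES))
# ['zwj', 'skin_tone', 'flag', 'keycap']

# A flag-only pattern passes the flag case and nothing else.
flags_only = re.compile("[\U0001F1E6-\U0001F1FF]{2}")
print(check_full_matches(flags_only, CASES))
# ['zwj', 'skin_tone', 'keycap']
```

Using fullmatch rather than search is the important design choice here: a pattern that matches only a fragment of a sequence would pass a search-based check and hide the bug.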
