Emojis in JavaScript: Why It's Tricky
JavaScript makes it easy to display emojis: just put them in a string. But processing emoji strings correctly requires understanding some important quirks in how JavaScript handles Unicode internally. If you've ever seen an emoji counted as 2 characters, or a string manipulation function split an emoji in half, this guide explains why and how to fix it.
Basic Emoji Strings in JavaScript
You can include emoji characters directly in JavaScript strings:
const greeting = "Hello 🌍"
const status = `Build complete ✅ — version ${version}`
console.log("🚀 Server starting...")
JavaScript strings are UTF-16 encoded internally. Most emojis are in the Unicode Supplementary Multilingual Plane (Plane 1, U+10000 to U+1FFFF, above the U+FFFF limit of a single code unit) and require two UTF-16 code units, called a surrogate pair, to represent a single visible emoji. A surrogate pair is a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF).
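To make the surrogate pair idea concrete, here is a minimal sketch of the UTF-16 arithmetic that turns a code point above U+FFFF into its two code units. The helper name toSurrogatePair is hypothetical and used only for illustration; in real code the engine does this for you.

```javascript
// Sketch: split a supplementary code point into its UTF-16 surrogate pair.
// toSurrogatePair is a hypothetical helper name, not a built-in.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("Only U+10000..U+10FFFF are encoded as surrogate pairs")
  }
  const offset = codePoint - 0x10000     // 20-bit value
  const high = 0xD800 + (offset >> 10)   // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF)  // bottom 10 bits -> low surrogate
  return [high, low]
}

const [high, low] = toSurrogatePair(0x1F525)
console.log(high.toString(16), low.toString(16)) // d83d dd25
console.log(String.fromCharCode(high, low) === "\u{1F525}") // true
```

The two values match the code units you get from "\u{1F525}".charCodeAt(0) and charCodeAt(1).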
Unicode Code Points in JavaScript
ES6+ \u{} Syntax (Recommended)
ES6 introduced a clean syntax for any Unicode code point:
const fire = "\u{1F525}" // 🔥
const grin = "\u{1F600}" // 😀
const heart = "\u{2764}\u{FE0F}" // ❤️ (two code points)
console.log(fire) // 🔥
Legacy Surrogate Pair Syntax (Pre-ES6)
Before ES6, you had to write the surrogate pair manually for high code points:
// 🔥 U+1F525 as surrogate pair
const fire = "\uD83D\uDD25"
console.log(fire) // 🔥
The ES6 \u{} syntax is far more readable and should be preferred in all modern code.
Getting an Emoji's Code Point
// codePointAt handles surrogate pairs correctly
const emoji = "🔥"
console.log(emoji.codePointAt(0)) // 128293 (decimal)
console.log(emoji.codePointAt(0).toString(16)) // "1f525" (hex)
// String.fromCodePoint creates from a code point
console.log(String.fromCodePoint(0x1F525)) // 🔥
console.log(String.fromCodePoint(128293)) // 🔥
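When an emoji is made of more than one code point, codePointAt(0) only shows the first. As a rough sketch, the helper below (listCodePoints is a hypothetical name) lists every code point in a string in the usual U+XXXX notation.

```javascript
// listCodePoints is a hypothetical helper name used for illustration.
function listCodePoints(str) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  )
}

console.log(listCodePoints("\u{1F525}"))        // ["U+1F525"]
console.log(listCodePoints("\u{2764}\u{FE0F}")) // ["U+2764", "U+FE0F"]
console.log(listCodePoints("A"))                // ["U+0041"]
```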
The String Length Problem
The most common emoji pitfall in JavaScript is that .length counts UTF-16 code units, not visible characters.
// Basic emoji: 2 code units (surrogate pair)
console.log("🔥".length) // 2 — but it's 1 emoji
// Skin tone emoji: 4 code units (base + modifier)
console.log("👍🏽".length) // 4 — but it's 1 emoji
// ZWJ sequence (woman technologist): 5 code units (2 for 👩 + 1 for the zero width joiner U+200D + 2 for 💻)
console.log("👩‍💻".length) // 5, but it's 1 emoji
// A simple ASCII character for comparison
console.log("A".length) // 1
Counting by Code Point (Better, But Still Not Perfect)
The spread operator and Array.from iterate by code point rather than code unit:
console.log([..."🔥"].length) // 1 ✓
console.log([..."👍🏽"].length) // 2 — base + skin modifier
console.log([..."👩‍💻"].length) // 3 — components of ZWJ sequence (👩 + U+200D + 💻)
console.log([..."👨‍👩‍👧‍👦"].length) // 7 — family ZWJ sequence (4 emojis + 3 joiners)
This is better than .length but still doesn't match visual character count for complex sequences.
Counting Grapheme Clusters (Correct)
The correct solution uses the Intl.Segmenter API (available in Node.js 16+ and all modern browsers):
function graphemeCount(str) {
const segmenter = new Intl.Segmenter()
return [...segmenter.segment(str)].length
}
console.log(graphemeCount("Hello")) // 5
console.log(graphemeCount("Hello 🔥")) // 7
console.log(graphemeCount("👩‍💻")) // 1 ✓
console.log(graphemeCount("👨‍👩‍👧‍👦")) // 1 ✓
console.log(graphemeCount("👍🏽")) // 1 ✓
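To see the three ways of counting side by side, here is a small sketch that reports code units, code points, and grapheme clusters for the same string. The helper name measure is hypothetical, and the emojis are written as escapes so the zero width joiner (U+200D) is visible in the source.

```javascript
// measure is a hypothetical helper; the segmenter is created once and reused.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })

function measure(str) {
  return {
    codeUnits: str.length,                                 // UTF-16 code units
    codePoints: [...str].length,                           // Unicode code points
    graphemes: [...graphemeSegmenter.segment(str)].length, // user-perceived characters
  }
}

const womanTechnologist = "\u{1F469}\u{200D}\u{1F4BB}"
console.log(measure(womanTechnologist))
// { codeUnits: 5, codePoints: 3, graphemes: 1 }

const thumbsUpMediumSkin = "\u{1F44D}\u{1F3FD}"
console.log(measure(thumbsUpMediumSkin))
// { codeUnits: 4, codePoints: 2, graphemes: 1 }
```

Reusing one Intl.Segmenter instance avoids constructing a new one on every call, which can matter if you count many strings.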
For older environments, the grapheme-splitter npm package provides equivalent functionality:
npm install grapheme-splitter
import GraphemeSplitter from "grapheme-splitter"
const splitter = new GraphemeSplitter()
console.log(splitter.countGraphemes("👩‍💻")) // 1
Iterating Over Emoji Strings
Avoid for loops with index when iterating emoji strings — you'll split surrogate pairs:
// WRONG: splits surrogate pairs
const text = "Hi 🔥"
for (let i = 0; i < text.length; i++) {
console.log(text[i]) // splits 🔥 into two broken characters
}
// CORRECT: iterate by code point with for...of
for (const char of "Hi 🔥") {
console.log(char) // H, i, " ", 🔥
}
// CORRECT: use spread
const chars = [..."Hi 🔥"]
console.log(chars) // ["H", "i", " ", "🔥"]
For ZWJ sequences and skin tone modifiers, even for...of splits the components. Use Intl.Segmenter for truly correct grapheme-level iteration:
const text = "👩‍💻👍🏽🔥"
const segmenter = new Intl.Segmenter()
const segments = [...segmenter.segment(text)].map(s => s.segment)
console.log(segments) // ["👩‍💻", "👍🏽", "🔥"]
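Reversing a string is one case where the difference between code point and grapheme iteration shows up clearly. The sketch below, with the hypothetical helper name reverseGraphemes, reverses by grapheme cluster and compares the result against a naive code point reversal.

```javascript
// reverseGraphemes is a hypothetical helper name used for illustration.
function reverseGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
  return [...segmenter.segment(str)].map((s) => s.segment).reverse().join("")
}

// "Hi " followed by thumbs up + medium skin tone modifier
const input = "Hi \u{1F44D}\u{1F3FD}"
const expected = "\u{1F44D}\u{1F3FD} iH"

console.log(reverseGraphemes(input) === expected)       // true
// Reversing by code point puts the skin tone modifier before its base emoji.
console.log([...input].reverse().join("") === expected) // false
```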
Regex and Emojis in JavaScript
The /u flag enables Unicode mode in JavaScript regex, allowing you to match full Unicode code points rather than UTF-16 code units:
// Without /u: matches one surrogate unit (broken)
/\uD83D/.test("🔥") // true (wrong — matches half the surrogate pair)
// With /u: matches the full code point
/\u{1F525}/u.test("🔥") // true (correct)
// Match any character with the Unicode Emoji property
// (note: this also includes the digits 0-9, "#" and "*")
const emojiRegex = /\p{Emoji}/u
console.log(emojiRegex.test("🔥")) // true
console.log(emojiRegex.test("A")) // false
Using the Unicode Property Escape \p{Emoji}
Modern JavaScript (ES2018+) supports Unicode property escapes:
// Match individual characters that render as emoji by default
const emojiPattern = /\p{Emoji_Presentation}/gu
const text = "Hello 🌍! Great work ✅ today 🚀"
const emojis = text.match(emojiPattern)
console.log(emojis) // ["🌍", "✅", "🚀"]
// Remove all emojis from a string
const cleaned = text.replace(/\p{Emoji_Presentation}/gu, "").trim()
console.log(cleaned) // "Hello ! Great work  today" (the spaces around each removed emoji remain, so "work" and "today" are two spaces apart)
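The emoji-related property escapes do not all match the same characters, which is easy to miss. As a small sketch, the hypothetical helper emojiProperties below tests a single character against three properties so you can compare them.

```javascript
// emojiProperties is a hypothetical helper name used for illustration.
function emojiProperties(ch) {
  return {
    Emoji: /\p{Emoji}/u.test(ch),
    Emoji_Presentation: /\p{Emoji_Presentation}/u.test(ch),
    Extended_Pictographic: /\p{Extended_Pictographic}/u.test(ch),
  }
}

console.log(emojiProperties("5"))
// { Emoji: true, Emoji_Presentation: false, Extended_Pictographic: false }
console.log(emojiProperties("\u{2764}")) // heavy black heart without U+FE0F
// { Emoji: true, Emoji_Presentation: false, Extended_Pictographic: true }
console.log(emojiProperties("\u{1F525}")) // fire
// { Emoji: true, Emoji_Presentation: true, Extended_Pictographic: true }
```

In practice this means \p{Emoji} can flag plain digits, while \p{Emoji_Presentation} skips characters such as U+2764 that only render as emoji when followed by the variation selector U+FE0F.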
For comprehensive emoji matching including ZWJ sequences, use the emoji-regex npm package:
npm install emoji-regex
import emojiRegex from "emoji-regex"
const regex = emojiRegex()
const text = "Hello 👩‍💻 and 👍🏽!"
const matches = [...text.matchAll(regex)]
console.log(matches.map(m => m[0])) // ["👩‍💻", "👍🏽"]
Emojis in the DOM
In the browser, you can set emoji text content through standard DOM APIs:
// textContent is safe — emojis are just text
document.getElementById("status").textContent = "Build complete ✅"
// innerHTML works too, and you can use HTML character references
element.innerHTML = "Hello &#x1F525;" // renders as "Hello 🔥"
// Creating elements with emojis
const btn = document.createElement("button")
btn.textContent = "🚀 Launch"
btn.setAttribute("aria-label", "Launch rocket")
document.body.appendChild(btn)
Emojis in React
React handles emoji strings without issues — they're just text:
// Inline emoji
function StatusBadge({ status }) {
const icons = {
success: "✅",
error: "❌",
pending: "⏳",
}
return <span>{icons[status]} {status}</span>
}
// Emoji in JSX with aria-label for accessibility
function EmojiIcon({ emoji, label }) {
return (
<span role="img" aria-label={label}>
{emoji}
</span>
)
}
// Usage
<EmojiIcon emoji="🚀" label="rocket" />
Practical Utility Functions
Check if a String Contains Emojis
function containsEmoji(str) {
return /\p{Emoji_Presentation}/u.test(str)
}
console.log(containsEmoji("Hello 🌍")) // true
console.log(containsEmoji("Hello")) // false
Extract All Emojis from a String
import emojiRegex from "emoji-regex"
function extractEmojis(str) {
const regex = emojiRegex()
return [...str.matchAll(regex)].map(m => m[0])
}
console.log(extractEmojis("I 🔥 love ❤️ JavaScript 🚀"))
// ["🔥", "❤️", "🚀"]
Truncate Text Preserving Emoji Integrity
function truncate(str, maxGraphemes) {
const segmenter = new Intl.Segmenter()
const segments = [...segmenter.segment(str)]
if (segments.length <= maxGraphemes) return str
return segments.slice(0, maxGraphemes).map(s => s.segment).join("") + "…"
}
console.log(truncate("Hello 👩‍💻 World! 🔥", 9))
// "Hello 👩‍💻 W…" (the emoji counts as 1 of the 9 graphemes kept)
Explore More on EmojiFYI
- Inspect the Unicode code points behind any emoji with the Sequence Analyzer
- Browse and copy emojis for your JavaScript strings with the Emoji Keyboard
- Learn about grapheme clusters, ZWJ sequences, and surrogate pairs in the Glossary
- Search for any emoji by name or keyword at EmojiFYI Search