How to Handle Emojis in JavaScript: Strings, Length, and Rendering

Emojis in JavaScript: Why It's Tricky

JavaScript makes it easy to display emojis: just put them in a string. But processing emoji strings correctly requires understanding some important quirks in how JavaScript handles Unicode internally. If you've ever seen an emoji counted as 2 characters, or a string manipulation function split an emoji in half, this guide explains why and how to fix it.

Basic Emoji Strings in JavaScript

You can include emoji characters directly in JavaScript strings:

const version = "1.0.0"  // placeholder value for illustration
const greeting = "Hello 🌍"
const status = `Build complete ✅ - version ${version}`
console.log("🚀 Server starting...")

JavaScript strings are UTF-16 encoded internally. Most emojis are in the Unicode Supplementary Multilingual Plane (code points above U+FFFF) and require two UTF-16 code units, called a surrogate pair, to represent a single visible emoji.
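The arithmetic behind a surrogate pair can be sketched as follows. The helper name `toSurrogatePair` is hypothetical and used only for illustration; the constants come from the UTF-16 encoding rules.

```javascript
// Split a supplementary-plane code point into its two UTF-16 code units.
// toSurrogatePair is an illustrative name, not a built-in function.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("Only U+10000..U+10FFFF are encoded as surrogate pairs")
  }
  const offset = codePoint - 0x10000          // 20-bit value
  const high = 0xD800 + (offset >> 10)        // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF)       // bottom 10 bits
  return [high, low]
}

const [high, low] = toSurrogatePair(0x1F525)  // 🔥
console.log(high.toString(16), low.toString(16))  // d83d dd25
console.log(String.fromCharCode(high, low) === String.fromCodePoint(0x1F525))  // true
```

This is the same pair that appears when a string containing 🔥 is inspected one code unit at a time.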

Unicode Code Points in JavaScript

ES6 introduced a clean syntax for any Unicode code point:

const fire = "\u{1F525}"     // 🔥
const grin = "\u{1F600}"     // 😀
const heart = "\u{2764}\u{FE0F}"  // ❤️ (two code points)

console.log(fire)  // 🔥

Legacy Surrogate Pair Syntax (Pre-ES6)

Before ES6, you had to write the surrogate pair manually for high code points:

// 🔥 U+1F525 as surrogate pair
const fire = "\uD83D\uDD25"
console.log(fire)  // 🔥

The ES6 \u{} syntax is far more readable and should be preferred in all modern code.

Getting an Emoji's Code Point

// codePointAt handles surrogate pairs correctly
const emoji = "🔥"
console.log(emoji.codePointAt(0))           // 128293 (decimal)
console.log(emoji.codePointAt(0).toString(16))  // "1f525" (hex)

// String.fromCodePoint creates from a code point
console.log(String.fromCodePoint(0x1F525))  // 🔥
console.log(String.fromCodePoint(128293))   // 🔥
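To inspect every code point in a string rather than only the first, a small helper along these lines may be useful. The name `describeCodePoints` is an assumption for this sketch; it relies on `Array.from` iterating a string by code point.

```javascript
// List each code point of a string in U+XXXX notation.
// describeCodePoints is an illustrative helper, not a built-in.
function describeCodePoints(str) {
  return Array.from(str, ch =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  )
}

console.log(describeCodePoints("\u{1F525}"))         // ["U+1F525"]
console.log(describeCodePoints("\u{2764}\u{FE0F}"))  // ["U+2764", "U+FE0F"]
console.log(describeCodePoints("A"))                 // ["U+0041"]
```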

The String Length Problem

The most common emoji pitfall in JavaScript is that .length counts UTF-16 code units, not visible characters.

// Basic emoji: 2 code units (surrogate pair)
console.log("🔥".length)     // 2, but it's 1 emoji

// Skin tone emoji: 4 code units (base + modifier)
console.log("👍🏽".length)   // 4, but it's 1 emoji

// ZWJ sequence (woman technologist): 5 code units (2 + 1 for the ZWJ + 2)
console.log("👩‍💻".length)   // 5, but it's 1 emoji

// A simple ASCII character for comparison
console.log("A".length)      // 1

Counting by Code Point (Better, But Still Not Perfect)

The spread operator and Array.from iterate by code point rather than code unit:

console.log([..."🔥"].length)         // 1 ✓
console.log([..."👍🏽"].length)       // 2: base + skin modifier
console.log([..."👩‍💻"].length)      // 3: components of ZWJ sequence
console.log([..."👨‍👩‍👧‍👦"].length)  // 7: family ZWJ sequence

This is better than .length but still doesn't match visual character count for complex sequences.

Counting Grapheme Clusters (Correct)

The correct solution uses the Intl.Segmenter API (available in Node.js 16+ and all modern browsers):

function graphemeCount(str) {
  const segmenter = new Intl.Segmenter()
  return [...segmenter.segment(str)].length
}

console.log(graphemeCount("Hello"))        // 5
console.log(graphemeCount("Hello 🔥"))     // 7
console.log(graphemeCount("👩‍💻"))        // 1 ✓
console.log(graphemeCount("👨‍👩‍👧‍👦"))    // 1 ✓
console.log(graphemeCount("👍🏽"))         // 1 ✓
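To see the three ways of counting side by side, here is a minimal sketch. The function name `countAll` is hypothetical, and the strings are written with escapes so the individual code points are visible.

```javascript
// Report code units, code points, and grapheme clusters for one string.
// countAll is an illustrative name; it assumes Intl.Segmenter is available.
function countAll(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
  return {
    codeUnits: str.length,
    codePoints: Array.from(str).length,
    graphemes: Array.from(segmenter.segment(str)).length,
  }
}

// Woman technologist: U+1F469, ZWJ (U+200D), U+1F4BB
console.log(countAll("\u{1F469}\u{200D}\u{1F4BB}"))
// { codeUnits: 5, codePoints: 3, graphemes: 1 }

// Thumbs up with medium skin tone: U+1F44D, U+1F3FD
console.log(countAll("\u{1F44D}\u{1F3FD}"))
// { codeUnits: 4, codePoints: 2, graphemes: 1 }
```

Only the grapheme count matches what a reader sees on screen.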

For older environments, the grapheme-splitter npm package provides equivalent functionality:

npm install grapheme-splitter

import GraphemeSplitter from "grapheme-splitter"

const splitter = new GraphemeSplitter()
console.log(splitter.countGraphemes("👩‍💻"))  // 1
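When code has to run in both newer and older environments, one option is to feature-detect Intl.Segmenter at runtime. The sketch below uses a hypothetical `countCharacters` helper and falls back to counting code points, which is only an approximation; a library such as grapheme-splitter could be substituted for that fallback branch.

```javascript
// Count graphemes with Intl.Segmenter when it exists; otherwise fall back
// to counting code points (this over-counts ZWJ and skin tone sequences).
// countCharacters is an illustrative name, not a standard API.
function countCharacters(str) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
    let count = 0
    for (const _segment of segmenter.segment(str)) count++
    return count
  }
  return Array.from(str).length
}

console.log(countCharacters("Hello"))  // 5
```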

Iterating Over Emoji Strings

Avoid index-based for loops when iterating over emoji strings, because they split surrogate pairs:

// WRONG: splits surrogate pairs
const text = "Hi 🔥"
for (let i = 0; i < text.length; i++) {
  console.log(text[i])  // splits 🔥 into two broken characters
}

// CORRECT: iterate by code point with for...of
for (const char of "Hi 🔥") {
  console.log(char)  // H, i, " ", 🔥
}

// CORRECT: use spread
const chars = [..."Hi 🔥"]
console.log(chars)  // ["H", "i", " ", "🔥"]

For ZWJ sequences and skin tone modifiers, even for...of splits the components. Use Intl.Segmenter for truly correct grapheme-level iteration:

const text = "👩‍💻👍🏽🔥"
const segmenter = new Intl.Segmenter()
const segments = [...segmenter.segment(text)].map(s => s.segment)
console.log(segments)  // ["👩‍💻", "👍🏽", "🔥"]
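Grapheme-level iteration matters for any operation that reorders characters. As a sketch, reversing a string by grapheme keeps each emoji intact, whereas the common split/reverse/join idiom works on code units and breaks surrogate pairs. The helper name `reverseGraphemes` is an assumption for this example.

```javascript
// Reverse a string one grapheme cluster at a time.
// reverseGraphemes is an illustrative helper, not a built-in.
function reverseGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
  return Array.from(segmenter.segment(str), s => s.segment).reverse().join("")
}

const sample = "ab\u{1F525}"  // "ab🔥"

// Naive reversal swaps the two halves of the surrogate pair
console.log(sample.split("").reverse().join("") === "\u{1F525}ba")  // false

// Grapheme-aware reversal keeps the emoji whole
console.log(reverseGraphemes(sample) === "\u{1F525}ba")  // true
```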

Regex and Emojis in JavaScript

The /u flag enables Unicode mode in JavaScript regex, allowing you to match full Unicode code points rather than UTF-16 code units:

// Without /u: matches one surrogate unit (broken)
/\uD83D/.test("🔥")  // true (wrong: matches half the surrogate pair)

// With /u: matches the full code point
/\u{1F525}/u.test("🔥")  // true (correct)

// Match any character that has the Unicode Emoji property
const emojiRegex = /\p{Emoji}/u
console.log(emojiRegex.test("🔥"))  // true
console.log(emojiRegex.test("A"))   // false
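One caveat with \p{Emoji}: the Unicode Emoji property also covers characters that can take part in emoji sequences, such as the digits 0-9, # and *, so the pattern matches plain numbers. The short check below illustrates this and compares it with the narrower \p{Emoji_Presentation} property; the variable names are chosen for this sketch only.

```javascript
// \p{Emoji} includes keycap base characters (digits, #, *)
const anyEmojiProperty = /\p{Emoji}/u
// \p{Emoji_Presentation} covers only characters drawn as emoji by default
const emojiByDefault = /\p{Emoji_Presentation}/u

console.log(anyEmojiProperty.test("7"))          // true
console.log(anyEmojiProperty.test("#"))          // true
console.log(emojiByDefault.test("7"))            // false
console.log(emojiByDefault.test("\u{1F525}"))    // true (🔥)
```

For user-facing checks such as "does this input contain an emoji", the narrower property is usually the safer starting point.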

Using the Unicode Property Escape \p{Emoji}

Modern JavaScript (ES2018+) supports Unicode property escapes:

// Match individual emoji code points (multi-code-point sequences are not matched as a unit)
const emojiPattern = /\p{Emoji_Presentation}/gu

const text = "Hello 🌍! Great work ✅ today 🚀"
const emojis = text.match(emojiPattern)
console.log(emojis)  // ["🌍", "✅", "🚀"]

// Remove all emojis from a string
const cleaned = text.replace(/\p{Emoji_Presentation}/gu, "").trim()
console.log(cleaned)  // "Hello ! Great work  today"

For comprehensive emoji matching including ZWJ sequences, use the emoji-regex npm package:

npm install emoji-regex

import emojiRegex from "emoji-regex"

const regex = emojiRegex()
const text = "Hello 👩‍💻 and 👍🏽!"
const matches = [...text.matchAll(regex)]
console.log(matches.map(m => m[0]))  // ["👩‍💻", "👍🏽"]

Emojis in the DOM

In the browser, you can set emoji text content through standard DOM APIs:

// textContent is safe: emojis are just text
document.getElementById("status").textContent = "Build complete ✅"

// innerHTML works too, and you can use HTML entities
element.innerHTML = "Hello &#x1F525;"  // 🔥

// Creating elements with emojis
const btn = document.createElement("button")
btn.textContent = "🚀 Launch"
btn.setAttribute("aria-label", "Launch rocket")
document.body.appendChild(btn)

Emojis in React

React handles emoji strings without issues, since they're just text:

// Inline emoji
function StatusBadge({ status }) {
  const icons = {
    success: "✅",
    error: "❌",
    pending: "⏳",
  }
  return <span>{icons[status]} {status}</span>
}

// Emoji in JSX with aria-label for accessibility
function EmojiIcon({ emoji, label }) {
  return (
    <span role="img" aria-label={label}>
      {emoji}
    </span>
  )
}

// Usage
<EmojiIcon emoji="🚀" label="rocket" />

Practical Utility Functions

Check if a String Contains Emojis

function containsEmoji(str) {
  return /\p{Emoji_Presentation}/u.test(str)
}

console.log(containsEmoji("Hello 🌍"))   // true
console.log(containsEmoji("Hello"))       // false
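A check based on \p{Emoji_Presentation} misses emojis that default to text presentation and only render as emoji with a variation selector, such as ❤️ (U+2764 followed by U+FE0F). A broader variant, sketched here under the hypothetical name `containsAnyEmoji`, uses the \p{Extended_Pictographic} property instead. Note that this property is wider than "emoji" in the everyday sense: it also matches symbols such as © and ®.

```javascript
// Broader check: matches text-presentation emoji such as ❤ as well.
// containsAnyEmoji is an illustrative name for this sketch.
function containsAnyEmoji(str) {
  return /\p{Extended_Pictographic}/u.test(str)
}

const redHeart = "\u{2764}\u{FE0F}"  // ❤️

console.log(/\p{Emoji_Presentation}/u.test(redHeart))  // false
console.log(containsAnyEmoji(redHeart))                // true
console.log(containsAnyEmoji("Hello"))                 // false
```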

Extract All Emojis from a String

import emojiRegex from "emoji-regex"

function extractEmojis(str) {
  const regex = emojiRegex()
  return [...str.matchAll(regex)].map(m => m[0])
}

console.log(extractEmojis("I 🔥 love ❤️ JavaScript 🚀"))
// ["🔥", "❤️", "🚀"]

Truncate Text Preserving Emoji Integrity

function truncate(str, maxGraphemes) {
  const segmenter = new Intl.Segmenter()
  const segments = [...segmenter.segment(str)]
  if (segments.length <= maxGraphemes) return str
  return segments.slice(0, maxGraphemes).map(s => s.segment).join("") + "…"
}

console.log(truncate("Hello 👩‍💻 World! 🔥", 8))
// "Hello 👩‍💻 …" (the emoji counts as 1 character; the 8th grapheme is the space after it)
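For long inputs, a variant that stops as soon as the limit is reached avoids building an array of every segment. The sketch below is one possible approach, not a drop-in standard: `truncateLazy` and `graphemeSegmenter` are hypothetical names, and it relies on the `index` field that each segment object exposes.

```javascript
// Create the segmenter once and reuse it across calls.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })

// Truncate to at most maxGraphemes grapheme clusters, stopping early.
// truncateLazy is an illustrative name for this sketch.
function truncateLazy(str, maxGraphemes) {
  let count = 0
  for (const { index } of graphemeSegmenter.segment(str)) {
    // index is the code unit offset where this grapheme starts
    if (count === maxGraphemes) {
      return str.slice(0, index) + "\u2026"  // "…"
    }
    count++
  }
  return str  // the string fits within the limit
}

console.log(truncateLazy("Hello World", 5))  // "Hello…"
console.log(truncateLazy("Hi", 5))           // "Hi"
```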

Explore More on EmojiFYI

  • Inspect the Unicode code points behind any emoji with the Sequence Analyzer
  • Browse and copy emojis for your JavaScript strings with the Emoji Keyboard
  • Learn about grapheme clusters, ZWJ sequences, and surrogate pairs in the Glossary
  • Search for any emoji by name or keyword at EmojiFYI Search

Glossary Terms

Code Point
A unique numerical value assigned to each character in the Unicode standard, written in the format U+XXXX (e.g., U+1F600 for 😀).
Code Unit
The minimum bit combination used for encoding a character: 8-bit for UTF-8, 16-bit for UTF-16, and 32-bit for UTF-32.
Emoji
A Japanese word (絵文字) meaning 'picture character'; small graphical symbols used in digital communication to express ideas, emotions, and objects.
Plane
A group of 65,536 consecutive Unicode code points. Plane 0 is the Basic Multilingual Plane (BMP); most emoji live in Plane 1 (SMP).
Supplementary Multilingual Plane (SMP)
Unicode Plane 1 (U+10000 to U+1FFFF), where the majority of emoji code points are allocated.
Surrogate Pair
Two UTF-16 code units (a high surrogate U+D800-U+DBFF followed by a low surrogate U+DC00-U+DFFF) that together represent a character above U+FFFF.
UTF-16
A variable-width Unicode encoding that uses 2 or 4 bytes per character, used internally by JavaScript, Java, and Windows.
Unicode
Universal character encoding standard that assigns a unique number to every character across all writing systems and symbol sets, including emoji.
Zero Width Joiner (ZWJ)
An invisible Unicode character (U+200D) used to join multiple emoji into a single composite emoji, such as combining people and objects into profession emoji.
