Grapheme Cluster

Technical/Unicode

A user-perceived character: one or more Unicode code points that are displayed as a single visual unit.

A grapheme cluster is what a user sees as "one character" on screen, even though it may be encoded as several code points. This concept is crucial for emoji because many emoji are composed of multiple code points.

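For example, a flag emoji like 🇰🇷 is two Regional Indicator code points (U+1F1F0 U+1F1F7). An emoji with a skin tone, like 👍🏽, is two code points: the thumbs-up gesture (U+1F44D) plus a skin tone modifier (U+1F3FD). ZWJ sequences can combine even more: the family emoji 👨‍👩‍👧 is five code points, three people joined by two Zero Width Joiners.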
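To see these sequences for yourself, the sketch below lists a string's code points in U+XXXX notation. `codePoints` is a hypothetical helper written for this example, not a standard API; it relies on `Array.from` iterating a string by code point rather than by UTF-16 code unit, so each surrogate pair stays together.

```javascript
// Hypothetical helper: list the code points of a string in U+XXXX notation.
// Array.from walks the string by code point, keeping surrogate pairs intact.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// The family emoji is written with escapes so the invisible ZWJs are explicit.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // 👨‍👩‍👧

console.log(codePoints("🇰🇷").join(" "));  // U+1F1F0 U+1F1F7
console.log(codePoints("👍🏽").join(" "));  // U+1F44D U+1F3FD
console.log(codePoints(family).join(" ")); // U+1F468 U+200D U+1F469 U+200D U+1F467
```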

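Programming languages differ in how they handle grapheme clusters. JavaScript's `.length` counts UTF-16 code units, so `'👨‍👩‍👧'.length` returns 8, not 1: each of the three emoji takes two code units (a surrogate pair) and each of the two ZWJs takes one. Grapheme-aware APIs such as `Intl.Segmenter` with `granularity: 'grapheme'` split the string into the expected single segment.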
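The sketch below measures the same family emoji three ways: UTF-16 code units, code points, and grapheme clusters. It assumes a runtime that implements `Intl.Segmenter` (recent Node.js releases and current browsers do); `graphemeCount` is a hypothetical helper for illustration, not part of any library.

```javascript
// Escapes make the two invisible Zero Width Joiners explicit.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // 👨‍👩‍👧

// 1. UTF-16 code units: 3 emoji x 2 units + 2 ZWJs x 1 unit
const codeUnits = family.length;

// 2. Code points: spreading a string iterates by code point
const codePointCount = [...family].length;

// 3. Grapheme clusters: count the segments Intl.Segmenter produces
function graphemeCount(str, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  for (const _ of segmenter.segment(str)) count++;
  return count;
}
const graphemes = graphemeCount(family);

console.log(codeUnits, codePointCount, graphemes); // 8 5 1
```

The same segmenter can also split text into user-perceived characters, for example `Array.from(segmenter.segment(str), (s) => s.segment)`, which is the safer way to truncate or reverse strings containing emoji.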

Related Terms

Code Point
A unique numerical value assigned to each character in the Unicode standard, written as U+ followed by four to six hexadecimal digits (e.g., U+1F600 for 😀).
ICU
International Components for Unicode: a widely used open-source library providing Unicode and internationalization support, including emoji processing.
Zero Width Joiner (ZWJ)
An invisible Unicode character (U+200D) used to join multiple emoji into a single composite emoji, such as combining people and objects into profession emoji.
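As a sketch of how a ZWJ builds a profession emoji, the example below joins WOMAN (U+1F469) and PERSONAL COMPUTER (U+1F4BB) into the woman technologist sequence and compares it with the same two emoji placed side by side. It assumes a runtime with `Intl.Segmenter`; `countGraphemes` is a hypothetical helper for this example.

```javascript
const ZWJ = "\u200D";
const woman = "\u{1F469}";  // 👩 WOMAN
const laptop = "\u{1F4BB}"; // 💻 PERSONAL COMPUTER

const technologist = woman + ZWJ + laptop; // 👩‍💻 where the font supports it
const unjoined = woman + laptop;           // two separate emoji

// Count grapheme clusters using the Unicode grapheme-break rules.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const countGraphemes = (s) => [...segmenter.segment(s)].length;

console.log(countGraphemes(technologist)); // 1
console.log(countGraphemes(unjoined));     // 2
console.log(technologist.length);          // 5 (2 + 1 + 2 UTF-16 code units)
```

The segmenter applies the text-segmentation rules rather than checking font support, so it reports a single cluster even on systems that cannot draw the joined glyph and instead show the two emoji next to each other.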

Related Tools

๐Ÿ” Sequence Analyzer Sequence Analyzer
Decode ZWJ sequences, skin tone modifiers, keycap sequences, and flag pairs into individual components.
🔢 Unicode Lookup
Enter a code point like U+1F600 and get the emoji, encoding details, UTF-8/16 bytes, and HTML entities.