Analizador de secuencias

Decodifica secuencias ZWJ, modificadores de tono de piel, secuencias de teclas y pares de banderas en sus componentes individuales.

Checker

How to Use

  1. 1
    Paste an emoji sequence

    Paste any emoji — including complex ZWJ sequences, skin-tone-modified emoji, flag sequences, or keycap sequences — into the analyzer field. The tool accepts raw emoji characters, Unicode escape sequences (\u{1F468}), and mixed strings.

  2. 2
    Read the codepoint breakdown

    Review the full decomposition showing each codepoint in the sequence, its Unicode name, its hex value, its UTF-8 byte representation, and its role in the sequence (base character, modifier, ZWJ, variation selector, or tag character).

  3. 3
    Check sequence type and RGI status

    Verify whether the sequence is classified as fully qualified, minimally qualified, or unqualified in the Unicode emoji-test.txt file, and whether it appears in the official RGI (Recommended for General Interchange) list. Non-RGI sequences may not render as a single glyph on all platforms.

About

Modern emoji are frequently not single codepoints but sequences — ordered combinations of multiple Unicode characters that a text rendering engine combines into a single displayed glyph. Understanding emoji at the sequence level is essential for any application that processes, stores, counts, or manipulates emoji-containing text. The Unicode Standard defines four main sequence types: modifier sequences (emoji + skin tone modifier), flag sequences (two Regional Indicator letters), tag sequences (black flag + tag characters), and ZWJ sequences (emoji + Zero Width Joiner + emoji, repeated as needed).

The Zero Width Joiner (U+200D) is the most powerful and flexible composition mechanism in the emoji system. Its original purpose was to request ligature formation in scripts like Arabic and Indic, but the emoji ecosystem adopted it to create composite meanings from existing characters without requiring new codepoints. The Unicode Emoji Subcommittee maintains a curated list of RGI (Recommended for General Interchange) ZWJ sequences in emoji-zwj-sequences.txt; only these combinations are guaranteed to render as unified glyphs on compliant platforms. Unofficial ZWJ combinations are technically valid Unicode but will render as separate emoji on most systems.

For developers, correct sequence handling requires a Unicode-aware grapheme cluster segmentation implementation (defined in Unicode Technical Report #29). Naively splitting emoji strings by codepoint or UTF-16 code unit will incorrectly fragment sequences, producing broken emoji or incorrect character counts. A string containing the rainbow flag 🏳️‍🌈 has 4 codepoints but should be treated as exactly 1 grapheme cluster for purposes of cursor movement, selection, copy/paste, and character counting in user-facing interfaces.

FAQ

What is a ZWJ sequence and how does it work?
A ZWJ (Zero Width Joiner) sequence is an emoji formed by combining two or more emoji codepoints with U+200D (Zero Width Joiner) characters between them. The ZWJ signals to the rendering engine that adjacent emoji should be displayed as a single composite glyph if the platform supports that specific combination. For example, 👨‍🍳 (man cook) is the sequence U+1F468 + U+200D + U+1F373. If a platform does not support a given ZWJ combination, it simply ignores the ZWJ and renders each emoji separately. Only sequences listed in Unicode's emoji-zwj-sequences.txt are considered RGI (Recommended for General Interchange) and have broad platform support.
How do skin tone modifiers work at the codepoint level?
Skin tone modifiers are five codepoints in the Miscellaneous Symbols and Pictographs block: U+1F3FB through U+1F3FF, corresponding to the Fitzpatrick scale Types I-II through V-VI. When appended immediately after an emoji with the Emoji_Modifier_Base property (defined in emoji-data.txt), the modifier requests that the base emoji render in the corresponding skin tone. In ZWJ family sequences, multiple modifier codepoints can appear — one per person emoji — allowing sequences like 👨🏽‍🤝‍👩🏻 (a handshake with two different skin tones) introduced in Emoji 12.1. Parsing these sequences correctly requires understanding that each modifier binds to the immediately preceding modifier-base character.
What are tag character sequences used for?
Tag characters (U+E0020 through U+E007F, plus the tag terminator U+E007F) are used exclusively for subdivision flag sequences — currently the flags of England (🏴󠁧󠁢󠁥󠁮󠁧󠁿), Scotland (🏴󠁧󠁢󠁳󠁣󠁴󠁿), and Wales (🏴󠁧󠁢󠁷󠁬󠁳󠁿). These sequences begin with the black flag emoji (U+1F3F4), followed by tag characters encoding the ISO 3166-2 subdivision code in ASCII, and ending with a tag terminator. Tag characters are invisible in other contexts and have no independent visible rendering — their only defined use in modern Unicode is within these flag sequences.
What is the difference between fully qualified and minimally qualified emoji?
Unicode's emoji-test.txt file distinguishes three qualification levels for emoji sequences. 'Fully qualified' means the sequence includes all recommended variation selectors (primarily U+FE0F for emoji presentation) and is the form recommended for use in text. 'Minimally qualified' means variation selectors are omitted but the sequence is otherwise complete. 'Unqualified' means the sequence is missing components beyond variation selectors. Fully qualified forms are what major platforms transmit and what copy operations typically produce. Using minimally qualified or unqualified forms may cause inconsistent rendering, especially for symbols that exist in both text and emoji presentation.
How can I detect if a string contains emoji programmatically?
Detecting emoji programmatically requires checking codepoint properties defined in Unicode's emoji-data.txt. The most reliable approach is to check the 'Emoji' property for individual characters and use a Unicode-aware segmentation algorithm (per UTR #29) to identify emoji grapheme clusters, which handles sequences. In Python, the `emoji` package or checking codepoint ranges works for simple cases, but sequence-aware libraries like `emoji` or `grapheme` are needed for ZWJ sequences. JavaScript's Intl.Segmenter (available in modern browsers) correctly segments emoji sequences into grapheme clusters using Unicode Text Segmentation rules. Regular expression approaches using \p{Emoji} are supported in modern regex engines but require careful handling of variation selectors to avoid false positives.