UTF-16

Encoding & Standards

A variable-width Unicode encoding that uses 2 or 4 bytes per character and is used internally by JavaScript, Java, and Windows.

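UTF-16 uses 16-bit code units. Characters in the Basic Multilingual Plane (U+0000 to U+FFFF) use one code unit (2 bytes). Characters above U+FFFF, including most emoji, require a surrogate pair (4 bytes): a high surrogate in the range U+D800 to U+DBFF followed by a low surrogate in the range U+DC00 to U+DFFF. These two ranges are reserved for surrogates and never encode characters on their own.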
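The surrogate-pair arithmetic can be sketched in a few lines. This is a minimal illustration, not a production encoder; the function name `toUtf16CodeUnits` is a hypothetical helper introduced here.

```javascript
// Hypothetical helper: split one Unicode code point into UTF-16 code units.
function toUtf16CodeUnits(codePoint) {
  const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF || isSurrogate) {
    throw new RangeError('Not a valid Unicode scalar value: ' + codePoint);
  }

  // Basic Multilingual Plane: the code point is its own single code unit.
  if (codePoint <= 0xFFFF) {
    return [codePoint];
  }

  // Supplementary planes: subtract 0x10000, then split the remaining
  // 20 bits into two 10-bit halves.
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// U+1F600 (😀) becomes the pair D83D DE00.
const pair = toUtf16CodeUnits(0x1F600);
console.log(pair.map((unit) => unit.toString(16).toUpperCase()).join(' '));
```

The result matches what JavaScript stores internally: `'😀'.charCodeAt(0)` is `0xD83D` and `'😀'.charCodeAt(1)` is `0xDE00`.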

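This is why JavaScript's `string.length` can be surprising with emoji: `'😀'.length` returns 2 (two UTF-16 code units), not 1. To count code points instead, use spread syntax (`[...'😀'].length`) or `Array.from()`, both of which iterate by code point. Emoji built from several code points, such as ZWJ sequences, still count as more than one this way; counting user-perceived characters requires grapheme segmentation, for example with `Intl.Segmenter`.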
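The three ways of counting can be compared side by side. The sketch below assumes a runtime that ships `Intl.Segmenter` (recent browsers and Node.js releases); the function names are hypothetical helpers introduced for illustration.

```javascript
// Hypothetical helpers comparing three notions of "length".

// UTF-16 code units: what string.length reports.
function countCodeUnits(text) {
  return text.length;
}

// Unicode code points: string iteration joins surrogate pairs.
function countCodePoints(text) {
  return [...text].length;
}

// User-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter is available in the runtime.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(text)].length;
}

// Family emoji: man + ZWJ + woman + ZWJ + girl.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';
console.log(countCodeUnits(family));  // 8
console.log(countCodePoints(family)); // 5
```

For a single emoji such as `'😀'` the code point count is already 1. For the family sequence above, only `countGraphemes` should report the single glyph a reader sees.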

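UTF-16 exists in two byte orders: UTF-16LE (little-endian, used by Windows) and UTF-16BE (big-endian). A byte order mark (BOM), the code point U+FEFF placed at the start of the data, can indicate which is used: it appears as the bytes FF FE in little-endian data and FE FF in big-endian data.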
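The effect of byte order is easy to see by serializing the same string both ways. This is a minimal sketch using `DataView`; `encodeUtf16` and `detectUtf16ByteOrder` are hypothetical helper names, and the detection looks only at the first two bytes (it does not rule out other encodings whose BOM starts with the same bytes).

```javascript
// Hypothetical helper: serialize a string's UTF-16 code units to bytes
// in the requested byte order, optionally prefixed with a BOM (U+FEFF).
function encodeUtf16(text, littleEndian, withBom = false) {
  const source = withBom ? '\uFEFF' + text : text;
  const bytes = new Uint8Array(source.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < source.length; i++) {
    view.setUint16(i * 2, source.charCodeAt(i), littleEndian);
  }
  return bytes;
}

// Hypothetical helper: guess the byte order from a leading BOM.
// Returns null when no UTF-16 BOM is present.
function detectUtf16ByteOrder(bytes) {
  if (bytes.length >= 2) {
    if (bytes[0] === 0xFF && bytes[1] === 0xFE) return 'UTF-16LE';
    if (bytes[0] === 0xFE && bytes[1] === 0xFF) return 'UTF-16BE';
  }
  return null;
}

const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0').toUpperCase()).join(' ');

console.log(toHex(encodeUtf16('😀', true)));  // 3D D8 00 DE
console.log(toHex(encodeUtf16('😀', false))); // D8 3D DE 00
console.log(detectUtf16ByteOrder(encodeUtf16('A', true, true))); // UTF-16LE
```

The surrogate pair D83D DE00 stays in the same order in both outputs; only the two bytes inside each code unit are swapped.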

Related articles

Unicode Lookup: Find Any Emoji by Code Point

Enter a Unicode code point like U+1F600 or paste an emoji to see its full encoding breakdown — UTF-8, UTF-16, HTML entities, and more.

What Are ZWJ Sequences? How Emoji Combine

Learn how Zero Width Joiner (ZWJ) sequences combine multiple emoji into one — from family emoji to professions to flags.

Emoji Sequence Analyzer: Decode ZWJ, Skin Tones, and Keycaps

Learn how to use EmojiFYI's Sequence Analyzer to break down ZWJ sequences, skin tone modifiers, keycap sequences, and flag emojis into their Unicode components.

How to Handle Emojis in JavaScript: Strings, Length, and Rendering

Learn how to work with emojis in JavaScript: string length pitfalls, Unicode code points, regex, grapheme segmentation, and React rendering tips.

Why string.length Lies: Grapheme Clusters and Emoji Length

Why string.length fails for emoji, what grapheme clusters are, and how to correctly count characters in JavaScript, Python, and more.

EmojiFYI Developer Tools: A Complete Overview

Explore all six EmojiFYI interactive tools built for developers: compare platforms, analyze sequences, access the API, convert text, and more.