UTF-8

Encoding & Standards

A variable-width Unicode encoding that uses 1 to 4 bytes per code point and dominates the web (used by more than 98% of websites).

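UTF-8 is the most widely used Unicode encoding. ASCII characters (U+0000 to U+007F) use a single byte with the same value as in ASCII, so any ASCII text is already valid UTF-8. Code points beyond ASCII use 2 to 4 bytes: 2 bytes up to U+07FF, 3 bytes up to U+FFFF, and 4 bytes up to U+10FFFF.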
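
The byte counts above follow directly from the code point's numeric range. A minimal Python sketch, where the helper name utf8_length is made up for illustration and the result is cross-checked against Python's built-in encoder:

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one code point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:      # ASCII range
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:    # rest of the Basic Multilingual Plane
        return 3
    return 4                    # supplementary planes

for ch in ["A", "é", "€", "😀"]:
    # The range-based answer should agree with the real encoder.
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")

# ASCII text produces identical bytes under both encodings.
assert "hello".encode("ascii") == "hello".encode("utf-8")
```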

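Most emoji require 4 bytes in UTF-8 because they live in the Supplementary Multilingual Plane (Plane 1, U+10000 to U+1FFFF), and every code point above U+FFFF takes 4 bytes. For example, 😀 (U+1F600) encodes as 0xF0 0x9F 0x98 0x80. A few older emoji in the Basic Multilingual Plane, such as ☺ (U+263A), need only 3 bytes.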
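
To see where those four bytes come from, the code point's bits can be split across one lead byte (11110xxx) and three continuation bytes (10xxxxxx). The sketch below does this by hand for the 4-byte case only; encode_4_byte is an illustrative name, not a library function, and real code should simply call str.encode("utf-8").

```python
def encode_4_byte(code_point: int) -> bytes:
    """Hand-encode a supplementary-plane code point as 4 UTF-8 bytes (illustrative only)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("the 4-byte form covers U+10000 to U+10FFFF only")
    return bytes([
        0xF0 | (code_point >> 18),            # lead byte: 11110 + top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),   # continuation: 10 + next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),    # continuation: 10 + next 6 bits
        0x80 | (code_point & 0x3F),           # continuation: 10 + low 6 bits
    ])

encoded = encode_4_byte(0x1F600)
print(" ".join(f"0x{b:02X}" for b in encoded))  # 0xF0 0x9F 0x98 0x80

# Matches Python's built-in encoder.
assert encoded == "😀".encode("utf-8")
```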

UTF-8's dominance on the web (it is recommended by the W3C and is the encoding the HTML standard directs authors to use) makes it the standard choice for storing and transmitting emoji in most applications.

Related Terms

BOM (Byte Order Mark)
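The Byte Order Mark (U+FEFF) placed at the start of a text file to indicate byte order (endianness) in UTF-16/UTF-32 encodings. In UTF-8 it is optional and serves only as an encoding signature (0xEF 0xBB 0xBF), since UTF-8 has no byte-order ambiguity.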
Code Unit
The smallest fixed-size group of bits an encoding uses to represent text: 8 bits for UTF-8, 16 bits for UTF-16, and 32 bits for UTF-32. A single character may take more than one code unit.
UTF-16
A variable-width Unicode encoding that uses 2 or 4 bytes per character, used internally by JavaScript, Java, and Windows.
UTF-32
A fixed-width Unicode encoding that uses exactly 4 bytes per character, providing direct code point mapping at the cost of space.
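
The terms above can be compared side by side on a single emoji. This is a rough sketch using only Python's built-in codecs; the big-endian variants are chosen here so that no BOM is prepended to the output, and the variable names are arbitrary.

```python
import codecs

text = "😀"  # U+1F600

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")   # explicit byte order, so no BOM is added
utf32 = text.encode("utf-32-be")

# Code units: 8-bit, 16-bit, and 32-bit respectively.
print(len(utf8) // 1, "UTF-8 code units: ", utf8.hex())    # 4 units: f09f9880
print(len(utf16) // 2, "UTF-16 code units:", utf16.hex())  # 2 units: d83dde00 (surrogate pair)
print(len(utf32) // 4, "UTF-32 code unit: ", utf32.hex())  # 1 unit:  0001f600

# The BOM is U+FEFF; its byte order reveals the endianness of the stream.
assert "\ufeff".encode("utf-16-be") == codecs.BOM_UTF16_BE == b"\xfe\xff"
assert "\ufeff".encode("utf-16-le") == codecs.BOM_UTF16_LE == b"\xff\xfe"
```

UTF-32 maps the code point directly into its single code unit, while UTF-16 needs a surrogate pair for anything above U+FFFF, which is why both UTF-8 and UTF-16 end up at 4 bytes for this emoji.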

Related Tools

🔢 Unicode Lookup
Enter a code point like U+1F600 and get the emoji, encoding details, UTF-8 and UTF-16 bytes, and HTML entities.