BOM

Encoding & Standards

The Byte Order Mark (U+FEFF) is a character placed at the start of a text stream to indicate its byte order (endianness), chiefly in UTF-16 and UTF-32 encodings.

The BOM is a special Unicode character used to signal the byte order of a text stream. In UTF-16, it distinguishes little-endian (FF FE) from big-endian (FE FF) formats. In UTF-32, the corresponding sequences are FF FE 00 00 (little-endian) and 00 00 FE FF (big-endian).
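The following is a minimal Python sketch of BOM sniffing. It assumes the input is available as raw bytes. `detect_bom` is a hypothetical helper, not a standard library function. The byte sequences come from the standard `codecs.BOM_*` constants. The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the longer patterns are checked first. As a consequence, UTF-16LE text whose first character after the BOM is U+0000 would be misread as UTF-32LE. That ambiguity is inherent to BOM sniffing.

```python
import codecs

# Known BOM byte sequences from the codecs module.
# UTF-32 entries come before UTF-16 because the UTF-32LE BOM (FF FE 00 00)
# starts with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if none is found."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


# Encoding U+FEFF in each byte order yields the familiar BOM bytes.
for enc in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le", "utf-8"):
    print(enc, "\ufeff".encode(enc).hex(" "))

# Detect the BOM, skip it, and decode the remaining bytes.
sample = "\ufeffhi".encode("utf-16-le")
encoding, skip = detect_bom(sample)
print(encoding, sample[skip:].decode(encoding))  # utf-16-le hi
```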

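UTF-8 has no byte-order ambiguity, but a BOM (EF BB BF) is sometimes added as an encoding signature. This is not recommended, because many tools don't expect it. It breaks Unix scripts, since a shebang line such as #!/bin/sh must start at the very first byte of the file. It trips up JSON handling: RFC 8259 forbids generating a BOM in networked JSON, and parsers only optionally ignore one. It also confuses other text tools that treat the BOM as ordinary content. Some text editors, particularly on Windows, have added a UTF-8 BOM by default, which can lead to subtle bugs.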
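The JSON failure can be reproduced with Python's standard library. This is an illustrative sketch: the input bytes are made up for the demo. When a BOM-prefixed file is decoded with the plain `"utf-8"` codec, the result keeps U+FEFF as its first character, and `json.loads` rejects that string. The `"utf-8-sig"` codec strips a leading BOM if one is present and behaves like `"utf-8"` otherwise.

```python
import json

# UTF-8 encoded JSON with a leading BOM (EF BB BF), as some editors save it.
raw = b'\xef\xbb\xbf{"name": "caf\xc3\xa9"}'

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")
print(repr(text[0]))

# json.loads rejects a str that starts with U+FEFF.
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("json.loads failed:", exc.msg)

# "utf-8-sig" strips a leading BOM if present (and is a no-op otherwise).
clean = raw.decode("utf-8-sig")
print(json.loads(clean))  # {'name': 'café'}
```

The failure only affects already-decoded text. When `json.loads` is given `bytes`, it detects the encoding itself and handles a UTF-8 BOM.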

Modern best practice: use UTF-8 without BOM for web content and data files.
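Here is one way to apply this to existing files, sketched in Python. It assumes the files are already UTF-8: read the raw bytes and drop a leading EF BB BF if present. `strip_utf8_bom` is a hypothetical helper name. For new files, Python's `encoding="utf-8"` writes no BOM, while `encoding="utf-8-sig"` adds one. The demo uses the latter deliberately so that there is a BOM to remove.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file at `path`, in place.

    Returns True if a BOM was removed, False if the file had none.
    Assumes the file is UTF-8; the remaining bytes are not validated.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Demo: create a file with a BOM, then normalize it.
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "config.json"
    # The "utf-8-sig" codec prepends EF BB BF when writing; plain "utf-8" does not.
    p.write_text('{"debug": true}\n', encoding="utf-8-sig")
    print(p.read_bytes()[:3])   # b'\xef\xbb\xbf'
    print(strip_utf8_bom(p))    # True
    print(p.read_bytes()[:3])   # b'{"d'
    print(strip_utf8_bom(p))    # False
```

On the reading side, opening files with `encoding="utf-8-sig"` is a tolerant alternative. It accepts input with or without a BOM and does not return U+FEFF as the first character.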

Related Terms

UTF-16
A variable-width Unicode encoding that uses 2 or 4 bytes per code point, used internally by JavaScript, Java, and Windows.
UTF-32
A fixed-width Unicode encoding that uses exactly 4 bytes per code point, providing direct code point mapping at the cost of space.
UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per code point, dominant on the web (used by 98%+ of websites).
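Python's built-in codecs can confirm the size differences between these encodings, along with the BOM each can carry. This is an illustrative sketch, and `encoded_sizes` is a hypothetical helper. The explicit `-le` codecs are used so that Python emits no BOM and the counts reflect only the encoded code point.

```python
# Explicit little-endian codecs: Python writes no BOM for these.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes `ch` occupies in each encoding in ENCODINGS."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


# ASCII, Latin-1, a BMP symbol, and an emoji outside the BMP (U+1F600),
# which needs a surrogate pair (two 16-bit code units) in UTF-16.
for ch in ["A", "é", "€", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# The generic "utf-16"/"utf-32" codecs prepend a BOM in native byte order.
print(len("A".encode("utf-16")))  # 4: 2-byte BOM + one 2-byte code unit
print(len("A".encode("utf-32")))  # 8: 4-byte BOM + one 4-byte code unit
```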

Related Tools

Unicode Lookup
Enter a codepoint like U+1F600 and get the emoji, encoding details, UTF-8/16 bytes, and HTML entities.