UTF-32

Encoding & Standards

Fixed-width Unicode encoding that uses exactly 4 bytes per code point, allowing a direct mapping to code points at the cost of memory.

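UTF-32 is the simplest Unicode encoding: every code point is stored as exactly 4 bytes, and the 32-bit value is the code point itself. This makes random access by code point index and code point counting trivial, although a single user-perceived character (a flag emoji, for example) can still span several code points.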
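As a minimal sketch of that direct mapping, the Python snippet below encodes a short sample string with the standard `utf-32-le` codec (little-endian, no byte order mark) and reads code points back by plain index arithmetic. The sample string and the helper name `code_point_at` are illustrative assumptions, not part of any library.

```python
import struct

# Sample text: ASCII, Latin-1, BMP, and a supplementary-plane emoji
text = "aé€😀"

# "utf-32-le" fixes the byte order and omits the BOM
encoded = text.encode("utf-32-le")

# Every code point occupies exactly 4 bytes
assert len(encoded) == 4 * len(text)

# Each 32-bit unit is the code point value itself
units = struct.unpack(f"<{len(text)}I", encoded)
assert list(units) == [ord(ch) for ch in text]


def code_point_at(data: bytes, index: int) -> int:
    """Hypothetical helper: code point at position `index` in UTF-32-LE data."""
    start = 4 * index
    return int.from_bytes(data[start:start + 4], "little")


print(hex(code_point_at(encoded, 3)))  # 0x1f600
```

Looking up the n-th code point is a constant-time slice here, whereas UTF-8 or UTF-16 data would have to be scanned from the start.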

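However, UTF-32 takes four times the space of ASCII or UTF-8 for ASCII-only text, and twice the space of UTF-16 for characters in the Basic Multilingual Plane, which covers most text in common use. It is rarely used for storage or transmission but can be convenient for internal string processing.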
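The size trade-off can be checked directly. The following is a small sketch using Python's built-in codecs; the function name `encoded_sizes` and the sample strings are assumptions chosen for illustration.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in three BOM-free Unicode encodings."""
    return {
        name: len(text.encode(name))
        for name in ("utf-8", "utf-16-le", "utf-32-le")
    }


print(encoded_sizes("hello"))   # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes("日本語"))  # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}
print(encoded_sizes("😀"))      # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

Only for code points outside the Basic Multilingual Plane, such as most emoji, do all three encodings cost the same 4 bytes.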

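CPython 3.3 and later (PEP 393) store each string in a fixed-width form of 1, 2, or 4 bytes per code point (Latin-1, UCS-2, or UCS-4), chosen per string from the highest code point it contains. Because Python strings are indexed by code point rather than by UTF-16 code unit, `len('😀')` correctly returns 1.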
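A rough way to observe this is sketched below. It assumes CPython 3.3 or later, since `sys.getsizeof` reports implementation-specific sizes and other interpreters may behave differently; the helper name `bytes_per_code_point` is hypothetical.

```python
import sys


def bytes_per_code_point(ch):
    """Estimate per-code-point storage for a string made only of `ch`.

    Assumes CPython: the fixed object header cancels out when the sizes of
    two strings that differ by one code point are subtracted.
    """
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


for ch in ("a", "é", "€", "😀"):
    print(f"U+{ord(ch):04X}", bytes_per_code_point(ch))

# len() counts code points, not UTF-16 code units
print(len("😀"))                           # 1
print(len("😀".encode("utf-16-le")) // 2)  # 2 (a surrogate pair)
```

On CPython the loop is expected to report widths of 1, 1, 2, and 4 bytes, matching the Latin-1, UCS-2, and UCS-4 forms described above.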

Related Articles

Handling Emojis in JSON and REST APIs

How to correctly handle emoji in JSON serialization, REST API design, database storage, and HTTP headers — including surrogate pair pitfalls.

Emoji Encoding Guide: UTF-8, UTF-16 & Surrogate Pairs

A developer's guide to emoji encoding — how UTF-8, UTF-16, and UTF-32 represent emoji, and common pitfalls with surrogate pairs.
