UTF-32

Encoding & Standards

A fixed-width Unicode encoding that uses exactly 4 bytes for every code point. Each stored value maps directly to its code point, at the cost of more memory than other encodings.

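UTF-32 is the simplest Unicode encoding: every code point is stored in exactly 4 bytes, and that 32-bit value is the code point itself. This makes indexing the n-th code point and counting code points constant-time operations. These are code points, not user-perceived characters, and an emoji such as a flag can span several code points.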
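
The direct mapping can be shown with a small Python sketch. The helper names `utf32_code_units` and `nth_code_point` are hypothetical and chosen only for this example. The sketch uses the `utf-32-be` codec, which omits the byte order mark that Python's plain `utf-32` codec prepends. As a result, every 4-byte unit is exactly one code point, and the n-th code point starts at byte offset 4n.

```python
import struct

def utf32_code_units(text: str) -> list[int]:
    """Encode as big-endian UTF-32 (no BOM) and unpack each 4-byte unit."""
    data = text.encode("utf-32-be")
    count = len(data) // 4  # number of code points, no scanning needed
    return list(struct.unpack(f">{count}I", data))

def nth_code_point(data: bytes, n: int) -> int:
    """Random access: the n-th code point occupies bytes 4*n .. 4*n + 3."""
    return int.from_bytes(data[4 * n : 4 * n + 4], "big")

text = "Aé😀"
units = utf32_code_units(text)
print(units)                            # [65, 233, 128512]
print(units == [ord(c) for c in text])  # True: each unit equals the code point
print(hex(nth_code_point(text.encode("utf-32-be"), 2)))  # 0x1f600
```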

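However, UTF-32 needs four times the space of UTF-8 (or ASCII) for ASCII text. It also needs twice the space of UTF-16 for characters in the Basic Multilingual Plane (U+0000 to U+FFFF), which covers most common scripts. It is rarely used for storage or transmission, but it can be convenient for internal string processing.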
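
The size trade-off can be checked directly. This sketch compares encoded byte lengths using a hypothetical helper, `encoded_sizes`. It uses the `-le` codec variants so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of `text` in UTF-8, UTF-16 and UTF-32, excluding any BOM."""
    # The "-le" variants skip the byte order mark that "utf-16"/"utf-32" prepend.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

samples = [
    ("hello", "ASCII"),        # 5 / 10 / 20 bytes: UTF-32 is 4x UTF-8
    ("日本語", "BMP (CJK)"),   # 9 / 6 / 12 bytes: UTF-32 is 2x UTF-16
    ("😀", "outside BMP"),     # 4 / 4 / 4 bytes: UTF-16 uses a surrogate pair
]
for text, label in samples:
    print(label, text, encoded_sizes(text))
```

For an emoji outside the BMP, all three encodings use 4 bytes, because UTF-16 needs a surrogate pair. UTF-32's 2x overhead relative to UTF-16 therefore applies only to BMP characters.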

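Since Python 3.3 (PEP 393), CPython stores each string in a fixed-width form chosen per string: 1 byte per code point (Latin-1), 2 bytes (UCS-2), or 4 bytes (UCS-4), depending on the highest code point the string contains. Because a Python `str` is always a sequence of code points rather than UTF-16 code units, `len('😀')` returns 1.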
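
The following Python sketch contrasts the code point count with UTF-16 and UTF-32 code unit counts. It then gives a CPython-specific illustration of PEP 393 by measuring how much one extra character grows a string. The `per_char_bytes` helper is hypothetical. It relies on `sys.getsizeof`, whose absolute values vary by version and platform, so only the difference between two string lengths is used.

```python
import sys

emoji = "😀"  # U+1F600, outside the Basic Multilingual Plane
print(len(emoji))                           # 1: str counts code points
print(len(emoji.encode("utf-16-le")) // 2)  # 2: UTF-16 code units (surrogate pair)
print(len(emoji.encode("utf-32-le")) // 4)  # 1: UTF-32 code unit

def per_char_bytes(ch: str) -> int:
    """Storage added by one more copy of `ch` (CPython-specific assumption)."""
    # Both strings use the same per-code-point width, so the difference
    # isolates that width: 1 (Latin-1), 2 (UCS-2) or 4 (UCS-4) bytes.
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

for ch in ["a", "é", "α", "😀"]:
    print(repr(ch), per_char_bytes(ch))
```

On CPython 3.3 and later, this should report 1 byte for `"a"` and `"é"`, 2 for `"α"`, and 4 for `"😀"`. Other Python implementations may store strings differently.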

Related Articles

Handling Emojis in JSON and REST APIs

How to correctly handle emoji in JSON serialization, REST API design, database storage, and HTTP headers — including surrogate pair pitfalls.

Emoji Encoding Guide: UTF-8, UTF-16 & Surrogate Pairs

A developer's guide to emoji encoding — how UTF-8, UTF-16, and UTF-32 represent emoji, and common pitfalls with surrogate pairs.
