UTF-32

Encoding & Standards

A fixed-width Unicode encoding that uses exactly 4 bytes per code point, which allows code points to be mapped directly but takes more space.

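UTF-32 is the simplest Unicode encoding: every code point is stored as exactly 4 bytes, and the stored value is the code point itself. This makes random access by code point index and counting code points trivial. A user-perceived character can still span several code points (for example, an emoji followed by a skin-tone modifier), so a fixed width per code point is not a fixed width per visible character.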
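As a minimal sketch of that direct mapping, the Python snippet below encodes a short sample string with the standard `utf-32-be` codec and reads code points back by plain index arithmetic. The sample text and the helper names `code_point_at` and `code_point_count` are illustrative assumptions, not part of any library.

```python
# Illustrative sample: one ASCII letter, one BMP symbol, one emoji.
text = "A€😀"

# "utf-32-be" is big-endian UTF-32 without a byte order mark.
encoded = text.encode("utf-32-be")


def code_point_at(buf: bytes, index: int) -> int:
    """Return the code point at `index` by reading one 4-byte unit."""
    start = 4 * index
    return int.from_bytes(buf[start:start + 4], "big")


def code_point_count(buf: bytes) -> int:
    """Counting code points is a single division in UTF-32."""
    return len(buf) // 4


print(code_point_count(encoded))       # 3
print(hex(code_point_at(encoded, 2)))  # 0x1f600
```

No scanning is needed to find the third code point: its offset is simply `4 * 2`.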

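However, UTF-32 takes four times the space of ASCII or UTF-8 for ASCII-only text, and twice the space of UTF-16 for characters in the Basic Multilingual Plane, which covers most text in common use. It is rarely used for storage or transmission but can be convenient for internal string processing.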
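The size difference is easy to check with Python's built-in codecs. The sketch below compares encoded byte lengths for a few sample strings; the helper name `encoded_sizes` and the samples are assumptions chosen for illustration, and the little-endian codec variants are used so that no byte order mark is counted.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded byte length of `text` in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


for sample in ("hello", "€", "😀"):
    print(repr(sample), encoded_sizes(sample))

# 'hello' {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
# '€' {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# '😀' {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

Only for code points outside the Basic Multilingual Plane, such as most emoji, do all three encodings come out at the same 4 bytes.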

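Since version 3.3 (PEP 393), CPython stores each string in a fixed-width form of 1, 2, or 4 bytes per code point (Latin-1, UCS-2, or UCS-4), chosen according to the highest code point in the string. Python strings are indexed and measured in code points regardless of the storage width, which is why `len('😀')` correctly returns 1.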
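The following sketch illustrates this behavior. The length checks hold for any Python 3 implementation, while `bytes_per_code_point` is a hypothetical helper that relies on `sys.getsizeof` and therefore only reflects CPython's storage layout; other implementations may report different numbers.

```python
import sys


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units `text` needs in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def bytes_per_code_point(ch: str) -> int:
    """Estimate per-code-point storage for strings made only of `ch`.

    Subtracting two sizes cancels the fixed object header.
    CPython-specific: other implementations may differ.
    """
    return (sys.getsizeof(ch * 101) - sys.getsizeof(ch)) // 100


print(len("😀"))               # 1 (one code point)
print(utf16_code_units("😀"))  # 2 (a surrogate pair in UTF-16)

for ch in ("a", "€", "😀"):
    print(repr(ch), bytes_per_code_point(ch))
```

On CPython 3.3 or later the loop is expected to report 1, 2, and 4 bytes respectively, matching the three storage widths.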

Related Articles

Handling Emojis in JSON and REST APIs

How to correctly handle emoji in JSON serialization, REST API design, database storage, and HTTP headers — including surrogate pair pitfalls.

Emoji Encoding Guide: UTF-8, UTF-16 & Surrogate Pairs

A developer's guide to emoji encoding — how UTF-8, UTF-16, and UTF-32 represent emoji, and common pitfalls with surrogate pairs.
