Storing Emojis in Databases: MySQL, PostgreSQL, and SQLite

The Silent Data Loss Problem

EmojiEmoji
A Japanese word (絵文字) meaning 'picture character' — small graphical symbols used in digital communication to express ideas, emotions, and objects.
have caused more silent data corruption bugs than almost any other character type. A user submits a message containing 🎉, your server accepts it without error, but the database quietly truncates the row — or throws an opaque constraint error. The cause is almost always the same: the database is configured to use a character encoding that cannot represent emoji.

This guide covers the correct configuration for the three most widely used databases.

MySQL: The utf8mb4 Problem

MySQL's utf8 character set is not real UTF-8UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per character, dominant on the web (used by 98%+ of websites).
. It only supports characters up to 3 bytes (U+0000 to U+FFFF). Since emoji code points start at U+1F300 and require 4 bytes in UTF-8, MySQL's utf8 silently drops or errors on them.

The fix is utf8mb4 — MySQL's name for actual UTF-8.

Server-Level Configuration

In my.cnf or my.ini:

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

[client]
default-character-set = utf8mb4

Database and Table Level

-- Create database with utf8mb4
CREATE DATABASE myapp
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- Alter existing database
ALTER DATABASE myapp
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- Alter specific table
ALTER TABLE posts
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- Alter specific column
ALTER TABLE posts
  MODIFY content TEXT
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

Collation Choice

Collation Notes
utf8mb4_unicode_ci Standard, good sorting, case-insensitive
utf8mb4_unicode_520_ci UnicodeUnicode
Universal character encoding standard that assigns a unique number to every character across all writing systems and symbol sets, including emoji.
5.2.0 rules
utf8mb4_0900_ai_ci MySQL 8.0+, Unicode 9.0, recommended for new projects
utf8mb4_bin Binary comparison, case-sensitive, fastest

For most applications, utf8mb4_unicode_ci (MySQL 5.x) or utf8mb4_0900_ai_ci (MySQL 8.x) is correct.

Connection String

Ensure your application sets utf8mb4 on the connection:

# Python (PyMySQL / mysqlclient)
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'OPTIONS': {
            'charset': 'utf8mb4',
        },
    }
}
// Node.js (mysql2)
const connection = mysql.createConnection({
  charset: 'utf8mb4',
  collation: 'utf8mb4_unicode_ci',
});

Checking Your Current Configuration

-- Check server defaults
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Check a specific column
SELECT CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'myapp'
  AND TABLE_NAME = 'posts'
  AND COLUMN_NAME = 'content';

PostgreSQL: Emoji Work Out of the Box

PostgreSQL uses UTF-8 by default for new databases, and real UTF-8 that supports all 4-byte emoji. There is no utf8mb4 problem.

-- Create database (UTF8 is the default for most installations)
CREATE DATABASE myapp WITH ENCODING 'UTF8';

-- Verify encoding
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'myapp';
-- UTF8

Column Types for Emoji Text

-- TEXT is unlimited-length UTF-8, handles all emoji
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  content TEXT,
  title VARCHAR(255)  -- VARCHAR also handles emoji in PostgreSQL
);

The main PostgreSQL gotcha is column length limits with VARCHAR(n). The limit is in characters, not bytes, so VARCHAR(10) can hold 10 emoji (each up to 4 bytes). This is usually correct, but be aware when porting logic from MySQL where limits may have been set with byte-length assumptions.

Collation and Searching

-- Case-insensitive search including emoji
SELECT * FROM posts
WHERE content ILIKE '%🎉%';

-- Using pg_trgm for fuzzy matching (works with emoji)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON posts USING gin(content gin_trgm_ops);

JSON Columns

PostgreSQL's jsonb type stores UTF-8 natively:

CREATE TABLE events (
  id SERIAL PRIMARY KEY,
  metadata JSONB
);

INSERT INTO events (metadata)
VALUES ('{"reaction": "🎉", "user": "alice"}');

SELECT metadata->>'reaction' FROM events;
-- 🎉

SQLite: Simple but Configuration-Dependent

SQLite stores text as UTF-8 by default, and it supports all Unicode code points including emoji. However, the behavior depends on how the SQLite library was compiled and how you connect to it.

import sqlite3

conn = sqlite3.connect('myapp.db')
conn.execute("PRAGMA encoding = 'UTF-8'")  # Default, but explicit is good

conn.execute("""
  CREATE TABLE IF NOT EXISTS posts (
    id INTEGER PRIMARY KEY,
    content TEXT
  )
""")

# Insert emoji — works fine in Python 3
conn.execute("INSERT INTO posts (content) VALUES (?)", ('Hello 🎉',))
conn.commit()

# Retrieve
row = conn.execute("SELECT content FROM posts LIMIT 1").fetchone()
print(row[0])  # Hello 🎉

SQLite Caveats

Text affinity: SQLite's type system uses affinities, not strict types. TEXT affinity stores any string as UTF-8. As long as you use parameterized queries (not string concatenation), emoji pass through cleanly.

LIKE is ASCII-only by default: SQLite's built-in LIKE is case-insensitive only for ASCII. For emoji-aware searching, use equality checks or the glob function, or enable the ICUICU (ICU)
International Components for Unicode — a widely-used open-source library providing Unicode and internationalization support, including emoji processing.
extension.

-- This works reliably for emoji matching
SELECT * FROM posts WHERE content = '🎉';

-- This may not work as expected for non-ASCII case folding
SELECT * FROM posts WHERE content LIKE '%🎉%';
-- (Works for emoji since emoji have no case, but be careful with letters)

Sizing Columns Correctly

A common mistake is sizing columns by byte length (since MySQL's older utf8 counted bytes). With utf8mb4 and PostgreSQL, VARCHAR(n) limits are in characters.

Emoji Type Code Points UTF-8 Bytes
Basic emoji (😀) 1 4
Emoji + variation selectorVariation Selector (VS)
Unicode characters (VS-15 U+FE0E and VS-16 U+FE0F) that modify whether a character renders in text (monochrome) or emoji (colorful) presentation.
(❤️)
2 7
Emoji + skin tone (👍🏽) 2 8
ZWJZero Width Joiner (ZWJ)
An invisible Unicode character (U+200D) used to join multiple emoji into a single composite emoji, such as combining people and objects into profession emoji.
sequence (👩‍💻)
3 11
Family emoji (👨‍👩‍👧‍👦) 7 25

If you store user reactions or status fields, VARCHAR(10) on code points is enough for most use cases. For freeform text, use TEXT without a length limit.

ORM Considerations

Django

Django automatically uses utf8mb4 for MySQL when you include 'charset': 'utf8mb4' in your database OPTIONS. For PostgreSQL, no special configuration is needed.

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'OPTIONS': {
            'charset': 'utf8mb4',
            'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
        },
    }
}

Sequelize (Node.js)

const sequelize = new Sequelize('database', 'username', 'password', {
  dialect: 'mysql',
  dialectOptions: {
    charset: 'utf8mb4',
  },
});

Quick Diagnostic

To check if your database can store emoji, run this SQL:

-- MySQL/PostgreSQL/SQLite
CREATE TEMP TABLE emoji_test (val TEXT);
INSERT INTO emoji_test VALUES ('🎉');
SELECT val FROM emoji_test;
DROP TABLE emoji_test;

If you see 🎉 in the result, you are correctly configured. If you see ? or an empty string, your encoding is wrong. If you get an error, check your connection charset setting.

Glossary Terms

Emoji Emoji
A Japanese word (絵文字) meaning 'picture character' — small graphical symbols used in digital communication to express ideas, emotions, and objects.
ICU (ICU) ICU (ICU)
International Components for Unicode — a widely-used open-source library providing Unicode and internationalization support, including emoji processing.
UTF-8 UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per character, dominant on the web (used by 98%+ of websites).
Unicode Unicode
Universal character encoding standard that assigns a unique number to every character across all writing systems and symbol sets, including emoji.
Variation Selector (VS) Variation Selector (VS)
Unicode characters (VS-15 U+FE0E and VS-16 U+FE0F) that modify whether a character renders in text (monochrome) or emoji (colorful) presentation.
Zero Width Joiner (ZWJ) Zero Width Joiner (ZWJ)
An invisible Unicode character (U+200D) used to join multiple emoji into a single composite emoji, such as combining people and objects into profession emoji.

Related Stories