Emojis in URLs and Domain Names: Punycode, IDN, and Percent-Encoding

Emojis in URLs and Domain Names

EmojiEmoji
A Japanese word (絵文字) meaning 'picture character' — small graphical symbols used in digital communication to express ideas, emotions, and objects.
domains like 🍕.ws and emoji in URL paths are real — but the underlying technology involves multiple layers of encoding that can catch developers off guard. This guide explains how Punycode, Internationalized Domain Names (IDN), and percent-encoding interact with emoji characters.

Emoji in Domain Names

How IDN Works

The Domain Name System (DNS) was designed for ASCII only (letters, digits, and hyphens). Internationalized Domain Names (IDN) extend this to non-ASCII characters — including emoji — through Punycode encoding, defined in RFC 3492.

Every non-ASCII label in a domain name is converted to ASCII-Compatible Encoding (ACE) by:

  1. Mapping the UnicodeUnicode
    Universal character encoding standard that assigns a unique number to every character across all writing systems and symbol sets, including emoji.
    characters using IDNA (Internationalized Domain Names in Applications) rules
  2. Encoding the result as a Punycode string
  3. Prefixing with xn--

For example:

🍕.ws  →  xn--vi8h.ws
😀.com →  xn--h28h.com

IDNA Standards: 2003 vsVariation Selector (VS)
Unicode characters (VS-15 U+FE0E and VS-16 U+FE0F) that modify whether a character renders in text (monochrome) or emoji (colorful) presentation.
2008

There are two IDNA standards with important differences for emoji:

  • IDNA2003 (RFC 3490): Used Nameprep (based on Unicode 3.2). Many browsers initially followed this.
  • IDNA2008 (RFC 5891): Stricter rules. Emoji are categorized as DISALLOWED or UNASSIGNED, meaning emoji domains are technically invalid under IDNA2008.

In practice, most registrars that sell emoji domains use IDNA2003 or their own extensions. Browser behavior varies.

Converting to Punycode in Python

# Python's built-in codec handles Punycode
domain = "🍕.ws"
encoded = domain.encode("idna").decode("ascii")
print(encoded)  # xn--vi8h.ws

# Decode back
decoded = encoded.encode("ascii").decode("idna")
print(decoded)  # 🍕.ws

# For multi-label domains
full_domain = "🚀.🌍.example"
parts = full_domain.split(".")
encoded_parts = []
for part in parts:
    try:
        encoded_parts.append(part.encode("idna").decode("ascii"))
    except UnicodeError:
        encoded_parts.append(part)
encoded_domain = ".".join(encoded_parts)
print(encoded_domain)  # xn--yp8h.xn--54g.example

Converting to Punycode in JavaScript

// Node.js built-in
const { domainToASCII, domainToUnicode } = require('url');

console.log(domainToASCII('🍕.ws'));      // xn--vi8h.ws
console.log(domainToUnicode('xn--vi8h.ws')); // 🍕.ws

// Browser — use the URL API
const url = new URL('http://🍕.ws/');
console.log(url.hostname); // xn--vi8h.ws (browsers normalize to Punycode)

// For just encoding a label
const encoded = new URL(`http://${encodeURIComponent('🍕')}.ws/`).hostname;
// This won't work cleanly — use domainToASCII in Node or a library

Emoji Domains in Practice

Popular registrars that have offered emoji domain registration include .ws (Samoa), .fm (Micronesia), and .to (Tonga). Some gTLD operators do not allow emoji under their registry policies.

When building link detection or URL parsers, you must handle both the display form (🍕.ws) and the wire form (xn--vi8h.ws).

Emoji in URL Paths and Query Strings

Unlike domains, URL paths and query strings use percent-encoding (also called URL encoding), defined in RFC 3986.

Percent-Encoding Emoji

Emoji characters are encoded as their UTF-8UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per character, dominant on the web (used by 98%+ of websites).
byte sequence, with each byte represented as %XX:

🎉 → UTF-8 bytes: F0 9F 8E 89
     Percent-encoded: %F0%9F%8E%89

🌍 → UTF-8 bytes: F0 9F 8C 8D
     Percent-encoded: %F0%9F%8C%8D

So the URL https://example.com/celebrate/🎉 becomes:

https://example.com/celebrate/%F0%9F%8E%89

Encoding in Python

from urllib.parse import quote, unquote, urlencode, quote_plus

emoji = "🎉"

# Encode for URL path (safe='/' preserves slashes)
path = f"/celebrate/{quote(emoji)}"
print(path)  # /celebrate/%F0%9F%8E%89

# Decode back
print(unquote("%F0%9F%8E%89"))  # 🎉

# For query parameters
params = {"emoji": "🎉🚀", "lang": "en"}
query = urlencode(params, quote_via=quote)
print(query)  # emoji=%F0%9F%8E%89%F0%9F%9A%80&lang=en

# Encode for HTML forms (spaces → +)
form_encoded = quote_plus("search 🔍")
print(form_encoded)  # search+%F0%9F%94%8D

Encoding in JavaScript

// encodeURIComponent encodes everything except unreserved chars
const emoji = "🎉";
const encoded = encodeURIComponent(emoji);
console.log(encoded); // %F0%9F%8E%89

// Decode
console.log(decodeURIComponent("%F0%9F%8E%89")); // 🎉

// Full URL with emoji path and query
const base = "https://example.com";
const path = `/tag/${encodeURIComponent("🚀tech")}`;
const params = new URLSearchParams({ q: "emoji 🔍", page: "1" });
const fullUrl = `${base}${path}?${params}`;
console.log(fullUrl);
// https://example.com/tag/%F0%9F%9A%80tech?q=emoji+%F0%9F%94%8D&page=1

// URL API handles encoding automatically
const url = new URL("https://example.com");
url.pathname = "/celebrate/🎉";
url.searchParams.set("q", "🔍 search");
console.log(url.toString());
// https://example.com/celebrate/%F0%9F%8E%89?q=%F0%9F%94%8D+search

Go

package main

import (
    "fmt"
    "net/url"
)

func main() {
    base := "https://example.com"
    emoji := "🎉"

    // Encode path segment
    encoded := url.PathEscape(emoji)
    fmt.Println(encoded) // %F0%9F%8E%89

    // Build full URL
    u, _ := url.Parse(base)
    u.Path = "/celebrate/" + emoji  // url.URL handles encoding on output
    q := u.Query()
    q.Set("tag", "🚀 rocket")
    u.RawQuery = q.Encode()
    fmt.Println(u.String())
    // https://example.com/celebrate/%F0%9F%8E%89?tag=%F0%9F%9A%80+rocket

    // Decode
    decoded, _ := url.PathUnescape("%F0%9F%8E%89")
    fmt.Println(decoded) // 🎉
}

Security Considerations

Homoglyph Attacks in Domains

While Punycode is transparent in the address bar of modern browsers (which display the Unicode form for known-safe scripts), mixing scripts can create spoofing opportunities. An emoji like 🅰️ (U+1F170, negative squared Latin A) looks like the letter A but encodes differently. This is less dangerous in domains than look-alike Latin characters, but worth being aware of in user-generated content.

Double-Encoding

A common bug is double-encoding emoji in URLs:

# WRONG: encoding an already-encoded string
encoded = quote("🎉")          # "%F0%9F%8E%89"
double = quote(encoded)        # "%25F0%259F%258E%2589" — wrong!

# RIGHT: encode once, decode once
encoded = quote("🎉")          # "%F0%9F%8E%89"
decoded = unquote(encoded)     # "🎉"

Django URL Patterns

Django automatically decodes percent-encoded URL paths before passing them to views. You receive the raw emoji character, not the encoded form:

# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path("tag/<str:tag>/", views.tag_view),
]

# views.py
def tag_view(request, tag):
    # tag = "🚀" (already decoded by Django)
    print(repr(tag))  # '🚀'

Testing Emoji URLs

# curl handles emoji encoding automatically
curl -v "https://example.com/🎉"

# Or encode manually
curl -v "https://example.com/%F0%9F%8E%89"

# Test Punycode conversion
python3 -c "print('🍕.ws'.encode('idna').decode('ascii'))"

Explore More on EmojiFYI

Related Tools

🔍 Sequence Analyzer Sequence Analyzer
Decode ZWJ sequences, skin tone modifiers, keycap sequences, and flag pairs into individual components.

Glossary Terms

Emoji Emoji
A Japanese word (絵文字) meaning 'picture character' — small graphical symbols used in digital communication to express ideas, emotions, and objects.
UTF-8 UTF-8
A variable-width Unicode encoding that uses 1 to 4 bytes per character, dominant on the web (used by 98%+ of websites).
Unicode Unicode
Universal character encoding standard that assigns a unique number to every character across all writing systems and symbol sets, including emoji.

관련 이모지

Related Stories