Technology Fundamentals
Character Encoding
Definition
Character encoding is a system that maps each character in a given character set to a code, such as a natural number, a sequence of octets (bytes), or a pattern of electrical pulses, so that text can be stored in computers and transmitted over telecommunication networks.
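As a minimal illustration in Python, the sketch below takes three characters, shows the number (Unicode code point) each one is assigned, and then shows the bytes two different encodings produce for the same text. The sample string and the choice of UTF-8 and UTF-16 (little-endian) are arbitrary assumptions made for the example.

```python
# Each character is assigned a number (its Unicode code point);
# an encoding then decides which bytes represent that number.
text = "Aé€"  # sample text: Latin letter, accented letter, euro sign

code_points = [ord(ch) for ch in text]   # the numbers assigned to each character
utf8_bytes = text.encode("utf-8")        # same characters as UTF-8 bytes
utf16_bytes = text.encode("utf-16-le")   # same characters as UTF-16 (little-endian) bytes

print(code_points)             # [65, 233, 8364]
print(utf8_bytes.hex(" "))     # 41 c3 a9 e2 82 ac
print(utf16_bytes.hex(" "))    # 41 00 e9 00 ac 20
```

The same three characters become six bytes in both encodings here, but different bytes, which is why the encoding must be known to turn the bytes back into text.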
Why It Matters
Computers store and transmit text as bytes, which are just numbers. The character encoding determines how those bytes are interpreted as the characters you see on screen; without knowing which encoding was used, software cannot reliably display the text.
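A small Python sketch of this idea, assuming a hypothetical file `greeting.txt` written to a temporary directory: the file is saved as UTF-8, and the same stored bytes are then decoded twice, once with the matching encoding and once with Latin-1, to show that only the interpretation changes.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "greeting.txt"  # hypothetical file name
    # Save the text with an explicit encoding rather than the platform default.
    path.write_text("naïve café", encoding="utf-8")

    raw = path.read_bytes()             # the bytes actually stored on disk
    as_utf8 = raw.decode("utf-8")       # interpreted with the encoding it was saved in
    as_latin1 = raw.decode("latin-1")   # same bytes interpreted with a different encoding

print(raw)        # b'na\xc3\xafve caf\xc3\xa9'
print(as_utf8)    # naïve café
print(as_latin1)  # naÃ¯ve cafÃ©
```

Passing `encoding=` explicitly when reading and writing files avoids depending on a default that can differ between platforms.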
Contextual Example
UTF-8 is the most common character encoding on the web. It can represent every character in the Unicode standard, including letters and symbols from all the world's writing systems as well as emoji, and it is backward-compatible with ASCII: any valid ASCII text is also valid UTF-8 with exactly the same bytes.
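The sketch below, in Python, shows how UTF-8 achieves both properties: it is a variable-width encoding that uses one byte for ASCII characters and up to four bytes for others. The sample characters are arbitrary choices; the emoji is written as an escape (`\U0001F600`) so the example does not depend on how the source file or console handles it.

```python
# UTF-8 uses 1 to 4 bytes per code point.
samples = ["A", "é", "€", "\U0001F600"]  # Latin letter, accented letter, euro sign, emoji

byte_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    label = f"U+{ord(ch):04X}"
    byte_lengths[label] = len(encoded)
    print(f"{label}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# Backward compatibility: pure-ASCII text yields identical bytes in ASCII and UTF-8.
ascii_text = "Hello, world!"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```

Because ASCII characters keep their single-byte values, older ASCII-only files can be read as UTF-8 without any conversion.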
Common Misunderstandings
- Seeing garbled text (such as "â€" followed by a stray character where a dash should be) is usually a sign of an encoding mismatch: text saved in one encoding, often UTF-8, is read using another, such as Windows-1252.
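- ASCII is an early 7-bit encoding with only 128 characters: unaccented English letters, digits, punctuation, and control codes. Unicode is not itself an encoding but a character set that assigns a code point to characters from nearly every writing system; UTF-8 is one of several encodings of those code points, alongside UTF-16 and UTF-32.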
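Both points above can be reproduced in a few lines of Python. This is a sketch under two assumptions: the text was saved as UTF-8 and wrongly read as Windows-1252 (Python's `cp1252` codec), and the sample string is arbitrary. The repair step only works when the wrong decoding lost no bytes, which is not guaranteed in practice.

```python
# Mojibake: UTF-8 bytes decoded with the wrong codec.
original = "2010\u20132020"               # "2010" + en dash (U+2013) + "2020"
utf8_bytes = original.encode("utf-8")     # the en dash becomes three bytes: e2 80 93
garbled = utf8_bytes.decode("cp1252")     # each of those bytes becomes its own character
print(garbled)                            # the dash shows up as â, € and a curly quote

# Reversing the wrong decoding recovers the text, if no bytes were lost or replaced.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)               # True

# ASCII has only 128 code points, so it cannot represent the en dash at all.
try:
    original.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)                      # False
```

Decoding with the encoding the text was actually saved in, rather than repairing it afterwards, is the more reliable fix.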