Technology Fundamentals

UTF-8

Definition

UTF-8 is a variable-width character encoding for Unicode text that represents each character (code point) with one to four bytes. It is the dominant character encoding on the World Wide Web, used by over 98% of all web pages.

Why It Matters

UTF-8 is the encoding that made the Unicode standard practical for the web. Its variable-width design is compact for ASCII-heavy text: each of the 128 ASCII characters (basic English letters, digits, and punctuation) takes a single byte with the same value it has in ASCII, while every other Unicode character takes two to four bytes.
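As a rough illustration of the variable-width design, the sketch below maps a character's code point to the number of bytes UTF-8 uses for it. The function name utf8_length is hypothetical, chosen for this example; the range boundaries are those of the UTF-8 encoding itself, and the loop cross-checks them against Python's built-in encoder.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses for a single character.

    utf8_length is a hypothetical helper written for this example.
    """
    code_point = ord(ch)
    if code_point < 0x80:       # U+0000..U+007F: the ASCII range
        return 1
    if code_point < 0x800:      # U+0080..U+07FF
        return 2
    if code_point < 0x10000:    # U+0800..U+FFFF
        return 3
    return 4                    # U+10000..U+10FFFF


# "\u00e9" is the precomposed letter e with acute accent.
for ch in ["A", "\u00e9", "€", "👍"]:
    # Cross-check the range table against Python's built-in encoder.
    assert utf8_length(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```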

Contextual Example

The English letter "A" (U+0041) is stored as one byte in UTF-8. The euro symbol "€" (U+20AC) is stored as three bytes. The emoji "👍" (U+1F44D) is stored as four bytes. A single encoding can therefore handle plain English text, symbols and other scripts, and emoji in the same document.
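These byte counts can be checked directly with Python's built-in str.encode. The snippet below is a small sketch; utf8_hex is a hypothetical helper name used only to format the bytes for display.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated hex (hypothetical helper)."""
    return ch.encode("utf-8").hex(" ")


for ch in ["A", "€", "👍"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), utf8_hex(ch))

# Prints:
# A 1 41
# € 3 e2 82 ac
# 👍 4 f0 9f 91 8d
```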

Common Misunderstandings

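  • UTF-8 is not the same thing as Unicode. Unicode assigns a number (a code point) to each character; UTF-8 is one of several encodings, alongside UTF-16 and UTF-32, that turn those code points into bytes.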
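  • Its backward compatibility with ASCII was a key reason for its widespread adoption: any valid ASCII text is already valid UTF-8, byte for byte, and the bytes of multi-byte characters never fall in the ASCII range, so older systems could handle UTF-8 text without major changes.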
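
The ASCII compatibility described above can be demonstrated in a few lines. This is a minimal sketch using only Python's built-in codecs; the function names same_bytes_in_ascii_and_utf8 and has_no_ascii_bytes are hypothetical, introduced for this example.

```python
def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """True if ASCII-only text encodes to identical bytes in both encodings."""
    return text.encode("ascii") == text.encode("utf-8")


def has_no_ascii_bytes(text: str) -> bool:
    """True if every UTF-8 byte of text is 0x80 or above (outside ASCII)."""
    return all(byte >= 0x80 for byte in text.encode("utf-8"))


# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
assert same_bytes_in_ascii_and_utf8("Hello, world!")

# Bytes written by an ASCII-only system decode cleanly as UTF-8.
assert b"Hello, world!".decode("utf-8") == "Hello, world!"

# Multi-byte characters never produce bytes that look like ASCII,
# so an ASCII delimiter such as "," cannot appear inside one by accident.
assert has_no_ascii_bytes("€👍")
```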

Related Terms

Last Updated: December 17, 2025