One-Hot Encoding – Technology Term Explained | Technology Definitions

Definition

One-hot encoding converts a categorical variable into a binary vector representation. For a variable with k possible categories, each category is represented by a vector of length k in which exactly one element is "hot" (1) and all others are "cold" (0).

Why It Matters

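Most machine learning models operate on numeric inputs and cannot use raw category labels, such as text strings, directly. One-hot encoding is a standard way to convert categorical data into a numerical format a model can use. Unlike assigning each category an integer (for example red = 0, green = 1, blue = 2), it does not imply any ordering or relative distance between the categories.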
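To illustrate the point about ordering, the sketch below compares a hypothetical integer ("label") encoding with one-hot encoding for the same three colors. The mapping `label_codes` and the helper functions are illustrative assumptions, not part of any library. With integer codes, "red" appears twice as far from "blue" as from "green"; with one-hot vectors, every pair of distinct categories is the same Euclidean distance apart.

```python
import math

# Hypothetical integer ("label") encoding: imposes an arbitrary order on the categories.
label_codes = {"red": 0, "green": 1, "blue": 2}

# One-hot encoding of the same categories.
one_hot_codes = {"red": [1, 0, 0], "green": [0, 1, 0], "blue": [0, 0, 1]}


def label_distance(a, b):
    """Absolute difference between the integer codes of two categories."""
    return abs(label_codes[a] - label_codes[b])


def one_hot_distance(a, b):
    """Euclidean distance between the one-hot vectors of two categories."""
    return math.dist(one_hot_codes[a], one_hot_codes[b])


print(label_distance("red", "green"))    # 1
print(label_distance("red", "blue"))     # 2: an ordering the data never implied
print(one_hot_distance("red", "green"))  # sqrt(2)
print(one_hot_distance("red", "blue"))   # sqrt(2): all pairs are equally far apart
```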

Contextual Example

If you have a "color" feature with categories ["red", "green", "blue"], "red" would be encoded as `[1, 0, 0]`, "green" as `[0, 1, 0]`, and "blue" as `[0, 0, 1]`.
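A minimal from-scratch sketch that reproduces the example above, assuming the category order is fixed as `["red", "green", "blue"]`. The function name `one_hot` is hypothetical. Library encoders may choose a different column order (for example, sorting categories alphabetically), so the same color can map to a different position depending on the tool.

```python
def one_hot(value, categories):
    """Return the one-hot vector for `value`, given an ordered list of `categories`."""
    if value not in categories:
        # An unseen category has no column to set, so reject it explicitly.
        raise ValueError(f"unknown category: {value!r}")
    vector = [0] * len(categories)           # start with all "cold" (0) elements
    vector[categories.index(value)] = 1      # set the single "hot" (1) element
    return vector


colors = ["red", "green", "blue"]
for color in colors:
    print(color, one_hot(color, colors))
# red [1, 0, 0]
# green [0, 1, 0]
# blue [0, 0, 1]
```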

Common Misunderstandings

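One-hot encoding is not free: a feature with many categories (high cardinality) produces one column per category, so the result is very high-dimensional and almost entirely zeros. This increases memory use and can contribute to the "curse of dimensionality".
Despite this cost, it remains a common preprocessing step for categorical features with a modest number of categories; for high-cardinality features, learned embeddings are a common alternative.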
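The sketch below makes the dimensionality cost concrete for a hypothetical high-cardinality feature of 1,000 distinct user IDs (the data and the helper `one_hot_column` are illustrative assumptions). Encoding 1,000 rows produces 1,000 columns, and only 0.1% of the cells are non-zero. Because each row has exactly one 1, storing just its column index is a lossless sparse alternative; some libraries take a similar approach, for example scikit-learn's `OneHotEncoder` returns a sparse matrix by default.

```python
def one_hot_column(values):
    """One-hot encode a list of values; returns (categories, dense rows)."""
    categories = sorted(set(values))                   # one column per distinct value
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical high-cardinality feature: 1,000 distinct user IDs, one per row.
values = [f"user_{i}" for i in range(1000)]
categories, rows = one_hot_column(values)

num_cells = len(rows) * len(categories)
nonzero = sum(sum(r) for r in rows)
print(len(categories))      # 1000 columns
print(nonzero / num_cells)  # 0.001: only one cell per row is non-zero

# Sparse alternative: keep only the position of the single 1 in each row.
sparse_rows = [r.index(1) for r in rows]
```

The dense form here holds one million integers to represent information that the 1,000-entry `sparse_rows` list captures completely.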

Related Terms

Feature Engineering
Embedding