Artificial Intelligence & Machine Learning
One-Hot Encoding
Definition
One-hot encoding converts a categorical variable into a binary vector representation. Each category is represented by a vector with one element per possible category, in which exactly one element is "hot" (1) and all others are "cold" (0).
Why It Matters
Most machine learning models operate on numbers and cannot use raw text labels for categories directly. One-hot encoding is a standard way to convert categorical data into a numerical format that such models can consume, without implying any order or ranking among the categories.
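To illustrate the point about ordinal relationships, the following minimal sketch compares plain integer codes with one-hot vectors. The category names and the `squared_distance` helper are hypothetical, chosen only for illustration.

```python
categories = ["paris", "tokyo", "lima"]  # hypothetical unordered categories

# Integer (label) encoding: the codes 0, 1, 2 carry an order and unequal gaps.
label_codes = {c: i for i, c in enumerate(categories)}

# One-hot encoding: one position per category, a single 1 in each vector.
one_hot_codes = {
    c: [1 if i == j else 0 for j in range(len(categories))]
    for i, c in enumerate(categories)
}

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Compare every unordered pair of categories under both encodings.
pairs = [(a, b) for a in categories for b in categories if a < b]
label_gaps = {(a, b): (label_codes[a] - label_codes[b]) ** 2 for a, b in pairs}
one_hot_gaps = {
    (a, b): squared_distance(one_hot_codes[a], one_hot_codes[b]) for a, b in pairs
}
```

Under the integer codes some pairs end up closer together than others, purely as a side effect of the numbering. Under one-hot encoding every pair of distinct categories is the same distance apart, so no ordering is implied.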
Contextual Example
If you have a "color" feature with categories ["red", "green", "blue"], "red" would be encoded as `[1, 0, 0]`, "green" as `[0, 1, 0]`, and "blue" as `[0, 0, 1]`.
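The color example above can be sketched in a few lines of plain Python. The `one_hot` function is a hypothetical helper written for this illustration, not a library API; it assumes the category list is fixed and known in advance.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

categories = ["red", "green", "blue"]

# Encode each color using its position in the category list.
encoded = {color: one_hot(color, categories) for color in categories}
```

The position of the 1 depends only on the order of `categories`, so the same list must be reused whenever new data is encoded.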
Common Misunderstandings
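- One-hot encoding is not always a harmless default: it adds one column per category, so a feature with many distinct categories produces very high-dimensional, mostly-zero data (often discussed under the "curse of dimensionality").
- It is a common preprocessing step applied to categorical features before training, not a model or learning algorithm in itself.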
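The growth in dimensionality can be seen in a small sketch. It assumes a made-up feature with 1,000 distinct values; `one_hot_matrix` and the `user_…` labels are hypothetical and exist only for this illustration.

```python
def one_hot_matrix(values):
    """One-hot encode a list of values; returns (categories, rows)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)  # one column per distinct category
        row[index[v]] = 1            # exactly one "hot" entry per row
        rows.append(row)
    return categories, rows

# Hypothetical high-cardinality feature: 1,000 distinct labels.
values = [f"user_{i}" for i in range(1000)]
categories, rows = one_hot_matrix(values)

n_rows, n_cols = len(rows), len(rows[0])
# Each row holds a single 1, so the rest of the matrix is zeros.
zero_fraction = 1 - n_rows / (n_rows * n_cols)
```

Here a single input column turns into as many columns as there are distinct values, and almost every entry is zero.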