Databases & Data Storage
Inverted Index
Definition
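An inverted index is an index data structure that stores a mapping from content, such as words or numbers, to the locations where that content appears: in a database file, in a single document, or across a set of documents. It is called "inverted" because it maps terms to documents rather than documents to the terms they contain. It is the most popular data structure used in full-text search engines.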
Why It Matters
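The inverted index is what makes full-text search fast. Instead of scanning every document for a word at query time, the search engine looks the word up in the index and directly retrieves the list of all documents that contain it. The cost of a query then depends mainly on the number of matching documents rather than on the total amount of text stored.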
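The difference between scanning and looking up can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the sample documents, the whitespace tokenization, and the function names scan_search and index_search are hypothetical and not taken from any particular search engine.

```python
# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps",
}


def scan_search(documents, word):
    """Naive search: read every document on every query."""
    return sorted(
        doc_id for doc_id, text in documents.items() if word in text.split()
    )


# Build the inverted index once, ahead of query time: word -> set of doc IDs.
index = {}
for doc_id, text in docs.items():
    for word in set(text.split()):
        index.setdefault(word, set()).add(doc_id)


def index_search(inverted_index, word):
    """Indexed search: a single dictionary lookup, no document is re-read."""
    return sorted(inverted_index.get(word, set()))


print(scan_search(docs, "quick"))
print(index_search(index, "quick"))
```

Both functions return the same document IDs. The scan reads all the text for every query, while the indexed version pays that cost once when the index is built.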
Contextual Example
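An inverted index for a set of documents looks much like the index at the back of a book. It holds a list of all unique words and, next to each word, a list of the documents and the positions within them where that word appears. The entry for "fox", for instance, might read: document 1, position 4; document 3, position 1.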
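A minimal sketch of such a word-to-documents-and-positions structure, assuming whitespace-separated words and zero-based word positions. The sample documents and the name build_positional_index are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps",
}


def build_positional_index(documents):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.split()):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)


index = build_positional_index(docs)

# Print the index in alphabetical order, like the back of a book.
for word in sorted(index):
    print(word, index[word])
```

Keeping positions as well as document IDs is what allows phrase queries: "quick dog" matches a document only if "dog" occurs one position after "quick" in it.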
Common Misunderstandings
- Building an inverted index typically involves tokenizing the text, normalizing the tokens (e.g., to lowercase), and, in many configurations, removing stop words.
- It is the core data structure behind the Lucene search library and the search engines built on it, such as Elasticsearch.
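The build steps listed above (tokenize, normalize, drop stop words) can be sketched as a small pipeline. This is a simplified illustration, not how Lucene or Elasticsearch implement their analyzers: the stop-word list, the regular expression used for tokenizing, and the names analyze, build_index, and search_all are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; real systems use longer, configurable lists.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "is", "in", "to"})


def analyze(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def build_index(documents):
    """Map each analyzed term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the IDs of documents that contain every term of the query."""
    terms = analyze(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "The Quick Brown Fox.",
    2: "A quick guide to the index",
    3: "Foxes and dogs",
}
index = build_index(docs)
print(search_all(index, "the QUICK fox"))
```

The query goes through the same analyze step as the documents, so differences in case and punctuation do not prevent a match. This sketch does no stemming, so "Foxes" in document 3 is indexed as "foxes" and does not match a query for "fox".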