
Embeddings

Also called: vector embedding, text embedding, semantic vector

An embedding is a fixed-length vector of numbers that represents a piece of text in a way that preserves meaning. Two pieces of text with similar meanings produce similar vectors. Embeddings power semantic search, retrieval, clustering, and the input layer of every transformer.

The simplest version: take some text, run it through an embedding model, get back a vector of (typically) 768 to 3072 numbers. That vector is the text's "address" in a high-dimensional semantic space. "Cat" and "kitten" land close together in that space. "Cat" and "geopolitics" land far apart. Compute the cosine similarity between two vectors and you have a numerical measure of how related the underlying texts are.
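
To make the geometry concrete, here is a minimal cosine-similarity sketch in Python. The three-dimensional vectors are toy stand-ins; real embedding models return hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came back from an embedding model (real vectors have 768+ dims).
cat    = np.array([0.8, 0.1, 0.3])
kitten = np.array([0.7, 0.2, 0.3])
geo    = np.array([0.1, 0.9, -0.4])

print(cosine_similarity(cat, kitten))  # high: close together in semantic space
print(cosine_similarity(cat, geo))     # low: far apart
```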

Inside a transformer, the first layer is an embedding lookup: every token is mapped to a learned vector that captures what that token tends to mean in context. The whole rest of the network operates on those vectors. So embeddings are not a separate technology bolted on top of language models; they are the substrate language models think in.
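
A minimal sketch of that first layer using PyTorch's nn.Embedding; the vocabulary size, model width, and token ids below are illustrative, not taken from any particular model.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 768           # illustrative sizes
embed = nn.Embedding(vocab_size, d_model)   # one learned vector per token id

token_ids = torch.tensor([[312, 9041, 17]])  # a "sentence" of three token ids
vectors = embed(token_ids)                   # lookup: ids -> learned vectors
print(vectors.shape)  # torch.Size([1, 3, 768])
# Everything downstream (attention, MLPs) operates on these vectors.
```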

Outside a transformer, dedicated embedding models (typically smaller transformers fine-tuned with a contrastive objective) are sold as APIs for retrieval. You take your corpus, embed every chunk, and store the vectors in a vector database. At query time you embed the user's question, find the nearest chunks by cosine distance, and feed those chunks back to the language model as context. That pattern is retrieval-augmented generation, and embeddings are the index it depends on.
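
Here is the whole index-then-retrieve loop as a sketch. The embed() function is a placeholder for a real embedding-model API call (it returns random unit vectors only to keep the snippet runnable), and the chunks are toy strings.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder for a real embedding-model call; returns one unit vector per text.
    Swap in your provider's API here. Random vectors just keep this runnable."""
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 768))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Index time: embed every chunk of the corpus once and store the matrix.
chunks = ["cancellation policy ...", "pricing page ...", "marketing copy ..."]
index = embed(chunks)

# Query time: embed the question, score every chunk against it.
query_vec = embed(["How do I cancel my subscription?"])[0]
scores = index @ query_vec           # dot product == cosine for unit vectors
top = np.argsort(scores)[::-1][:2]   # the k nearest chunks
for i in top:
    print(scores[i], chunks[i])      # these chunks become the LLM's context
```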

Embedding quality matters a lot for RAG. A bad embedding model returns the wrong chunks, and the language model dutifully answers from the wrong source. A good embedding model captures nuance: it knows that "How do I cancel my subscription?" should retrieve the cancellation policy chunk, not the marketing copy that uses the word "subscription" three times.

Embeddings are also how clustering, anomaly detection, and recommendation systems work in modern stacks. The pattern is the same: turn things into vectors, do geometry on the vectors.
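
Clustering, for instance, is just k-means run directly on the embedding matrix. In this sketch the vectors are random stand-ins for real model output, so the cluster assignments are meaningless; the shape of the computation is the point.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume `vectors` is an (n_docs, dim) matrix of embeddings from any model.
rng = np.random.default_rng(1)
vectors = rng.normal(size=(100, 768))  # stand-in data to keep the sketch runnable

labels = KMeans(n_clusters=5, n_init="auto", random_state=0).fit_predict(vectors)
# Documents sharing a label landed near each other in embedding space,
# i.e. the model considers them to be about similar things.
print(labels[:10])
```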
