Vector Databases (Pinecone, Milvus)

Storing and retrieving semantic data efficiently.

What Are Vector Databases?

Vector DB Architecture

[Architecture diagram: documents → chunks → embedding model (text → vector) → vector database (HNSW index + metadata filters) → top-K results → LLM]

Vector databases store and index high-dimensional vectors (embeddings) for fast similarity search. They are the retrieval backbone of most RAG systems.

How They Work

  1. Embed: Convert your text into vectors using an embedding model (e.g., OpenAI text-embedding-3-small)
  2. Index: Store the vectors in an optimized index structure such as HNSW (Hierarchical Navigable Small World) or IVF (Inverted File)
  3. Query: Convert the user's question into a vector and find its nearest neighbors, as sketched below
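
Under the hood, step 3 is a nearest-neighbor search over the stored vectors. Here is a minimal sketch of that search with NumPy and made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions); a brute-force scan like this is exactly what index structures such as HNSW approximate much faster at scale:

python
import numpy as np

# Toy "index": one stored embedding per document.
# Real systems replace this flat scan with HNSW or IVF.
doc_vectors = np.array([
    [0.2, 0.8, -0.1, 0.3],
    [0.9, 0.1, 0.4, -0.2],
    [0.1, 0.7, 0.0, 0.5],
])

query = np.array([0.15, 0.75, -0.05, 0.4])

def cosine(a, b):
    # Cosine similarity: angle between vectors, ignoring magnitude
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(query, v) for v in doc_vectors])
top_k = np.argsort(scores)[::-1][:2]  # indices of the 2 nearest neighbors
print(top_k, scores[top_k])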

Popular Options

  • Pinecone: Managed, serverless. Great for getting started quickly.
  • Milvus / Zilliz: Open-source and horizontally scalable (Zilliz is the managed cloud offering). Good for large datasets.
  • Weaviate: Open-source with built-in vectorization modules.
  • ChromaDB: Lightweight, great for prototyping and local development.
  • pgvector: PostgreSQL extension. Use your existing Postgres infrastructure.
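
If you already run Postgres, pgvector keeps retrieval inside SQL. Below is a sketch using psycopg2 and the pgvector Python adapter; the connection string, table name, and toy 3-dimensional vectors are assumptions for illustration, and it presumes a reachable Postgres instance with the extension available:

python
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

# Hypothetical connection details; adjust for your environment.
conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()
register_vector(conn)  # lets psycopg2 send/receive vector values as NumPy arrays

cur.execute("""
    CREATE TABLE IF NOT EXISTS docs (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(3)  -- toy dimension; text-embedding-3-small uses 1536
    )
""")

cur.execute(
    "INSERT INTO docs (content, embedding) VALUES (%s, %s)",
    ("Vector databases store embeddings.", np.array([0.2, 0.8, -0.1])),
)

# <=> is pgvector's cosine-distance operator (smaller = more similar)
cur.execute(
    "SELECT content FROM docs ORDER BY embedding <=> %s LIMIT 2",
    (np.array([0.25, 0.7, 0.0]),),
)
print(cur.fetchall())
conn.commit()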

Similarity Metrics

  • Cosine Similarity: Most common. Measures the angle between vectors, ignoring magnitude. Best for text.
  • Euclidean Distance: Measures straight-line (L2) distance. Good for spatial data.
  • Dot Product: Fastest to compute. Equivalent to cosine similarity when vectors are normalized to unit length, as the sketch below demonstrates.
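
All three metrics are a line of NumPy each. A quick sketch on two toy vectors, including a check that dot product and cosine similarity coincide once the vectors are unit-normalized:

python
import numpy as np

a = np.array([0.2, 0.8, -0.1])
b = np.array([0.3, 0.7, 0.1])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)  # straight-line (L2) distance
dot = a @ b

# After unit-normalization, dot product equals cosine similarity
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
assert np.isclose(a_unit @ b_unit, cosine)

print(f"cosine={cosine:.3f}  euclidean={euclidean:.3f}  dot={dot:.3f}")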

Code Example

This example uses ChromaDB for local vector storage: embed documents with OpenAI, store them in Chroma, and query by semantic similarity.

python
from openai import OpenAI
import chromadb

client = OpenAI()
chroma = chromadb.Client()

# Create a collection
collection = chroma.create_collection("docs")

# Documents to index
docs = [
    "RAG improves LLM accuracy by providing relevant context.",
    "Vector databases store embeddings for fast similarity search.",
    "Chunking strategy significantly affects retrieval quality.",
]

# Embed and store
for i, doc in enumerate(docs):
    embedding = client.embeddings.create(
        input=doc, model="text-embedding-3-small"
    ).data[0].embedding

    collection.add(
        embeddings=[embedding],
        documents=[doc],
        ids=[f"doc_{i}"],
    )

# Query
query = "How does RAG work?"
query_embedding = client.embeddings.create(
    input=query, model="text-embedding-3-small"
).data[0].embedding

results = collection.query(query_embeddings=[query_embedding], n_results=2)
print(results["documents"])

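The architecture diagram above also shows metadata filters. Chroma exposes these through a where argument at query time; here is a short sketch extending the collection from the example, with a hypothetical source field (the embedding and query_embedding variables are reused from above, sketch only):

python
# Metadata can be attached when adding documents...
collection.add(
    embeddings=[embedding],            # reusing the last embedding above (sketch only)
    documents=["Chroma supports metadata filters."],
    metadatas=[{"source": "notes"}],   # hypothetical metadata field
    ids=["doc_3"],
)

# ...and used to restrict the search space at query time
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=2,
    where={"source": "notes"},
)
print(results["documents"])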
Use Cases

  • Semantic search over internal documentation
  • Product recommendation engines
  • Duplicate detection in support tickets
  • Image search using CLIP embeddings

Common Mistakes

  • Using the wrong similarity metric for your embedding model
  • Not normalizing vectors when using dot-product similarity
  • Storing too many metadata fields, which slows down queries
  • Choosing a managed solution when pgvector would suffice at your scale

Interview Insight

Relevance: High - core RAG infrastructure
