If you're building anything with LLMs, you've probably heard "you need a vector database" about fifty times. But the explanations usually jump straight into mathematical jargon about high-dimensional spaces and cosine similarity. Let me try to explain this simply, from the perspective of a developer who just wanted to build a RAG system and had to figure this stuff out.
What Problem They Solve
Regular databases are great at exact matching. "Find all users where email = 'test@example.com'." Fast, simple, done.
But what if you want to find things that are similar rather than identical? "Find documents that are about the same topic as this query." Or "find products similar to this one." Regular databases can't do this efficiently.
Vector databases solve this by storing data as vectors (arrays of numbers) and finding items that are mathematically "close" to each other. Two pieces of text about the same topic will have similar vectors, even if they use completely different words.
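"Close" here usually means cosine similarity: the cosine of the angle between two vectors, which is 1.0 when they point the same way and falls off as they diverge. A minimal sketch with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration.
dog   = [0.9, 0.8, 0.1]
puppy = [0.8, 0.9, 0.2]
car   = [0.1, 0.2, 0.9]

print(cosine_similarity(dog, puppy))  # high: similar meaning
print(cosine_similarity(dog, car))    # low: different meaning
```

The point is that "dog" and "puppy" land near each other even though they share no letters, which is exactly what keyword search can't do.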
How It Works (the Short Version)
You take your data (text, images, whatever) and convert it into a vector using an embedding model. OpenAI's text-embedding-ada-002 is the most common choice for text. It takes a string and returns an array of 1536 numbers. Those numbers capture the semantic meaning of the text.
You store those vectors in the database. When a user asks a question, you embed the question into a vector using the same model, then ask the database: "find me the 5 stored vectors closest to this query vector." The database returns the most semantically similar items.
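That store-and-query flow fits in a few lines. The `embed` function below is a stand-in I made up (it just counts letters, so the example runs offline); in a real system it would call an embedding model like ada-002. The store-then-rank logic, though, is the actual pattern:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model. A real system would call an
    # embedding API here; this toy version counts letter frequencies so
    # the example runs without any network access.
    counts = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            counts[ord(ch) - ord("a")] += 1.0
    return counts

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# 1. Embed and store your documents.
documents = ["how to reset a password", "grilled cheese recipe", "password recovery steps"]
store = [(doc, embed(doc)) for doc in documents]

# 2. Embed the query with the SAME model, then return the k closest vectors.
query_vec = embed("forgot my password")
results = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:2]
print([doc for doc, _ in results])
```

A real vector database replaces the `sorted` call with an approximate nearest-neighbor index so the lookup stays fast at millions of vectors, but the interface is the same: query vector in, closest items out.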
That's it. That's the core concept. Everything else is optimization details.
The Options I've Used
Chroma. Open source, runs locally, dead simple to set up. I use this for prototyping and small projects. You can pip install it and have a working vector store in 5 lines of Python. The downside: it's an in-memory database by default, so it doesn't scale to millions of documents and your data is gone if the process dies (though it has a persistence option).
Pinecone. Managed service, no infrastructure to worry about. You get an API endpoint and just send vectors to it. Scales well, fast queries, good documentation. The downside: it's a hosted service (your data is on their servers) and it costs money. The free tier is generous for experimentation but production use adds up.
Weaviate. Open source, self-hostable, or available as a managed service. More features than Chroma (hybrid search, filtering, multi-modal). The learning curve is steeper but it's more capable. I'd use this for production applications where you need more control than Pinecone offers and more features than Chroma provides.
When You Actually Need One
You need a vector database when you're building RAG (retrieval-augmented generation) systems. This is the pattern where you give an LLM context from your own data by finding relevant documents for each query. Without a vector database, you'd have to send all your documents to the LLM every time, which is expensive and hits context limits.
You also need one for semantic search (search by meaning, not keywords), recommendation systems (find similar items), and deduplication (find near-duplicate content).
When You Don't Need One
This is the part nobody talks about. If your dataset is small (under a few thousand documents), you can just compute similarity in memory without a specialized database. Embed all your documents, store the vectors in a regular array, and use brute-force cosine similarity to find matches. It's fast enough up to a few thousand vectors and way simpler than setting up infrastructure.
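As a sketch, the whole "database" can be a plain list plus brute-force cosine similarity. The `InMemoryIndex` class and its `add`/`search` methods are names I invented for readability, and the vectors are made up; in practice they'd come from your embedding model:

```python
import heapq
import math

class InMemoryIndex:
    """A brute-force 'vector store': a plain list plus cosine similarity.

    Fine up to a few thousand vectors. No server, no infrastructure.
    """

    def __init__(self):
        self._items = []  # list of (document, vector) pairs

    def add(self, document: str, vector: list[float]) -> None:
        self._items.append((document, vector))

    def search(self, query_vector: list[float], k: int = 5) -> list[str]:
        def score(item):
            _, vec = item
            dot = sum(x * y for x, y in zip(query_vector, vec))
            norms = (math.sqrt(sum(x * x for x in query_vector))
                     * math.sqrt(sum(x * x for x in vec)))
            return dot / norms
        # Scan every vector and keep the k best -- O(n) per query.
        return [doc for doc, _ in heapq.nlargest(k, self._items, key=score)]

index = InMemoryIndex()
index.add("intro to pricing",  [0.9, 0.1, 0.0])
index.add("refund policy",     [0.7, 0.3, 0.1])
index.add("office dog photos", [0.0, 0.1, 0.9])

print(index.search([1.0, 0.0, 0.0], k=2))
```

When the linear scan gets too slow, that's your concrete signal to move to a real vector database, not before.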
If your search needs are keyword-based rather than semantic, use a regular full-text search engine like Elasticsearch or even PostgreSQL's built-in text search. Vector databases are for semantic similarity. Don't use them for exact or keyword matching.
If you're building a chatbot that answers questions about a small knowledge base (under 100 pages), you might not need a vector database at all. You could stuff the entire knowledge base into the LLM's context window, especially with models that support 100K+ tokens.
Practical Tips from Building with These
Chunk size matters more than you think. How you split your documents before embedding them dramatically affects retrieval quality. Too small and you lose context. Too large and the embedding becomes too general. I've found 200-500 tokens to be a sweet spot for most text, but experiment with your data.
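One simple way to chunk, as a sketch: split on words with a fixed size and an overlap, so sentences that straddle a boundary keep some context on both sides. Word counts are only a rough proxy for tokens (one token is roughly three-quarters of an English word), and the sizes below are just illustrative:

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks.

    Each chunk is at most chunk_size words; consecutive chunks share
    overlap words so context isn't lost at the boundaries.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A fake 700-word document, just to show the chunk boundaries.
doc = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(doc, chunk_size=300, overlap=50)
print(len(chunks))  # 3 chunks: words 0-299, 250-549, 500-699
```

Real chunkers often split on sentence or paragraph boundaries instead of raw word counts, but the size/overlap trade-off is the same.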
Metadata filtering is essential. Store metadata alongside your vectors (source, date, category) and filter on it. "Find relevant documents from the last 30 days" is a common need and you don't want to rely on the vector similarity alone for this.
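A sketch of that pattern: keep metadata next to each vector and filter on it before ranking by similarity. The entries, field names, and dates here are all invented for illustration:

```python
import math
from datetime import date

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Each entry pairs a vector with metadata -- what a real vector
# database stores alongside each embedding.
store = [
    {"id": "a", "vector": [0.9, 0.1], "date": date(2024, 6, 1), "category": "billing"},
    {"id": "b", "vector": [0.8, 0.2], "date": date(2023, 1, 1), "category": "billing"},
    {"id": "c", "vector": [0.1, 0.9], "date": date(2024, 6, 2), "category": "support"},
]

def search(query_vec, k=5, newer_than=None, category=None):
    # Filter on metadata FIRST, then rank the survivors by similarity.
    candidates = [e for e in store
                  if (newer_than is None or e["date"] >= newer_than)
                  and (category is None or e["category"] == category)]
    candidates.sort(key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return candidates[:k]

recent_billing = search([1.0, 0.0], newer_than=date(2024, 1, 1), category="billing")
print([e["id"] for e in recent_billing])  # only "a" survives both filters
```

Managed vector databases expose this as a filter expression on the query rather than a Python predicate, but the shape of the operation is the same: narrow by metadata, then rank by vector distance.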
Embedding model choice is more important than database choice. The quality of your embeddings determines the quality of your retrieval. OpenAI's ada-002 is the default choice and it's good. But test alternatives if retrieval quality isn't meeting your needs.
Start with Chroma. Seriously. Build your prototype with the simplest option. Only upgrade to Pinecone or Weaviate when you hit a real limitation. I've seen too many projects spend weeks setting up infrastructure for a problem that could be solved with an in-memory array.
Vector databases are a genuinely useful tool for AI applications. But like any tool, using them when you don't need them just adds complexity. Understand the problem they solve, assess whether you have that problem, and choose the simplest option that works.