Vector Databases Explained | Pillai Infotech LLP

In this article

What Are Vector Databases?
How They Work
Real-World Use Cases
Database Comparison
How to Choose
Common Pitfalls
Getting Started
FAQ

If you're building any kind of AI application in 2026 — RAG systems, semantic search, recommendation engines, image similarity — you need a vector database. It's become as fundamental to AI applications as relational databases are to web applications.

But the vector database market has exploded from 2 options to 15+ in three years, and every one claims to be the best. We've deployed five different vector databases across client projects at Pillai Infotech. Here's what we've learned about when each one shines and where they struggle.

What Are Vector Databases? (The Simple Explanation)

A regular database stores data as rows and columns: name, age, email, price. You query it with exact matches: "find everyone named John" or "find products under $50."

A vector database stores data as mathematical representations — long lists of numbers called "embeddings" — that capture the meaning of the data. Instead of exact matches, you query by similarity: "find documents similar to this question" or "find products that look like this image."

Here's the key insight: when you convert text, images, or any data into vectors using an AI model (called an embedding model), items that are semantically similar end up as vectors that are close together in mathematical space. "How do I return a product?" and "What's the refund process?" are different text strings, but a good embedding model maps them to nearby vectors.

Traditional DB vs. Vector DB

SQL Query

SELECT * FROM docs WHERE title LIKE '%return policy%'

Matches only titles that contain the literal phrase "return policy." Misses "refund process," "send it back," "exchange policy."

Vector Query

Find top 5 nearest neighbors of embed("How do I return something?")

Finds semantically similar content regardless of exact wording.

How Vector Databases Actually Work

The Embedding Pipeline

                // 1. Convert data to vectors
                "How do I return a product?" → [0.023, -0.142, 0.891, ... 1536 dimensions]

                // 2. Store vectors with metadata
                { vector: [0.023, -0.142, ...], source: "faq.md", category: "returns" }

                // 3. Query by similarity
                query_vector = embed("What's your refund policy?")
                results = db.search(query_vector, top_k=5)
                → Returns FAQ about returns (similarity: 0.94)
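To make the three steps concrete, here is a minimal in-memory sketch in Python. It is a toy, not how any particular database is implemented: TinyVectorStore is a hypothetical class, the 3-dimensional vectors are written by hand in place of real embeddings, and the search is plain brute force.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical brute-force store: fine for a demo, not for production scale."""

    def __init__(self):
        self.records = []  # list of (vector, metadata) pairs

    def add(self, vector, metadata):
        # Step 2: keep the vector together with its metadata.
        self.records.append((vector, metadata))

    def search(self, query_vector, top_k=5):
        # Step 3: score every record, then keep the top_k most similar.
        scored = [
            (cosine_similarity(query_vector, vector), metadata)
            for vector, metadata in self.records
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Step 1 (stand-in): hand-written vectors instead of a real embedding model.
store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], {"source": "faq.md", "category": "returns"})
store.add([0.1, 0.9, 0.0], {"source": "faq.md", "category": "shipping"})
store.add([0.0, 0.2, 0.9], {"source": "faq.md", "category": "billing"})

# Pretend this is embed("What's your refund policy?").
query_vector = [0.8, 0.2, 0.1]
results = store.search(query_vector, top_k=2)
for score, metadata in results:
    print(round(score, 3), metadata)
```

A real deployment swaps the hand-written vectors for an embedding model and the linear scan for an index, but the add/search shape stays the same.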

Indexing: Why Vector Search Is Fast

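Brute-force search compares the query against every stored vector, so the work per query grows linearly with the collection: O(n) comparisons. That's fine for 10,000 vectors but far too slow for interactive queries over 100 million. Vector databases use approximate nearest neighbor (ANN) algorithms, which give up a small amount of recall in exchange for much faster search: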

HNSW (Hierarchical Navigable Small World): The most popular algorithm. Builds a graph structure where similar vectors are connected. Search navigates the graph to find neighbors. Used by most vector databases. Trade-off: fast queries, high memory usage.
IVF (Inverted File Index): Partitions vectors into clusters and only searches relevant clusters. Lower memory than HNSW, slightly lower recall. Good for very large datasets.
PQ (Product Quantization): Compresses vectors to reduce memory. Combined with IVF for large-scale, memory-constrained deployments.

The practical implication: with a well-tuned HNSW index that fits in memory, you can search 10 million vectors in under 10 milliseconds, depending on dimensions, hardware, and the recall you target. That's why vector databases feel instant even at scale.
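The IVF idea from the list above is the easiest of the three to sketch in a few lines. The code below is a simplified illustration, not a production index: the centroids are fixed by hand (a real IVF index learns them, typically with k-means), and the function names and the nprobe parameter name are our own choices for this example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda i: squared_distance(vector, centroids[i]))


def build_ivf(vectors, centroids):
    """Assign every vector id to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vector_id, vector in enumerate(vectors):
        buckets[nearest_centroid(vector, centroids)].append(vector_id)
    return buckets


def ivf_search(query, vectors, centroids, buckets, top_k=3, nprobe=1):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe_order = sorted(
        range(len(centroids)), key=lambda i: squared_distance(query, centroids[i])
    )
    candidates = [vid for i in probe_order[:nprobe] for vid in buckets[i]]
    candidates.sort(key=lambda vid: squared_distance(query, vectors[vid]))
    return candidates[:top_k]


# Two well-separated clusters of 100 points each (synthetic data).
rng = random.Random(0)
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[c + rng.uniform(-1, 1) for c in centroids[i % 2]] for i in range(200)]
buckets = build_ivf(vectors, centroids)

query = [9.5, 10.2]
approx = ivf_search(query, vectors, centroids, buckets, top_k=3, nprobe=1)

# Brute-force answer for comparison: scans all 200 vectors instead of 100.
exact = sorted(range(len(vectors)), key=lambda vid: squared_distance(query, vectors[vid]))[:3]
print(approx, exact)
```

With clusters this cleanly separated, probing one bucket finds the same neighbors as the full scan. On real data, neighbors can sit across a bucket boundary, which is where the recall loss comes from and why raising nprobe improves recall at the cost of speed.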

Similarity Metrics

Cosine similarity: Measures the angle between vectors and ignores their magnitude. Range: -1 to 1 (1 = same direction, 0 = orthogonal, -1 = opposite). Best for text embeddings. This is our default.
Euclidean distance: Measures straight-line distance; lower means closer. Better when magnitude matters (e.g., when embeddings represent quantities, not just direction).
Dot product: Similar to cosine but considers magnitude; on unit-normalized vectors it gives exactly the cosine similarity. Used when you want higher-magnitude vectors to score higher.
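The three metrics are short enough to write out directly. This sketch uses plain Python lists and made-up vectors; a vector database computes the same quantities with optimized native code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    """Scale a vector to unit length."""
    length = norm(a)
    return [x / length for x in a]


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude
c = [3.0, 2.0, 1.0]

print(cosine_similarity(a, b))   # direction only: a and b count as the same
print(euclidean_distance(a, b))  # non-zero, because the magnitudes differ
print(dot(a, b), dot(a, c))      # b scores higher partly because it is longer
print(dot(normalize(a), normalize(c)), cosine_similarity(a, c))  # these agree
```

The last line is the reason many systems normalize embeddings once at write time and then use the cheaper dot product at query time.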

Real-World Use Cases We've Deployed

1. RAG (Retrieval-Augmented Generation)

The most common use case. Embed your documents, store in a vector DB, and retrieve relevant chunks when a user asks a question. Feed those chunks to an LLM to generate an accurate, grounded answer. We covered this in depth in our RAG guide.
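Here is a rough sketch of the retrieve-then-prompt step. Everything in it is illustrative: the chunk texts and vectors are invented, retrieve and build_prompt are hypothetical helper names, and the actual LLM call is left out because it depends on your provider.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vector, chunks, top_k=2):
    """Return the top_k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, retrieved_chunks):
    """Put the retrieved text in front of the question, tagged with its source."""
    context = "\n\n".join(
        f"[{chunk['source']}] {chunk['text']}" for chunk in retrieved_chunks
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Invented chunks with hand-written 3-dimensional vectors.
chunks = [
    {"source": "returns.md", "text": "Unused items can be sent back for a refund.", "vector": [0.9, 0.1, 0.0]},
    {"source": "shipping.md", "text": "Orders ship from our warehouse.", "vector": [0.1, 0.9, 0.0]},
    {"source": "billing.md", "text": "Invoices are emailed after payment.", "vector": [0.0, 0.2, 0.9]},
]

question = "What's your refund policy?"
query_vector = [0.8, 0.2, 0.1]  # stand-in for embed(question)

top_chunks = retrieve(query_vector, chunks, top_k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Keeping the source name next to each chunk is what lets the final answer cite where it came from.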

2. Semantic Search

Replace keyword search with meaning-based search. A knowledge base with 50,000 articles becomes instantly searchable by intent rather than by exact phrasing. One client saw a 34% improvement in search success rate after switching from Elasticsearch full-text to hybrid vector + keyword search.
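Hybrid search needs some way to merge a keyword ranking with a vector ranking. One common option is reciprocal rank fusion (RRF), sketched below. This is a generic illustration, not a description of what any specific product does internally; the document IDs are made up, and k = 60 is a commonly used constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_ranking = ["doc_a", "doc_b", "doc_c"]  # e.g. from BM25 full-text search
vector_ranking = ["doc_c", "doc_a", "doc_d"]   # e.g. from nearest-neighbor search

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

RRF only looks at rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.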

3. Recommendation Systems

Embed user behavior and product features into the same vector space. "Users who liked X" and "products similar to Y" become simple nearest-neighbor queries. Real-time personalization without batch processing.
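As a small illustration, the sketch below represents a user as the average of the items they liked and recommends the nearest items they haven't seen. Averaging is just one simple way to build a user vector and is an assumption of this example; the item names and vectors are invented.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    dimensions = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dimensions)]


def recommend(liked_ids, item_vectors, top_k=2):
    """Rank items the user has not liked yet by similarity to their taste vector."""
    user_vector = mean_vector([item_vectors[item_id] for item_id in liked_ids])
    candidates = [item_id for item_id in item_vectors if item_id not in liked_ids]
    candidates.sort(
        key=lambda item_id: cosine_similarity(user_vector, item_vectors[item_id]),
        reverse=True,
    )
    return candidates[:top_k]


# Invented product vectors in a shared 3-dimensional space.
item_vectors = {
    "trail_shoes": [0.9, 0.1, 0.0],
    "running_socks": [0.8, 0.2, 0.1],
    "hiking_boots": [0.7, 0.3, 0.0],
    "coffee_maker": [0.0, 0.1, 0.9],
}

recommendations = recommend(["trail_shoes"], item_vectors, top_k=2)
print(recommendations)
```

In a vector database, the sort inside recommend becomes a single nearest-neighbor query with a filter that excludes already-liked items.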

4. Duplicate Detection

Find near-duplicate customer support tickets, bug reports, or documents. Two tickets describing the same issue in different words will have similar vectors. We use this to merge duplicate requests and identify trending issues automatically.
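A minimal version of this check is a similarity threshold over pairs of ticket vectors. The ticket IDs, vectors, and the 0.95 threshold below are all illustrative assumptions; the right threshold depends on your embedding model and how aggressive you want merging to be.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_near_duplicates(vectors_by_id, threshold=0.95):
    """Return (id_a, id_b, score) for every pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors_by_id.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Invented ticket vectors: T-1 and T-2 describe the same issue in different words.
tickets = {
    "T-1": [0.9, 0.1, 0.0],
    "T-2": [0.88, 0.12, 0.01],
    "T-3": [0.0, 0.2, 0.9],
}

duplicates = find_near_duplicates(tickets)
for id_a, id_b, score in duplicates:
    print(id_a, id_b, round(score, 4))
```

Comparing every pair is quadratic, so at real volumes you would instead run a nearest-neighbor query for each new ticket as it arrives and apply the same threshold to the results.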

5. Anomaly Detection

In a vector space, anomalies are points far from all clusters. Embed log entries, transactions, or events and flag anything that's distant from the learned normal distribution. No labeled training data needed.
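One simple way to turn "far from all clusters" into a number is the average distance to the k nearest known-normal points. The sketch below assumes that approach; the event vectors and the threshold of 1.0 are made up for the example and would need calibrating on real data.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_anomaly_score(point, reference_points, k=3):
    """Average distance from the point to its k nearest reference points."""
    distances = sorted(euclidean_distance(point, ref) for ref in reference_points)
    return sum(distances[:k]) / k


# Invented vectors for events considered normal: two tight clusters.
normal_events = [
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
]
threshold = 1.0  # hypothetical cut-off; calibrate on your own data

typical_score = knn_anomaly_score([1.0, 1.1], normal_events)
outlier_score = knn_anomaly_score([3.0, 9.0], normal_events)

print(typical_score, typical_score > threshold)
print(outlier_score, outlier_score > threshold)
```

The k-nearest lookup is exactly the query a vector database is built for, so the scoring step adds very little on top of a normal search.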

Vector Database Comparison: 2026 Edition

Database	Type	Strengths	Limitations	Cost
pgvector	Extension	Zero new infra, SQL-native, ACID compliance	Slower at >1M vectors, limited indexing options	Free (just Postgres)
Pinecone	Managed	Easiest DX, zero ops, fast globally	Vendor lock-in, expensive at scale	Free tier → $70+/mo
Weaviate	Open source	Hybrid search, rich filtering, GraphQL API	Resource-heavy, steeper learning curve	Free (self-host) → managed
Qdrant	Open source	Rust performance, advanced filtering, small footprint	Younger ecosystem, fewer integrations	Free (self-host) → managed
Milvus	Open source	Billion-scale, distributed, multiple index types	Complex setup, needs Kubernetes for production	Free → Zilliz Cloud
ChromaDB	Open source	Simplest API, great for prototyping, Python-native	Not production-ready for large scale	Free

How to Choose: Our Decision Framework

After deploying five different vector databases, here's our decision tree:

Already using PostgreSQL + under 1M vectors? → Use pgvector. Zero new infrastructure, zero new costs. You can always migrate later if you outgrow it.
Don't want to manage infrastructure? → Use Pinecone. Best managed experience. Accept the cost and vendor lock-in as trade-offs for operational simplicity.
Need hybrid search (vector + keyword)? → Use Weaviate. Built-in BM25 + vector search gives you the best of both worlds. Critical for RAG applications.
Performance-critical with complex filtering? → Use Qdrant. Rust-based, tiny memory footprint, excellent filtered search performance.
Billion-scale dataset? → Use Milvus. Distributed architecture handles massive scale, but requires Kubernetes expertise.
Prototyping / learning? → Use ChromaDB. Simplest API, runs in-process with Python, zero setup.

Our default recommendation: Start with pgvector for production applications. It handles 90% of use cases, requires no new infrastructure, and gives you ACID transactions on vector data alongside your regular application data. Only move to a dedicated vector database when you hit specific limitations (scale, query speed, or feature requirements).
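For readers who think in code, the framework above can be written as a single function. The order of the checks follows the list as written, which is our reading rather than a stated precedence, and the parameter names and the 1 billion cut-off for "billion-scale" are assumptions of this sketch.

```python
def choose_vector_db(
    uses_postgres,
    vector_count,
    wants_managed=False,
    needs_hybrid_search=False,
    needs_complex_filtering=False,
    prototyping=False,
):
    """Walk the decision list top to bottom and return the first match."""
    if uses_postgres and vector_count < 1_000_000:
        return "pgvector"
    if wants_managed:
        return "Pinecone"
    if needs_hybrid_search:
        return "Weaviate"
    if needs_complex_filtering:
        return "Qdrant"
    if vector_count >= 1_000_000_000:
        return "Milvus"
    if prototyping:
        return "ChromaDB"
    # Default recommendation when nothing more specific applies.
    return "pgvector"


print(choose_vector_db(uses_postgres=True, vector_count=200_000))
print(choose_vector_db(uses_postgres=False, vector_count=5_000_000, needs_hybrid_search=True))
print(choose_vector_db(uses_postgres=False, vector_count=2_000_000_000))
```

Real decisions rarely reduce to the first matching rule, so treat this as a checklist for the conversation, not a replacement for it.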

Common Pitfalls We've Encountered

Wrong embedding model for the data type. An embedding model trained on English text won't produce good vectors for code, structured data, or non-English languages. Match the embedding model to your data. OpenAI's text-embedding-3-small is a solid general choice; for code, use specialized code embeddings.
Not storing metadata with vectors. A vector alone is useless — you need to know what it represents. Always store source document, page number, section title, and any other metadata needed for filtering and citation.
Over-indexing. Don't embed everything. If you have 10 million product descriptions but users only search 100,000 active products, index the active ones. Less data = faster queries = lower cost.
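Ignoring index tuning. HNSW has build-time parameters (M, ef_construction) that trade index size and build time for recall, and a query-time parameter (often called ef or ef_search) that trades latency for recall. The defaults are a general-purpose starting point, not a fit for your data. Tune them based on your accuracy requirements and latency budget.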
No re-embedding strategy. When embedding models improve (and they do, frequently), you need to re-embed your entire corpus to benefit. Plan for periodic re-embedding, especially after major model upgrades.
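Two of these pitfalls (missing metadata and no re-embedding strategy) have a shared fix: record which model produced each vector alongside the rest of its metadata. The sketch below assumes a metadata field named embedding_model, which is our own naming choice, and uses invented records.

```python
CURRENT_MODEL = "text-embedding-3-small"


def needs_reembedding(record, current_model=CURRENT_MODEL):
    """True when the record was embedded with a different (or unknown) model."""
    return record["metadata"].get("embedding_model") != current_model


# Invented records; vectors are shortened to 3 values for readability.
records = [
    {
        "id": 1,
        "vector": [0.1, 0.2, 0.3],
        "metadata": {
            "source": "faq.md",
            "section": "Returns",
            "embedding_model": "text-embedding-ada-002",
        },
    },
    {
        "id": 2,
        "vector": [0.3, 0.1, 0.2],
        "metadata": {
            "source": "faq.md",
            "section": "Shipping",
            "embedding_model": "text-embedding-3-small",
        },
    },
    {
        "id": 3,
        "vector": [0.2, 0.2, 0.2],
        "metadata": {"source": "legacy.md", "section": "Misc"},  # model unknown
    },
]

stale_ids = [record["id"] for record in records if needs_reembedding(record)]
print(stale_ids)
```

With the model name stored, a re-embedding job becomes a filtered query over stale records instead of a blind rebuild of the whole corpus.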

Getting Started: A 30-Minute Setup

Here's how to go from zero to a working vector database in under 30 minutes:

Option A: pgvector (if you have PostgreSQL)

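                -- Enable the extension (pgvector must be installed on the server)
                CREATE EXTENSION IF NOT EXISTS vector;

                -- Create a table with a vector column (1536 matches text-embedding-3-small)
                CREATE TABLE documents (
                  id SERIAL PRIMARY KEY,
                  content TEXT,
                  embedding vector(1536),
                  metadata JSONB
                );

                -- Create an approximate index for cosine distance.
                -- Build IVFFlat after loading data; lists = 100 is a starting point to tune.
                CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

                -- Search by similarity. $1 is the query embedding, bound as a parameter by your driver.
                -- <=> is cosine distance, so 1 - distance gives cosine similarity.
                SELECT content, 1 - (embedding <=> $1::vector) AS similarity
                FROM documents ORDER BY embedding <=> $1::vector LIMIT 5;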

That's it. You now have a vector database running inside your existing PostgreSQL instance.
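From application code, the main detail to get right is passing the query embedding as a parameter. The sketch below only builds the SQL string and its parameters, so it runs without a database. It assumes a driver that uses %s placeholders (psycopg does); to_pgvector_literal and build_search are hypothetical helper names, and the 3-value vector is a placeholder for a real 1536-dimension embedding.

```python
def to_pgvector_literal(vector):
    """Format a list of numbers in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Same query as the SQL above, with driver-style placeholders.
SEARCH_SQL = """
SELECT content, 1 - (embedding <=> %s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def build_search(query_vector, top_k=5):
    """Return the SQL text and the parameter tuple to execute it with."""
    literal = to_pgvector_literal(query_vector)
    return SEARCH_SQL, (literal, literal, top_k)


# Placeholder vector; against the table above it must have 1536 values.
sql, params = build_search([0.023, -0.142, 0.891], top_k=5)
print(params)

# With an open database cursor (not created in this sketch):
#     cursor.execute(sql, params)
#     rows = cursor.fetchall()
```

Binding the vector as a parameter rather than formatting it into the SQL string keeps the query safe from injection and lets the server reuse the plan.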

For help implementing vector databases in your AI applications, from architecture to production deployment, reach out to our team. We'll help you choose the right database and build a scalable vector infrastructure.

Frequently Asked Questions

Can I just use Elasticsearch for vector search?

Elasticsearch added vector search capabilities, and it works for moderate-scale use cases. The advantage is if you already have Elasticsearch for text search, you get vector search without new infrastructure. The disadvantage is performance at scale — dedicated vector databases are 2-5x faster for pure vector search and use less memory. For hybrid search needs, both Elasticsearch and Weaviate are good options.

How much does a vector database cost to run?

pgvector is free (just your PostgreSQL cost). Pinecone starts at a free tier for up to 100K vectors and $70/month for production workloads. Self-hosted options (Weaviate, Qdrant, Milvus) cost whatever your hosting costs — typically $50-200/month for a moderate-sized deployment. The biggest cost factor is the number of vectors and the dimensions per vector.
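To see why vector count and dimensions dominate cost, it helps to run the storage arithmetic. This back-of-the-envelope sketch assumes 4-byte float32 values and counts only the raw vectors; index structures and metadata add more on top.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Storage for the raw vectors alone, assuming float32 by default."""
    return num_vectors * dimensions * bytes_per_value


def to_gib(num_bytes):
    return num_bytes / (1024 ** 3)


one_million_1536 = raw_vector_bytes(1_000_000, 1536)
one_million_768 = raw_vector_bytes(1_000_000, 768)
ten_million_1536 = raw_vector_bytes(10_000_000, 1536)

print(round(to_gib(one_million_1536), 2), "GiB for 1M vectors at 1536 dimensions")
print(round(to_gib(one_million_768), 2), "GiB for 1M vectors at 768 dimensions")
print(round(to_gib(ten_million_1536), 2), "GiB for 10M vectors at 1536 dimensions")
```

Halving the dimensions halves the footprint, and going from 1 million to 10 million vectors multiplies it by ten, which is usually what moves a deployment from one instance size to the next.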

What embedding model should I use?

For most text applications: OpenAI text-embedding-3-small (1536 dimensions, great quality/cost ratio). For multilingual: Cohere embed-v3. For self-hosted: BGE-large or E5-large. For code: CodeBERT or StarCoder embeddings. The choice of embedding model matters more than the choice of vector database for result quality.

How many vectors can a single instance handle?

pgvector: comfortable to 1-2 million, possible to 10 million with tuning. Qdrant/Weaviate: 10-50 million on a single node. Milvus: billions across a distributed cluster. For most applications, under 1 million vectors is typical, so pgvector handles it fine.

Do I need to re-embed data when I switch vector databases?

No. Vectors are portable — they're just arrays of numbers. You can export vectors from one database and import them into another. What you do need to re-embed for is when you switch embedding models (different dimensions or different semantic representations).
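Because vectors are just arrays of numbers, a migration can be as simple as a round trip through a neutral file format. The sketch below uses JSON Lines and an in-memory buffer in place of a real file; the function names and record layout are assumptions, and each database has its own bulk export and import tooling that you would use in practice.

```python
import io
import json


def export_vectors(records, fileobj):
    """Write one JSON object per line: id, vector, and metadata."""
    for record in records:
        fileobj.write(json.dumps(record) + "\n")


def import_vectors(fileobj, expected_dimensions):
    """Read records back, rejecting any vector with the wrong dimension count."""
    records = []
    for line in fileobj:
        record = json.loads(line)
        if len(record["vector"]) != expected_dimensions:
            raise ValueError(
                f"record {record['id']} has {len(record['vector'])} dimensions, "
                f"expected {expected_dimensions}"
            )
        records.append(record)
    return records


# Invented records with 3-dimensional vectors.
records = [
    {"id": 1, "vector": [0.023, -0.142, 0.891], "metadata": {"source": "faq.md"}},
    {"id": 2, "vector": [0.5, 0.25, -0.75], "metadata": {"source": "guide.md"}},
]

buffer = io.StringIO()  # stands in for a file on disk
export_vectors(records, buffer)
buffer.seek(0)
restored = import_vectors(buffer, expected_dimensions=3)
print(restored == records)
```

The dimension check is the one guard worth keeping: the target collection must be created with the same dimension count and distance metric as the source, or the imported vectors won't be comparable.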

Pillai Infotech Engineering Team
