Search is the feature users notice most when it's bad and least when it's good. A search that returns irrelevant results, can't handle typos, or takes 2 seconds to respond will drive users away faster than almost any other UX problem. Elasticsearch solves this — but it requires understanding how full-text search actually works.
1. Why Elasticsearch (and When Not to Use It)
| Search Need | Best Solution | Why |
|---|---|---|
| Simple keyword filter on < 100K rows | PostgreSQL full-text search | No extra infrastructure |
| Typo-tolerant search with ranking | Elasticsearch or Meilisearch | Purpose-built for relevance |
| E-commerce product search | Elasticsearch (or Algolia) | Facets, filters, boosting, analytics |
| Log search and analytics | Elasticsearch (ELK stack) | Designed for log ingestion and analysis |
| Semantic / vector search | Elasticsearch 8+ or Weaviate/Pinecone | kNN vector search built-in |
| Small site search (< 10K docs) | Meilisearch or Typesense | Simpler, faster to set up |
2. Core Concepts — Indices, Mappings, Analyzers
How Text Search Works (Inverted Index)
```
Document 1: "The quick brown fox jumps"
Document 2: "Quick brown dogs leap over fences"

Analyzer pipeline: tokenize → lowercase → remove stopwords → stem

Inverted index:
"quick" → [Doc 1, Doc 2]
"brown" → [Doc 1, Doc 2]
"fox"   → [Doc 1]
"jump"  → [Doc 1]   ← "jumps" stemmed to "jump"
"dog"   → [Doc 2]   ← "dogs" stemmed to "dog"
"leap"  → [Doc 2]
"over"  → [Doc 2]   ← not in Lucene's default English stopword list
"fenc"  → [Doc 2]   ← "fences" stemmed to "fenc" (Porter stemmer)

Search for "quick fox" → finds Doc 1 (matches both terms, higher score)
                       → also finds Doc 2 (matches only "quick", lower score)
```
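The analyzer pipeline above can be sketched as a toy inverted index in JavaScript. This is an illustration under simplifying assumptions, not how Lucene works internally: `STOPWORDS` is a small hypothetical subset, `naiveStem` is a crude plural-stripper standing in for a real stemmer such as Porter, and the score simply counts matched query terms (BM25 also weighs term frequency, term rarity, and field length).

```javascript
// Toy inverted index. STOPWORDS and naiveStem are simplified stand-ins,
// not Lucene's English stopword list or the Porter stemmer.
const STOPWORDS = new Set(["the", "a", "an", "and", "of", "to", "in"]);

// Hypothetical stemmer: strips plural "es"/"s" suffixes only
function naiveStem(token) {
  if (token.length > 4 && token.endsWith("es")) return token.slice(0, -2);
  if (token.length > 3 && token.endsWith("s")) return token.slice(0, -1);
  return token;
}

// Analyzer pipeline: tokenize -> lowercase -> remove stopwords -> stem
function analyze(text) {
  return text
    .split(/[^A-Za-z0-9]+/)
    .map(t => t.toLowerCase())
    .filter(t => t.length > 0 && !STOPWORDS.has(t))
    .map(naiveStem);
}

// Map each term to the set of document ids that contain it
function buildIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const term of analyze(text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Score = number of distinct query terms a document contains
function search(index, query) {
  const scores = new Map();
  for (const term of new Set(analyze(query))) {
    for (const id of index.get(term) ?? []) {
      scores.set(id, (scores.get(id) ?? 0) + 1);
    }
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const index = buildIndex({
  doc1: "The quick brown fox jumps",
  doc2: "Quick brown dogs leap over fences",
});
console.log(search(index, "quick fox")); // → [ [ 'doc1', 2 ], [ 'doc2', 1 ] ]
```

The key point is that the query goes through the same `analyze` function as the documents; if index-time and query-time analysis disagree, terms stop lining up.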
Index Mapping — Define Your Schema
```
PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": { "type": "keyword" },          // For exact match & sorting
          "autocomplete": {                          // For search-as-you-type
            "type": "text",
            "analyzer": "autocomplete_analyzer",     // Edge n-grams at index time
            "search_analyzer": "standard"            // Don't n-gram the user's query
          }
        }
      },
      "description": { "type": "text", "analyzer": "english" },
      "category":    { "type": "keyword" },          // Exact match only (facets)
      "price":       { "type": "float" },
      "rating":      { "type": "float" },
      "in_stock":    { "type": "boolean" },
      "featured":    { "type": "boolean" },          // Used for boosting in the search query
      "created_at":  { "type": "date" },
      "location":    { "type": "geo_point" }         // For geo search
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      },
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  }
}
```
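What does `autocomplete_filter` actually store? A minimal JavaScript sketch of the edge n-gram step for a single token, assuming the `lowercase` filter has already run and using the mapping's `min_gram: 2` and `max_gram: 15`. `edgeNgrams` is a hypothetical helper that mimics the filter's output; it is not Elasticsearch code.

```javascript
// Mimics an edge_ngram token filter for one (already lowercased) token:
// emit every prefix whose length is between minGram and maxGram.
function edgeNgrams(token, minGram = 2, maxGram = 15) {
  const grams = [];
  const longest = Math.min(token.length, maxGram);
  for (let len = minGram; len <= longest; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

console.log(edgeNgrams("wireless").join(", "));
// → wi, wir, wire, wirel, wirele, wireles, wireless
```

Because every prefix is stored at index time, a user who has typed "wire" already has a matching term. This is also why edge n-gram fields are usually queried with a plain `search_analyzer` such as `standard`: n-gramming the query as well would turn "wireless" into "wi", "wir", ... and match far too broadly.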
3. Indexing Strategies
Bulk Indexing — The Right Way
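```javascript
// Node.js: bulk index 10,000 products
// Written for the v7 @elastic/elasticsearch client. In v8, pass `operations`
// instead of `body` to bulk(), and the result is returned directly (no `body` wrapper).
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

// WRONG: individual index calls (slow: 1 HTTP request per document)
async function indexOneByOne(products) {
  for (const product of products) {
    await client.index({ index: 'products', body: product }); // 💥 N requests
  }
}

// RIGHT: Bulk API (fast: 1 HTTP request per batch)
async function indexInBulk(products) {
  // The bulk body alternates an action line and the document source
  const body = products.flatMap(product => [
    { index: { _index: 'products', _id: product.id } },
    product
  ]);
  const { body: bulkResponse } = await client.bulk({
    refresh: true, // Make documents searchable immediately (costly; for large loads, refresh once at the end)
    body
  });
  // Bulk returns HTTP 200 even when individual documents fail, so check per-item errors
  if (bulkResponse.errors) {
    const erroredDocs = bulkResponse.items.filter(item => item.index && item.index.error);
    console.error('Failed documents:', erroredDocs);
  }
}

// Optimal batch size: 5-15 MB per bulk request (typically 1,000-5,000 docs)
```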
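The 5-15 MB guideline implies splitting large loads into several bulk requests rather than sending every product in one call. A minimal sketch of size-bounded batching, assuming each product has an `id` field; `chunkBulkActions` is a hypothetical helper, and its size estimate counts the NDJSON bytes (action line, source line, and their newlines) that the client sends.

```javascript
// Hypothetical helper: group documents into bulk bodies of at most maxBytes.
// Each document contributes an action line and a source line (NDJSON).
function chunkBulkActions(docs, indexName, maxBytes) {
  const batches = [];
  let current = [];
  let currentBytes = 0;

  for (const doc of docs) {
    const action = { index: { _index: indexName, _id: doc.id } };
    const size =
      Buffer.byteLength(JSON.stringify(action)) +
      Buffer.byteLength(JSON.stringify(doc)) +
      2; // one newline after each of the two lines

    // Start a new batch if this document would push the current one over the limit
    if (current.length > 0 && currentBytes + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(action, doc);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Usage sketch (v7 client): send each batch as its own bulk request
// for (const body of chunkBulkActions(products, 'products', 10 * 1024 * 1024)) {
//   await client.bulk({ body });
// }
```

A single document larger than `maxBytes` still ends up alone in its own batch rather than being dropped, so very large documents may need separate handling.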
Keeping Elasticsearch in Sync with Your Database
| Sync Strategy | How It Works | Latency | Best For |
|---|---|---|---|
| Sync on write | App writes to DB + ES simultaneously | < 1 second | Simple apps, low write volume |
| Event-driven | DB write → event → consumer indexes ES | 1-5 seconds | Microservices, decoupled systems |
| CDC (Debezium) | DB transaction log → Kafka → ES connector | 1-10 seconds | No app changes, reliable sync |
| Periodic reindex | Cron job reads DB, bulk indexes | Minutes to hours | Infrequent changes, full consistency |
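For the event-driven and CDC strategies, the consumer's core job is translating each change event into a bulk action. A sketch, assuming Debezium-style events already unwrapped to `{ op, before, after }` (where `op` is `c` create, `u` update, `d` delete, `r` snapshot read) and a primary key column named `id`. `changeEventToBulk` is hypothetical, and the Kafka consumer and Elasticsearch client wiring are left out.

```javascript
// Hypothetical translation of a Debezium-style change event into
// Elasticsearch bulk lines. Assumes rows have an `id` primary key.
function changeEventToBulk(event, indexName) {
  switch (event.op) {
    case "c": // insert
    case "u": // update
    case "r": // initial snapshot read
      return [
        { index: { _index: indexName, _id: String(event.after.id) } },
        event.after,
      ];
    case "d": // delete: only the old row image is available
      return [{ delete: { _index: indexName, _id: String(event.before.id) } }];
    default:
      throw new Error(`Unknown op: ${event.op}`);
  }
}

const events = [
  { op: "c", before: null, after: { id: 1, name: "Wireless Headphones", price: 99 } },
  { op: "u", before: { id: 1, price: 99 }, after: { id: 1, name: "Wireless Headphones", price: 89 } },
  { op: "d", before: { id: 2 }, after: null },
];

// Flatten into one bulk body: index actions carry two lines, deletes one
const body = events.flatMap(e => changeEventToBulk(e, "products"));
console.log(body.length); // → 5
```

Writing the full row with an `index` action (rather than a partial `update`) means replaying events in order after a consumer restart converges to the same document state.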
4. Query DSL — Search That Actually Works
E-Commerce Product Search — Complete Query
```
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "wireless headphones",
            "fields": ["name^3", "description", "category^2"],
            "type": "best_fields",
            "fuzziness": "AUTO",
            "prefix_length": 2
          }
        }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "gte": 50, "lte": 300 } } },
        { "terms": { "category": ["electronics", "audio"] } }
      ],
      "should": [
        { "range": { "rating": { "gte": 4.0, "boost": 2 } } },
        { "term": { "featured": { "value": true, "boost": 5 } } }
      ]
    }
  },
  "highlight": {
    "fields": {
      "name": {},
      "description": { "fragment_size": 150 }
    }
  },
  "aggs": {
    "categories": { "terms": { "field": "category", "size": 20 } },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200 }
        ]
      }
    },
    "avg_rating": { "avg": { "field": "rating" } }
  },
  "size": 20,
  "from": 0
}
```
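Key concepts in the query above:
- must: the search query itself (contributes to the relevance score)
- filter: hard constraints that don't affect scoring (eligible for caching, which speeds up repeated filters)
- should: soft boosting factors; because a must clause is present, documents matching no should clause are still returned, just ranked lower (a high rating or the featured flag pushes results up)
- fuzziness AUTO: tolerates typos (exact match for 1-2 character terms, 1 edit for 3-5 characters, 2 edits for 6+)
- prefix_length 2: the first two characters must match exactly, which limits how many fuzzy variants are considered
- name^3, category^2: a match in name weighs 3x, and a match in category 2x, relative to description
- aggs: faceted search (category counts and price ranges for sidebar filters)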
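A small sketch of how fuzziness AUTO and prefix_length 2 combine when deciding whether a (lowercased) query term can match an indexed term. It assumes AUTO's default thresholds (3 and 6 characters) and counts an adjacent transposition as one edit, mirroring the default `fuzzy_transpositions: true`. `osaDistance` and `fuzzyMatches` are illustrative helpers, not Lucene's automaton-based implementation.

```javascript
// Optimal string alignment distance: insertions, deletions, substitutions,
// and adjacent transpositions each cost one edit.
function osaDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) d[i][0] = i;
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // insertion
        d[i - 1][j - 1] + cost // substitution
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
      }
    }
  }
  return d[a.length][b.length];
}

// fuzziness "AUTO" (default AUTO:3,6): max edits allowed for a query term
function autoFuzziness(term) {
  if (term.length < 3) return 0;
  if (term.length < 6) return 1;
  return 2;
}

// The first prefixLength characters must match exactly before edits are counted
function fuzzyMatches(queryTerm, indexedTerm, prefixLength = 0) {
  if (queryTerm.slice(0, prefixLength) !== indexedTerm.slice(0, prefixLength)) {
    return false;
  }
  return osaDistance(queryTerm, indexedTerm) <= autoFuzziness(queryTerm);
}

console.log(fuzzyMatches("headphnes", "headphones", 2));  // → true (1 edit, 2 allowed)
console.log(fuzzyMatches("ehadphones", "headphones", 2)); // → false (prefix "eh" vs "he")
console.log(fuzzyMatches("ehadphones", "headphones", 0)); // → true (1 transposition)
```

The last two lines show the trade-off: prefix_length 2 rejects typos in the first two characters, in exchange for examining far fewer candidate terms.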
5. Relevance Tuning
Default Elasticsearch relevance (BM25) is good but not great. Here's how to make search results feel right.
| Technique | How | When to Use |
|---|---|---|
| Field boosting | "name^3" — weight title matches higher | Always — titles are more relevant than body text |
| Function score | Boost by popularity, recency, or rating | When freshness or popularity matters |
| Synonyms | Custom synonym filter in analyzer | Domain-specific terms ("laptop" = "notebook") |
| Pinned queries | Force specific docs to top for queries | Promoted products, editorial picks |
| Decay functions | Lower score for older/farther results | News, events, location-based search |
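Conceptually, an equivalent-synonym rule such as `laptop, notebook` makes the analyzer emit every term of the group wherever one of them appears, so either word finds documents that use the other. A toy JavaScript model of that expansion, assuming synonym groups are arrays of single-word terms. `buildSynonymMap` and `expandTokens` are hypothetical; the real `synonym` token filter also handles multi-word synonyms and token positions.

```javascript
// Toy model of equivalent-synonym expansion (single-word terms only).
function buildSynonymMap(groups) {
  const map = new Map();
  for (const group of groups) {
    for (const term of group) map.set(term, group);
  }
  return map;
}

// Replace each token with its whole synonym group, or keep it as-is
function expandTokens(tokens, synonymMap) {
  return tokens.flatMap(token => synonymMap.get(token) ?? [token]);
}

const synonyms = buildSynonymMap([["laptop", "notebook"], ["tv", "television"]]);
console.log(expandTokens(["cheap", "notebook"], synonyms).join(" ")); // → cheap laptop notebook
```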
Function Score — Boost by Popularity + Recency
```
GET /articles/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "content": "database scaling" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "view_count",
            "modifier": "log1p",
            "factor": 0.5
          }
        },
        {
          "gauss": {
            "published_at": {
              "origin": "now",
              "scale": "30d",
              "decay": 0.5
            }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply"
    }
  }
}
```
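To sanity-check what this query does to scores, here is the arithmetic in JavaScript. With `modifier: "log1p"`, Elasticsearch computes `log10(1 + factor * view_count)` (base 10, despite the name), and with no `offset` the `gauss` decay works out to `decay^((age / scale)^2)`, so an article exactly 30 days old gets 0.5. `popularity`, `recency`, and `combinedMultiplier` are illustrative helpers; with both modes set to `multiply`, the query's BM25 score is multiplied by this combined value.

```javascript
// field_value_factor with modifier "log1p": log10(1 + factor * value)
function popularity(viewCount, factor = 0.5) {
  return Math.log10(1 + factor * viewCount);
}

// gauss decay with offset 0: decay ^ ((distance / scale) ^ 2)
function recency(ageDays, scaleDays = 30, decay = 0.5) {
  return Math.pow(decay, Math.pow(ageDays / scaleDays, 2));
}

// score_mode "multiply" combines the two functions; boost_mode "multiply"
// then multiplies the result into the query's relevance score
function combinedMultiplier(viewCount, ageDays) {
  return popularity(viewCount) * recency(ageDays);
}

console.log(recency(30));                              // → 0.5
console.log(popularity(1998).toFixed(2));              // → 3.00 (log10(1000))
console.log(combinedMultiplier(1998, 30).toFixed(2));  // → 1.50
console.log(combinedMultiplier(0, 0));                 // → 0
```

The last line shows a side effect worth knowing: with `log1p` and `multiply`, an article with zero views gets a multiplier of 0, however relevant or fresh it is. Elasticsearch's `log2p` modifier (`log10(2 + factor * value)`) keeps a floor of about 0.3 if new content should still surface.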
6. Common Patterns — Autocomplete, Facets, Geo
Search-as-You-Type (Autocomplete)
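```
// Index with a search_as_you_type field (ES 7.2+)
// Note: this PUT fails if /products already exists (e.g. from the mapping above);
// use a new index name or delete the old index first
PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "search_as_you_type" // Creates name, name._2gram, name._3gram, name._index_prefix
      }
    }
  }
}

// Query: earlier terms match whole words, the last term matches as a prefix
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "wire head",
      "type": "bool_prefix",
      "fields": ["name", "name._2gram", "name._3gram"]
    }
  }
}
// Matches "Wireless Headphones", "Wired Headset", etc. through the "head" prefix;
// "wire" only matches the whole word, so typing "wireless head" lets both terms count
```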
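Roughly, `name._2gram` and `name._3gram` index shingles (runs of adjacent words), and `bool_prefix` treats every query term as a whole-word match except the last, which becomes a prefix. A toy JavaScript model of both, assuming whitespace tokenization and lowercase text. `shingles` and `boolPrefixMatches` are hypothetical helpers that model only whether a document matches, not how Elasticsearch scores it.

```javascript
// Word n-grams ("shingles"), similar to what the _2gram / _3gram subfields store
function shingles(text, size) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const out = [];
  for (let i = 0; i + size <= words.length; i++) {
    out.push(words.slice(i, i + size).join(" "));
  }
  return out;
}

// bool_prefix matching with the default OR operator: whole-word match for
// every term except the last, prefix match for the last term
function boolPrefixMatches(query, text) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const last = terms.pop();
  const prefixHit = last !== undefined && words.some(w => w.startsWith(last));
  const termHit = terms.some(t => words.includes(t));
  return prefixHit || termHit;
}

console.log(shingles("Wireless Noise Cancelling Headphones", 2).join(" | "));
// → wireless noise | noise cancelling | cancelling headphones
console.log(boolPrefixMatches("wire head", "Wired Headset"));    // → true ("head" prefix)
console.log(boolPrefixMatches("wire head", "Wireless Speaker")); // → false
```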
Geo Search — Find Nearby
```
GET /stores/_search
{
  "query": {
    "bool": {
      "must": { "match": { "type": "restaurant" } },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": { "lat": 19.076, "lon": 72.877 } // Mumbai
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": { "lat": 19.076, "lon": 72.877 },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}
```
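The `geo_distance` filter keeps documents within a great-circle radius of the origin, and the `_geo_distance` sort orders them by that same distance. A minimal JavaScript sketch using the haversine formula with a mean Earth radius of about 6,371 km (an assumption; Elasticsearch's own arc calculation can differ slightly near the radius boundary). The `stores` array holds hypothetical sample coordinates.

```javascript
const EARTH_RADIUS_KM = 6371.0088; // mean Earth radius (assumed)
const toRad = deg => (deg * Math.PI) / 180;

// Great-circle distance between two { lat, lon } points, in kilometres
function haversineKm(a, b) {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Same idea as a geo_distance filter plus an ascending _geo_distance sort
function nearby(stores, origin, radiusKm) {
  return stores
    .map(store => ({ ...store, distanceKm: haversineKm(origin, store.location) }))
    .filter(store => store.distanceKm <= radiusKm)
    .sort((x, y) => x.distanceKm - y.distanceKm);
}

const mumbai = { lat: 19.076, lon: 72.877 };
const stores = [ // hypothetical sample data
  { name: "Store A", location: { lat: 19.08, lon: 72.88 } },
  { name: "Store B", location: { lat: 19.2, lon: 72.9 } },
  { name: "Store C", location: { lat: 19.1, lon: 72.86 } },
];
for (const s of nearby(stores, mumbai, 5)) {
  console.log(`${s.name}: ${s.distanceKm.toFixed(2)} km`);
}
```

Store B, roughly 14 km north, falls outside the 5 km radius; the other two come back nearest first.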
7. Scaling and Operations
| Scale | Cluster Size | Key Settings | Monthly Cost (approx.) |
|---|---|---|---|
| < 1M docs | 1 node (8GB RAM) | 1 primary, 0 replicas | $50-100 |
| 1-10M docs | 3 nodes (16GB each) | 5 shards, 1 replica | $300-600 |
| 10-100M docs | 5-10 nodes (32GB each) | Time-based indices, ILM | $1,000-3,000 |
| > 100M docs | 10+ nodes, dedicated masters | Hot-warm-cold architecture | $3,000+ |
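The shard counts in the table are starting points. One common sanity check is to divide the expected primary data size by a target shard size; Elastic's shard-sizing guidance is often cited as roughly 10 to 50 GB per shard. A sketch, where the 30 GB default target and the `estimateShards` helper are assumptions for illustration:

```javascript
// Hypothetical sizing helper: primaries needed for a target shard size,
// plus total shard copies once replicas are added.
function estimateShards(primaryDataGb, replicas = 1, targetShardGb = 30) {
  const primaries = Math.max(1, Math.ceil(primaryDataGb / targetShardGb));
  return { primaries, totalShards: primaries * (1 + replicas) };
}

console.log(estimateShards(150));   // → { primaries: 5, totalShards: 10 }
console.log(estimateShards(5, 0));  // → { primaries: 1, totalShards: 1 }
```

The primary shard count is fixed when an index is created (changing it requires a reindex, split, or shrink), so it is worth estimating from projected rather than current data size.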
Frequently Asked Questions
Elasticsearch vs Meilisearch vs Typesense?
Elasticsearch for complex search requirements (facets, aggregations, geo, analytics) and large-scale deployments. Meilisearch for simple, fast search with minimal configuration — great for small to medium apps. Typesense for type-ahead search with out-of-the-box relevance. Pick based on complexity needs, not hype.
Should I use Elasticsearch or OpenSearch?
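OpenSearch is the AWS-backed fork of Elasticsearch 7.10. Both are capable. Use OpenSearch if you're on AWS (Amazon OpenSearch Service is the native managed option). Use Elasticsearch if you want Elastic's newest features (such as ES|QL and its latest vector search capabilities) or use Elastic Cloud. The core APIs remain largely compatible, but the two projects have diverged since the fork, so test your client libraries and queries before switching.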
How do I keep Elasticsearch in sync with my database?
For most apps: write to both DB and ES on every change (sync-on-write). For larger systems: use CDC with Debezium — it captures every database change and streams it to ES via Kafka. This is more reliable than app-level dual writes because it catches changes from any source (migrations, admin tools, other services).
Can Elasticsearch replace my database?
No. Elasticsearch is not a primary data store — it lacks ACID transactions, has eventual consistency, and can lose data during cluster issues. Always keep your source of truth in a proper database (PostgreSQL, MySQL, etc.) and use ES as a secondary search index.
How much RAM does Elasticsearch need?
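Rule of thumb (a starting point, not a hard limit): 1GB of heap per 20GB of index data. Give ES 50% of available RAM as JVM heap (max 31GB, to stay under the JVM's compressed object pointer threshold), and leave the other 50% for the filesystem cache (critical for search performance). A node with 16GB RAM should get 8GB heap and can comfortably handle ~160GB of index data. Since 7.11, Elasticsearch sizes the heap automatically by default, so these numbers matter mainly if you set it yourself.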
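The rule of thumb above reduces to a few lines of arithmetic. A sketch that encodes exactly the stated numbers (half of RAM as heap, capped at 31 GB, about 20 GB of index data per GB of heap); `heapPlan` is a hypothetical helper, and its output is a starting estimate, not a capacity guarantee.

```javascript
// Encodes the rule of thumb: heap = 50% of RAM (max 31 GB), the rest left
// to the filesystem cache, ~20 GB of index data per GB of heap.
function heapPlan(ramGb) {
  const heapGb = Math.min(ramGb / 2, 31);
  return {
    heapGb,
    fsCacheGb: ramGb - heapGb,
    indexDataGb: heapGb * 20,
  };
}

console.log(heapPlan(16)); // → { heapGb: 8, fsCacheGb: 8, indexDataGb: 160 }
console.log(heapPlan(64)); // → { heapGb: 31, fsCacheGb: 33, indexDataGb: 620 }
```

On larger machines the cap means the filesystem cache receives more than half the RAM, which suits search workloads that lean heavily on cached index files.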
Pillai Infotech LLP
We implement search solutions — from simple site search to complex e-commerce search with facets, autocomplete, and relevance tuning. Let's build your search.