Elasticsearch Search Implementation Guide

Elasticsearch: Building Powerful Search for Your Application

Your users expect Google-quality search. SQL LIKE '%query%' won't cut it. Here's how to build fast, relevant, typo-tolerant search with Elasticsearch — from first index to production deployment.

🔍 Database & Data February 5, 2026 14 min read

In This Guide

1. Why Elasticsearch (and When Not to Use It)
2. Core Concepts — Indices, Mappings, Analyzers
3. Indexing Strategies
4. Query DSL — Search That Actually Works
5. Relevance Tuning
6. Common Patterns — Autocomplete, Facets, Geo
7. Scaling and Operations
8. Frequently Asked Questions

Search is the feature users notice most when it's bad and least when it's good. A search that returns irrelevant results, can't handle typos, or takes 2 seconds to respond will drive users away faster than almost any other UX problem. Elasticsearch solves this — but it requires understanding how full-text search actually works.

1. Why Elasticsearch (and When Not to Use It)

Search Need	Best Solution	Why
Simple keyword filter on < 100K rows	PostgreSQL full-text search	No extra infrastructure
Typo-tolerant search with ranking	Elasticsearch or Meilisearch	Purpose-built for relevance
E-commerce product search	Elasticsearch (or Algolia)	Facets, filters, boosting, analytics
Log search and analytics	Elasticsearch (ELK stack)	Designed for log ingestion and analysis
Semantic / vector search	Elasticsearch 8+ or Weaviate/Pinecone	kNN vector search built-in
Small site search (< 10K docs)	Meilisearch or Typesense	Simpler, faster to set up

2. Core Concepts — Indices, Mappings, Analyzers

How Text Search Works (Inverted Index)

Document 1: "The quick brown fox jumps"
Document 2: "Quick brown dogs leap over fences"

Analyzer pipeline: lowercase → remove stopwords → stem

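Inverted Index:
    "quick"  → [Doc 1, Doc 2]
    "brown"  → [Doc 1, Doc 2]
    "fox"    → [Doc 1]
    "jump"   → [Doc 1]         ← "jumps" stemmed to "jump"
    "dog"    → [Doc 2]         ← "dogs" stemmed to "dog"
    "leap"   → [Doc 2]
    "over"   → [Doc 2]         ← not in the default English stopword list, so it is indexed
    "fenc"   → [Doc 2]         ← "fences" stemmed to "fenc"
    ("the" is a stopword and is dropped)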

Search for "quick fox" → finds Doc 1 (matches both terms, higher score)
                       → also finds Doc 2 (matches "quick")
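
The walkthrough above can be sketched in a few lines of JavaScript. This is a toy model, not how Lucene stores its index: the stopword list, the "strip a trailing s" stemming rule, and the term-count ranking are all simplifying assumptions made for illustration (a real Porter-style stemmer produces "fenc", this toy one produces "fence").

```javascript
// Toy analyzer: lowercase, drop a few stopwords, strip a trailing "s".
// STOPWORDS and the stemming rule are illustrative assumptions only.
const STOPWORDS = new Set(['the', 'a', 'an', 'and', 'of', 'to']);

function analyze(text) {
    return text
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter(token => token.length > 0 && !STOPWORDS.has(token))
        .map(token => (token.length > 3 && token.endsWith('s') ? token.slice(0, -1) : token));
}

// Map each term to the set of document ids that contain it.
function buildInvertedIndex(docs) {
    const index = new Map();
    for (const [id, text] of Object.entries(docs)) {
        for (const term of analyze(text)) {
            if (!index.has(term)) index.set(term, new Set());
            index.get(term).add(id);
        }
    }
    return index;
}

// Rank documents by how many distinct query terms they contain
// (a crude stand-in for BM25 scoring).
function search(index, query) {
    const hits = new Map(); // doc id -> number of matched terms
    for (const term of new Set(analyze(query))) {
        for (const id of index.get(term) ?? []) {
            hits.set(id, (hits.get(id) ?? 0) + 1);
        }
    }
    return [...hits.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

const index = buildInvertedIndex({
    doc1: 'The quick brown fox jumps',
    doc2: 'Quick brown dogs leap over fences',
});
const results = search(index, 'quick fox'); // doc1 first (two terms), then doc2 (one term)
```

The key point is that a search never scans document text: it looks up each analyzed query term in the map and merges the posting lists.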

Index Mapping — Define Your Schema

PUT /products
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "keyword": { "type": "keyword" },     // For exact match & sorting
                    "autocomplete": {                       // For search-as-you-type
                        "type": "text",
                        "analyzer": "autocomplete_analyzer",
                        "search_analyzer": "standard"       // Don't n-gram the user's query
                    }
                }
            },
            "description": { "type": "text", "analyzer": "english" },
            "category": { "type": "keyword" },             // Exact match only (facets)
            "price": { "type": "float" },
            "rating": { "type": "float" },
            "in_stock": { "type": "boolean" },
            "created_at": { "type": "date" },
            "location": { "type": "geo_point" }            // For geo search
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"]
                }
            },
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15
                }
            }
        }
    }
}
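
To see what the autocomplete_filter settings do, here is a rough JavaScript imitation of an edge n-gram filter with min_gram 2 and max_gram 15. It is a sketch for building intuition only: the function names are hypothetical, and the regex split is a simplification of the standard tokenizer.

```javascript
// Mimics an edge_ngram token filter: emit the prefixes of a token that are
// between minGram and maxGram characters long.
function edgeNgrams(token, minGram = 2, maxGram = 15) {
    const grams = [];
    const longest = Math.min(maxGram, token.length);
    for (let len = minGram; len <= longest; len++) {
        grams.push(token.slice(0, len));
    }
    return grams;
}

// Approximates autocomplete_analyzer: split into words, lowercase, edge n-gram.
function autocompleteTokens(text) {
    return text
        .toLowerCase()
        .split(/\W+/)
        .filter(Boolean)
        .flatMap(token => edgeNgrams(token));
}

const tokens = autocompleteTokens('Wireless Headphones');
// Contains "wi", "wir", "wire", ... "wireless", "he", "hea", "head", ... "headphones"
```

Because every prefix is stored at index time, a typed fragment such as "wire" matches with a plain term lookup. The same expansion applied to the query itself would make "wire" also match anything starting with "wi", which is why edge n-gram fields are usually searched with a non-n-gram analyzer.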

3. Indexing Strategies

Bulk Indexing — The Right Way

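// Node.js: bulk index 10,000 products
// Written against the 7.x @elastic/elasticsearch client, which wraps responses in { body }.
// The 8.x client returns the response directly and prefers `document` / `operations` over `body`.
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

// WRONG: Individual index calls (slow: 1 HTTP request per document)
async function indexOneByOne(products) {
    for (const product of products) {
        await client.index({ index: 'products', body: product });  // 💥 N requests
    }
}

// RIGHT: Bulk API (fast: 1 HTTP request per batch)
async function indexInBulk(products) {
    // The bulk body alternates an action line and the document it applies to
    const body = products.flatMap(product => [
        { index: { _index: 'products', _id: product.id } },
        product
    ]);

    const { body: bulkResponse } = await client.bulk({
        refresh: true,  // Searchable immediately; handy for demos, costly on large loads
        body
    });

    // `errors` is true if any item failed; each failed item carries an `error` object
    if (bulkResponse.errors) {
        const erroredDocs = bulkResponse.items.filter(item => item.index.error);
        console.error('Failed documents:', erroredDocs);
    }
}

// Starting point for batch size: 5-15 MB per bulk request (typically 1,000-5,000 docs),
// so split large datasets into several bulk calls rather than one huge request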
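
One way to respect the batch-size guidance is to split the documents before sending them. The helper below is a sketch under assumptions: buildBulkBatches and its option names are hypothetical, and the size estimate counts characters of the serialized JSON rather than exact bytes.

```javascript
// Build bulk action/document pairs and split them into batches capped by
// document count and by approximate serialized size.
function buildBulkBatches(products, { indexName = 'products', maxDocs = 1000, maxBytes = 10 * 1024 * 1024 } = {}) {
    const batches = [];
    let current = [];
    let currentDocs = 0;
    let currentBytes = 0;

    for (const product of products) {
        const action = { index: { _index: indexName, _id: product.id } };
        // Approximate size: character count of both lines plus their newlines
        const size = JSON.stringify(action).length + JSON.stringify(product).length + 2;
        const wouldOverflow = currentDocs + 1 > maxDocs || currentBytes + size > maxBytes;
        if (currentDocs > 0 && wouldOverflow) {
            batches.push(current);
            current = [];
            currentDocs = 0;
            currentBytes = 0;
        }
        current.push(action, product);
        currentDocs += 1;
        currentBytes += size;
    }
    if (currentDocs > 0) batches.push(current);
    return batches;
}

// Sample data for illustration only
const sample = Array.from({ length: 2500 }, (_, i) => ({ id: String(i + 1), name: `Product ${i + 1}` }));
const batches = buildBulkBatches(sample, { maxDocs: 1000 });
// Three batches: 1,000 + 1,000 + 500 documents (two array entries per document)
```

Each batch can then be passed as the body of one bulk call, checking the errors flag after every call so a failure in one batch does not go unnoticed.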

Sync Strategy	How It Works	Latency	Best For
Sync on write	App writes to DB + ES simultaneously	< 1 second	Simple apps, low write volume
Event-driven	DB write → event → consumer indexes ES	1-5 seconds	Microservices, decoupled systems
CDC (Debezium)	DB transaction log → Kafka → ES connector	1-10 seconds	No app changes, reliable sync
Periodic reindex	Cron job reads DB, bulk indexes	Minutes to hours	Infrequent changes, full consistency
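
For the CDC row, the consumer's core job is translating change events into bulk actions. The sketch below assumes a Debezium-style envelope with op, before, and after fields, and a primary key column named id; the function name and the id column are hypothetical, and a production pipeline would more likely use an existing Kafka sink connector than hand-written code.

```javascript
// Translate one change event into the lines of a bulk request.
// op codes follow the Debezium envelope: c = create, u = update,
// r = snapshot read, d = delete.
function changeEventToBulkLines(event, indexName = 'products') {
    if (event === null) return []; // tombstone record that can follow a delete
    const { op, before, after } = event;
    switch (op) {
        case 'c':
        case 'u':
        case 'r':
            // Indexing with an explicit _id overwrites the old version,
            // so replaying the same event is harmless.
            return [{ index: { _index: indexName, _id: String(after.id) } }, after];
        case 'd':
            return [{ delete: { _index: indexName, _id: String(before.id) } }];
        default:
            return []; // ignore event types this sketch does not handle
    }
}

const created = changeEventToBulkLines({ op: 'c', before: null, after: { id: 7, name: 'Mouse' } });
const deleted = changeEventToBulkLines({ op: 'd', before: { id: 7, name: 'Mouse' }, after: null });
```

Using the database primary key as the document _id is what keeps the index convergent: events may arrive more than once, but the final state per key is the same.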

4. Query DSL — Search That Actually Works

E-Commerce Product Search — Complete Query

GET /products/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "wireless headphones",
                        "fields": ["name^3", "description", "category^2"],
                        "type": "best_fields",
                        "fuzziness": "AUTO",
                        "prefix_length": 2
                    }
                }
            ],
            "filter": [
                { "term": { "in_stock": true } },
                { "range": { "price": { "gte": 50, "lte": 300 } } },
                { "terms": { "category": ["electronics", "audio"] } }
            ],
            "should": [
                { "range": { "rating": { "gte": 4.0, "boost": 2 } } },
                { "term": { "featured": { "value": true, "boost": 5 } } }
            ]
        }
    },
    "highlight": {
        "fields": {
            "name": {},
            "description": { "fragment_size": 150 }
        }
    },
    "aggs": {
        "categories": { "terms": { "field": "category", "size": 20 } },
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [
                    { "to": 50 },
                    { "from": 50, "to": 100 },
                    { "from": 100, "to": 200 },
                    { "from": 200 }
                ]
            }
        },
        "avg_rating": { "avg": { "field": "rating" } }
    },
    "size": 20,
    "from": 0
}

Key concepts in the query above:

must — the search query itself (affects relevance score)
filter — hard constraints that don't affect scoring (cached for performance)
should — soft boosting factors (better rating = higher in results)
fuzziness: AUTO — handles typos (1 edit for 3-5 char words, 2 for 6+)
name^3 — name matches are 3x more important than description
aggs — faceted search (category counts, price ranges for sidebar filters)
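
In an application, the must / filter / should split maps naturally onto a small query builder: the user's text always goes into must, and each optional UI filter adds one clause to filter. The function below is a sketch; buildProductQuery and its parameter names are hypothetical, while the field names come from the mapping above.

```javascript
// Assemble a search body from user input. Only filters the user actually
// set are added, so an empty sidebar produces a plain text search.
function buildProductQuery({ text, minPrice, maxPrice, categories = [], inStockOnly = true, page = 1, pageSize = 20 }) {
    const filter = [];
    if (inStockOnly) filter.push({ term: { in_stock: true } });
    if (minPrice !== undefined || maxPrice !== undefined) {
        const range = {};
        if (minPrice !== undefined) range.gte = minPrice;
        if (maxPrice !== undefined) range.lte = maxPrice;
        filter.push({ range: { price: range } });
    }
    if (categories.length > 0) filter.push({ terms: { category: categories } });

    return {
        query: {
            bool: {
                must: [{
                    multi_match: {
                        query: text,
                        fields: ['name^3', 'description', 'category^2'],
                        type: 'best_fields',
                        fuzziness: 'AUTO',
                        prefix_length: 2,
                    },
                }],
                filter,
                should: [{ range: { rating: { gte: 4.0, boost: 2 } } }],
            },
        },
        size: pageSize,
        from: (page - 1) * pageSize, // page 1 starts at offset 0
    };
}

const searchBody = buildProductQuery({
    text: 'wireless headphones',
    minPrice: 50,
    maxPrice: 300,
    categories: ['electronics', 'audio'],
    page: 3,
});
```

Keeping price, stock, and category in filter rather than must means they narrow the result set without distorting the relevance score of the text match.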

5. Relevance Tuning

Default Elasticsearch relevance (BM25) is good but not great. Here's how to make search results feel right.

Technique	How	When to Use
Field boosting	"name^3" — weight title matches higher	Always — titles are more relevant than body text
Function score	Boost by popularity, recency, or rating	When freshness or popularity matters
Synonyms	Custom synonym filter in analyzer	Domain-specific terms ("laptop" = "notebook")
Pinned queries	Force specific docs to top for queries	Promoted products, editorial picks
Decay functions	Lower score for older/farther results	News, events, location-based search

Function Score — Boost by Popularity + Recency

GET /articles/_search
{
    "query": {
        "function_score": {
            "query": { "match": { "content": "database scaling" } },
            "functions": [
                {
                    "field_value_factor": {
                        "field": "view_count",
                        "modifier": "log1p",
                        "factor": 0.5
                    }
                },
                {
                    "gauss": {
                        "published_at": {
                            "origin": "now",
                            "scale": "30d",
                            "decay": 0.5
                        }
                    }
                }
            ],
            "score_mode": "multiply",
            "boost_mode": "multiply"
        }
    }
}
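
It helps to work through the arithmetic this query performs. The sketch below reproduces the two functions as documented for Elasticsearch (log1p is the base-10 logarithm of 1 + factor × value, and the Gaussian decay returns the decay value at a distance of one scale); the function names are hypothetical, and time is simplified to whole days.

```javascript
// field_value_factor with modifier "log1p": log10(1 + factor * value)
function fieldValueFactorLog1p(value, factor) {
    return Math.log10(1 + factor * value);
}

// Gaussian decay: 1.0 at the origin, exactly `decay` at a distance of `scale`.
function gaussDecay(distance, scale, decay, offset = 0) {
    const sigmaSquared = -(scale * scale) / (2 * Math.log(decay));
    const d = Math.max(0, Math.abs(distance) - offset);
    return Math.exp(-(d * d) / (2 * sigmaSquared));
}

// score_mode "multiply" combines the two functions;
// boost_mode "multiply" combines that product with the query score.
function finalScore(queryScore, viewCount, ageDays) {
    const popularity = fieldValueFactorLog1p(viewCount, 0.5);
    const recency = gaussDecay(ageDays, 30, 0.5);
    return queryScore * popularity * recency;
}

const fresh = finalScore(10, 198, 0);    // popularity 2, recency 1
const monthOld = finalScore(10, 198, 30); // recency drops to 0.5
const unseen = finalScore(10, 0, 0);      // popularity is log10(1) = 0
```

Note the last case: with everything multiplied together, an article with zero views scores 0 no matter how well it matches the text. If that is not intended, one option is to switch boost_mode or score_mode to "sum", or to guarantee a positive minimum for the field.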

6. Common Patterns — Autocomplete, Facets, Geo

Search-as-You-Type (Autocomplete)

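// Index with a search_as_you_type field (ES 7.2+). Use a fresh index or reindex:
// the type of an existing field (such as the "name" text field defined earlier) cannot be changed in place.
PUT /products
{
    "mappings": {
        "properties": {
            "name": {
                "type": "search_as_you_type"  // Creates name, name._2gram, name._3gram, name._index_prefix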
            }
        }
    }
}

// Query — matches partial words as user types
GET /products/_search
{
    "query": {
        "multi_match": {
            "query": "wire head",
            "type": "bool_prefix",
            "fields": ["name", "name._2gram", "name._3gram"]
        }
    }
}
// Matches: "Wireless Headphones", "Wired Headset", etc.
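
Conceptually, a bool_prefix query treats every word except the last as a complete term and the last word as a prefix, since the user is probably still typing it. The sketch below shows that decomposition for a single field; toBoolPrefix is a hypothetical helper, and it skips real analysis (it only lowercases and splits on whitespace).

```javascript
// Rewrite typed input into the rough equivalent of a match_bool_prefix query:
// all words but the last become term clauses, the last becomes a prefix clause.
function toBoolPrefix(field, input) {
    const terms = input.toLowerCase().split(/\s+/).filter(Boolean);
    if (terms.length === 0) return { match_none: {} };
    const last = terms[terms.length - 1];
    const should = terms.slice(0, -1).map(term => ({ term: { [field]: term } }));
    should.push({ prefix: { [field]: last } });
    return { bool: { should } };
}

const rewritten = toBoolPrefix('name', 'wire head');
// should: [ { term: { name: 'wire' } }, { prefix: { name: 'head' } } ]
```

Since the clauses are combined with should, a document only needs to satisfy one of them to match, and documents satisfying more of them rank higher.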

Geo Search — Find Nearby

GET /stores/_search
{
    "query": {
        "bool": {
            "must": { "match": { "type": "restaurant" } },
            "filter": {
                "geo_distance": {
                    "distance": "5km",
                    "location": { "lat": 19.076, "lon": 72.877 }  // Mumbai
                }
            }
        }
    },
    "sort": [
        {
            "_geo_distance": {
                "location": { "lat": 19.076, "lon": 72.877 },
                "order": "asc",
                "unit": "km"
            }
        }
    ]
}
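
What the filter and sort compute can be illustrated with a plain great-circle distance calculation. This is a sketch of the idea, not Elasticsearch's implementation: it uses the haversine formula with a mean Earth radius of 6,371 km, and the store names and coordinates are made up for the example.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, an approximation

function toRadians(degrees) {
    return (degrees * Math.PI) / 180;
}

// Great-circle distance between two { lat, lon } points (haversine formula).
function haversineKm(a, b) {
    const dLat = toRadians(b.lat - a.lat);
    const dLon = toRadians(b.lon - a.lon);
    const h = Math.sin(dLat / 2) ** 2 +
        Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Same shape as the query: keep stores within maxKm, nearest first.
function nearby(stores, origin, maxKm) {
    return stores
        .map(store => ({ ...store, distanceKm: haversineKm(origin, store.location) }))
        .filter(store => store.distanceKm <= maxKm)
        .sort((x, y) => x.distanceKm - y.distanceKm);
}

const origin = { lat: 19.076, lon: 72.877 };
const stores = [
    { name: 'Far', location: { lat: 19.2, lon: 72.877 } },
    { name: 'Near', location: { lat: 19.08, lon: 72.88 } },
    { name: 'Mid', location: { lat: 19.1, lon: 72.877 } },
];
const within5km = nearby(stores, origin, 5); // "Near", then "Mid"; "Far" is about 14 km away
```

Doing this in application code means computing the distance for every store on every request; the geo_point index lets Elasticsearch discard most candidates without touching them.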

7. Scaling and Operations

Scale	Cluster Size	Key Settings	Monthly Cost
< 1M docs	1 node (8GB RAM)	1 primary, 0 replicas	$50-100
1-10M docs	3 nodes (16GB each)	5 shards, 1 replica	$300-600
10-100M docs	5-10 nodes (32GB each)	Time-based indices, ILM	$1,000-3,000
> 100M docs	10+ nodes, dedicated masters	Hot-warm-cold architecture	$3,000+

What We've Learned Running Elasticsearch: The #1 operational issue is JVM heap pressure. Set heap to 50% of RAM but never exceed 31GB, which keeps the JVM below the compressed-oops threshold. Monitor cluster health continuously and alert on changes: yellow means at least one replica shard is unassigned (often a missing node, or replicas configured on a single-node cluster), and red means at least one primary shard is unassigned, so part of the data is unavailable and may be lost. Use Index Lifecycle Management (ILM) to automatically roll over and delete old indices.

8. Frequently Asked Questions

Elasticsearch vs Meilisearch vs Typesense?

Elasticsearch for complex search requirements (facets, aggregations, geo, analytics) and large-scale deployments. Meilisearch for simple, fast search with minimal configuration — great for small to medium apps. Typesense for type-ahead search with out-of-the-box relevance. Pick based on complexity needs, not hype.

Should I use Elasticsearch or OpenSearch?

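OpenSearch is a fork of Elasticsearch 7.10 that AWS started in 2021. Both are capable. Use OpenSearch if you're on AWS (native Amazon OpenSearch Service). Use Elasticsearch if you want its latest features (vector search, ES|QL) or use Elastic Cloud. The core indexing and search APIs are still largely compatible, but the two projects have diverged since the fork, so verify client library and plugin support before switching.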

How do I keep Elasticsearch in sync with my database?

For most apps: write to both DB and ES on every change (sync-on-write). For larger systems: use CDC with Debezium — it captures every database change and streams it to ES via Kafka. This is more reliable than app-level dual writes because it catches changes from any source (migrations, admin tools, other services).

Can Elasticsearch replace my database?

No. Elasticsearch is not a primary data store: it lacks ACID transactions, its search is near-real-time (a write becomes searchable only after the next refresh, 1 second by default), and it can lose data during cluster issues. Always keep your source of truth in a proper database (PostgreSQL, MySQL, etc.) and use ES as a secondary search index.

How much RAM does Elasticsearch need?

Rule of thumb: 1GB heap per 20GB of index data. Give ES 50% of available RAM as JVM heap (max 31GB), and leave the other 50% for the filesystem cache (critical for search performance). A node with 16GB RAM should get 8GB heap and can comfortably handle ~160GB of index data.
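
The sizing arithmetic above fits in a small helper. This is only the article's rule of thumb turned into code: recommendHeap and its option names are hypothetical, and the 1:20 heap-to-data ratio is a rough starting point, not a guarantee for any particular workload.

```javascript
// Heap = half of RAM, capped at 31 GB; the rest is left to the filesystem cache.
// Index data capacity is estimated at 20 GB per GB of heap.
function recommendHeap(ramGb, { maxHeapGb = 31, dataPerHeapGb = 20 } = {}) {
    const heapGb = Math.min(Math.floor(ramGb / 2), maxHeapGb);
    return {
        heapGb,
        fsCacheGb: ramGb - heapGb,
        approxIndexDataGb: heapGb * dataPerHeapGb,
    };
}

const small = recommendHeap(16);  // heap 8 GB, cache 8 GB, about 160 GB of index data
const large = recommendHeap(128); // heap capped at 31 GB, cache 97 GB, about 620 GB
```

Past 64GB of RAM the cap takes over, so extra memory goes entirely to the filesystem cache; at that point adding nodes usually buys more than adding RAM to one node.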

Pillai Infotech LLP
