
Performance Engineering: Fix What Matters

We cut an API's P95 latency from 4.2 seconds to 180ms. The fix wasn't a rewrite — it was three missing database indexes and one N+1 query. Performance engineering is about measurement, not guesswork.

November 14, 2025 14 min read

Most performance problems have boring solutions. Not "rewrite it in Rust" or "switch to a faster framework." The real fixes are usually: add an index, fix an N+1 query, add a cache layer, or reduce payload size. We've done performance audits on 30+ applications. In 80% of cases, the top 3 bottlenecks account for 90% of the latency. Find them, fix them, move on.

Measure Before You Optimize

"Premature optimization is the root of all evil" — but so is ignoring performance until users complain. The right approach: set performance budgets, measure continuously, and optimize the measured bottlenecks.

Performance Budgets

| Metric | Good | Acceptable | Needs Work |
|---|---|---|---|
| API P50 latency | < 100ms | 100-500ms | > 500ms |
| API P95 latency | < 300ms | 300ms-1s | > 1s |
| LCP (Largest Contentful Paint) | < 2.5s | 2.5-4s | > 4s |
| INP (Interaction to Next Paint) | < 200ms | 200-500ms | > 500ms |
| CLS (Cumulative Layout Shift) | < 0.1 | 0.1-0.25 | > 0.25 |
| Database query time (per request) | < 20ms total | 20-100ms | > 100ms |
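A budget only helps if something checks it. As a minimal sketch, assuming you already collect per-request durations in milliseconds, the snippet below computes a nearest-rank percentile and maps the P95 onto the API P95 bands from the table. The sample durations and the function names are hypothetical, chosen for illustration.

```python
import math

# Thresholds for API P95 latency from the budget table, in milliseconds.
P95_GOOD_MS = 300
P95_ACCEPTABLE_MS = 1000


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def classify_p95(p95_ms):
    """Map a measured P95 onto the budget bands."""
    if p95_ms < P95_GOOD_MS:
        return "good"
    if p95_ms <= P95_ACCEPTABLE_MS:
        return "acceptable"
    return "needs work"


# Hypothetical request durations in milliseconds, for illustration only.
durations_ms = [80, 95, 110, 120, 130, 150, 170, 210, 260, 4200]
p95 = percentile(durations_ms, 95)
print(p95, classify_p95(p95))
```

With these ten samples the median looks healthy, but the single slow request lands in the P95 and puts the endpoint in the "needs work" band. That is why the budgets track percentiles rather than averages.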

The Profiling Workflow

  1. Identify the slow endpoint. Use an APM tool (Datadog, New Relic) or simple logging of request times
  2. Break down the latency. Where does the time go: database, external API, computation, or serialization?
  3. Fix the biggest contributor first. If 80% of the latency is in the database, optimizing your JSON serialization is a waste of effort
  4. Verify the fix. Measure again. Did P95 improve, and by how much?
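If you don't have an APM tool yet, step 2 can be approximated with a few lines of timing code. The sketch below is one possible shape, not a prescribed implementation: LatencyBreakdown and the phase names are hypothetical, and the sleep calls stand in for real database and serialization work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyBreakdown:
    """Accumulates wall-clock time per named phase of one request."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the phase raised an exception.
            self.totals[name] += time.perf_counter() - start

    def biggest(self):
        """Return (phase, seconds) for the largest contributor."""
        return max(self.totals.items(), key=lambda item: item[1])


# Hypothetical request handler: the sleeps stand in for real work.
breakdown = LatencyBreakdown()
with breakdown.phase("database"):
    time.sleep(0.05)
with breakdown.phase("serialization"):
    time.sleep(0.005)

name, seconds = breakdown.biggest()
print(f"fix first: {name} ({seconds * 1000:.0f} ms)")
```

Logging the totals per request gives you the same "where does the time go" answer an APM trace would, at much lower fidelity but with no extra infrastructure.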

Database Performance: Where 80% of Problems Live

In our experience, most API latency traces back to the database. Here are the patterns we see most often.

The N+1 Query Problem

The most common performance bug in web applications. You fetch a list of 100 items, then run a separate query for each item's related data. That's 101 queries instead of 2.

-- BAD: N+1 (101 queries for 100 orders)
SELECT * FROM orders WHERE user_id = 42;
-- Then for EACH order:
SELECT * FROM order_items WHERE order_id = ?;

-- GOOD: Eager loading (2 queries)
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM order_items WHERE order_id IN (1, 2, 3, /* ... */ 100);

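Most ORMs support eager loading (Eloquent's with(), SQLAlchemy's selectinload() and joinedload(), Prisma's include). Use it. In SQLAlchemy, selectinload() issues the second IN query shown above, while joinedload() fetches the related rows with a JOIN in a single query.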
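To see the difference in query counts without an ORM, here is a small self-contained sketch using Python's built-in sqlite3 module. The schema mirrors the orders and order_items tables from the SQL above; the sku column, the run() helper, and the sample data are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
""")
conn.executemany("INSERT INTO orders (id, user_id) VALUES (?, 42)",
                 [(i,) for i in range(1, 101)])
conn.executemany("INSERT INTO order_items (order_id, sku) VALUES (?, ?)",
                 [(i, f"sku-{i}") for i in range(1, 101)])

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two strategies can be compared."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def items_n_plus_one(user_id):
    # One query for the orders, then one more query per order.
    orders = run("SELECT id FROM orders WHERE user_id = ?", (user_id,))
    return {oid: run("SELECT sku FROM order_items WHERE order_id = ?", (oid,))
            for (oid,) in orders}


def items_batched(user_id):
    # One query for the orders, one IN query for all of their items.
    orders = run("SELECT id FROM orders WHERE user_id = ?", (user_id,))
    ids = [oid for (oid,) in orders]
    placeholders = ",".join("?" * len(ids))
    rows = run("SELECT order_id, sku FROM order_items "
               f"WHERE order_id IN ({placeholders})", ids)
    grouped = {oid: [] for oid in ids}
    for order_id, sku in rows:
        grouped[order_id].append((sku,))
    return grouped


counter["queries"] = 0
slow = items_n_plus_one(42)
n_plus_one_queries = counter["queries"]

counter["queries"] = 0
fast = items_batched(42)
batched_queries = counter["queries"]

print(n_plus_one_queries, batched_queries)
```

Both functions return the same data. Against an in-memory database the extra 99 round trips cost almost nothing; against a networked database each one adds a round trip, which is where the latency comes from.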

Missing Indexes

If a query filters or joins on a column without an index, the database has to scan every row. On a table with 1 million rows, that can be the difference between 2ms and 2 seconds.

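-- Find slow queries (PostgreSQL 13+, requires the pg_stat_statements extension;
-- on older versions the column is mean_time instead of mean_exec_time)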
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Check if a query uses indexes (EXPLAIN ANALYZE actually executes the query)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
-- Look for "Seq Scan" (usually bad on large tables) vs "Index Scan" (good)
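The same before-and-after check can be tried locally without a PostgreSQL instance. The sketch below swaps in SQLite, whose EXPLAIN QUERY PLAN reports SCAN for a full table scan and SEARCH ... USING INDEX for an index lookup. This is an analogy to PostgreSQL's "Seq Scan" vs "Index Scan", not the same output format, and the table contents and index name are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
)
conn.executemany("INSERT INTO orders (user_id, status) VALUES (?, 'paid')",
                 [(i % 1000,) for i in range(10_000)])


def plan(sql):
    """Return SQLite's plan description (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM orders WHERE user_id = 42"

before = plan(query)  # no index on user_id yet: full table scan
conn.execute("CREATE INDEX idx_orders_user_id ON orders(user_id)")
after = plan(query)   # the planner can now look rows up through the index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so compare the leading keyword (SCAN vs SEARCH) rather than the full string.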

Query Optimization Checklist

  • Index columns used in WHERE, JOIN, and ORDER BY. This is the single highest-impact optimization
  • Use composite indexes for queries that filter on multiple columns: CREATE INDEX idx_orders_user_status ON orders(user_id, status)
  • Avoid SELECT *. Fetch only the columns you need. Less data to transfer, less memory, faster
  • Paginate large result sets. Use cursor-based pagination for consistent performance. Offset pagination degrades as the offset grows
  • Use connection pooling. PgBouncer for PostgreSQL, ProxySQL for MySQL. Connection creation is expensive (50-100ms each)
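Cursor-based (keyset) pagination from the checklist can be sketched as follows. Instead of OFFSET, each page asks for rows after the last id already seen, so the database can seek straight to the start of the page. The fetch_page helper, table contents, and page size are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 26)])


def fetch_page(after_id=0, page_size=10):
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size + 1),  # fetch one extra row to detect a next page
    ).fetchall()
    has_more = len(rows) > page_size
    rows = rows[:page_size]
    next_cursor = rows[-1][0] if has_more else None
    return rows, next_cursor


# Walk all pages by feeding each cursor into the next call.
pages = []
cursor = 0
while cursor is not None:
    rows, cursor = fetch_page(after_id=cursor)
    pages.append([row[0] for row in rows])

print([len(page) for page in pages])
```

The cursor column needs to be unique and indexed (here it is the primary key). Sorting by a non-unique column requires a tie-breaker, typically the id, in both the ORDER BY and the WHERE clause.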

Caching: The Right Way

Caching is the fastest way to improve performance — and the fastest way to introduce bugs if done wrong. The hard part isn't adding a cache; it's invalidating it correctly.

| Cache Layer | What to Cache | TTL | Invalidation Strategy |
|---|---|---|---|
| Browser cache | Static assets (JS, CSS, images) | 1 year (with content hash in filename) | New filename on build = automatic invalidation |
| CDN (CloudFlare, CloudFront) | API responses that don't change per user | 5 min to 1 hour | Purge on deploy or data change |
| Application cache (Redis) | Database query results, computed values, session data | 30s to 15 min (depends on freshness needs) | Write-through: update cache when data changes. Or TTL-based: accept brief staleness |
| Database query cache | Expensive aggregation queries | 1-5 min | Clear on related table writes |

Cache Invalidation Patterns

  • TTL-based: Simplest. Set an expiry time. Accept that data might be stale for up to TTL. Good for: product listings, leaderboards, dashboards
  • Write-through: Update the cache whenever you update the database. More complex, but the cache stays fresh as long as every write goes through this path. Good for: user profiles, settings
  • Event-driven: Publish an event on data change, subscriber invalidates the cache. Good for: distributed systems where multiple services cache the same data
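The first two patterns can be sketched in a few lines. This is an illustration under simplifying assumptions, not production code: TTLCache is a hypothetical in-memory stand-in for Redis, the dictionary named database stands in for the real datastore, and "write-through" is used in the sense described above (update the cache in the same code path that updates the database).

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)


database = {"user:1": "Ada"}  # hypothetical source of truth
cache = TTLCache(ttl_seconds=60)


def read_user(key):
    """TTL-based read: serve from cache, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = database[key]
        cache.set(key, value)
    return value


def write_user(key, value):
    """Write-through: update the database and the cache together."""
    database[key] = value
    cache.set(key, value)


first = read_user("user:1")
write_user("user:1", "Grace")
second = read_user("user:1")
print(first, second)
```

If write_user skipped the cache.set call, the second read would keep returning the old value until the TTL expired. That gap is the "brief staleness" the TTL-based approach accepts.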

Frontend Performance

Frontend performance directly impacts SEO and conversion. Google's Core Web Vitals (LCP, INP, CLS) are a ranking signal. As a rule of thumb, every 100ms of added load time reduces conversion by roughly 1-2%, though the exact figure varies by site and audience.

Quick Wins (Biggest Impact, Least Effort)

  • Image optimization. Use WebP format, lazy-load below-the-fold images, serve responsive sizes. This alone can cut page weight by 50%+
  • Bundle splitting. Don't load the entire app's JavaScript on the first page. Split by route
  • Font loading. Use font-display: swap. Subset fonts to only the characters you use. Or use system fonts — they load instantly
  • Remove unused JavaScript. Run a bundle analyzer. Many projects ship 30-50% unused JS. Tree shaking and selective imports fix this

Load Testing: Know Your Limits

Load testing answers two questions: "How many concurrent users can we handle?" and "Where does it break first?"

Load Testing with k6

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
    stages: [
        { duration: '2m', target: 100 },  // Ramp up to 100 users
        { duration: '5m', target: 100 },  // Hold at 100
        { duration: '2m', target: 300 },  // Spike to 300
        { duration: '5m', target: 300 },  // Hold at 300
        { duration: '2m', target: 0 },    // Ramp down
    ],
    thresholds: {
        http_req_duration: ['p(95)<500'],  // 95% of requests under 500ms
        http_req_failed: ['rate<0.01'],    // Less than 1% errors
    },
};

export default function () {
    const res = http.get('https://api.yourapp.com/products');
    check(res, {
        'status is 200': (r) => r.status === 200,
        'response time < 500ms': (r) => r.timings.duration < 500,
    });
    sleep(1);
}

Performance Monitoring in Production

| Tool | What It Monitors | Cost |
|---|---|---|
| Datadog APM | Request traces, database queries, external calls | $31/host/month |
| New Relic | Similar to Datadog. Good free tier (100GB/month) | Free tier available |
| Grafana + Prometheus | Custom metrics, dashboards. Self-hosted | Free (self-hosted) |
| Google PageSpeed Insights | Core Web Vitals from real user data | Free |

For debugging production performance issues, having these tools in place before the incident makes the difference between a 5-minute fix and a 2-hour investigation.

Frequently Asked Questions

When should we start thinking about performance?

Set performance budgets from day one (API latency, page load time). Don't optimize prematurely, but measure continuously. When a metric exceeds the budget, that's when you optimize. For MVPs, ship first and optimize when you have real users generating real load.

Is Redis necessary for every application?

No. If your database handles the load well (which it will for most apps under 10,000 daily users), adding Redis adds operational complexity without clear benefit. Add caching when you have measured evidence that database latency is the bottleneck, not preemptively.

How do we handle performance in a microservices architecture?

Distributed tracing is essential — use OpenTelemetry to trace requests across services. Set latency budgets per service (e.g., "this service adds max 50ms to the request"). Monitor inter-service call patterns for cascading latency.
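Per-service latency budgets can be checked with very little code once you have trace data. The sketch below does not use the OpenTelemetry API; it assumes spans have already been exported and reduced to (service, self-time) pairs. The service names, budgets, and numbers are hypothetical.

```python
# Hypothetical per-service budgets in milliseconds.
BUDGETS_MS = {"gateway": 20, "catalog": 50, "pricing": 50}

# Hypothetical spans from one trace: (service, self-time in ms, excluding child calls).
spans = [("gateway", 12), ("catalog", 48), ("pricing", 140), ("pricing", 35)]


def over_budget(spans, budgets):
    """Sum self-time per service and return (service, overage_ms), worst first."""
    totals = {}
    for service, self_ms in spans:
        totals[service] = totals.get(service, 0) + self_ms
    offenders = [(service, total - budgets[service])
                 for service, total in totals.items()
                 if service in budgets and total > budgets[service]]
    return sorted(offenders, key=lambda item: item[1], reverse=True)


print(over_budget(spans, BUDGETS_MS))
```

Using self-time rather than total span duration keeps a slow downstream dependency from being blamed on its caller.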

Pillai Infotech Engineering Team

We've done performance audits on 30+ applications. Our biggest win: reducing an e-commerce checkout API from 4.2s to 180ms, which increased conversion by 23%.

