
Testing Strategies That Actually Work in Production

The testing pyramid is dead. Here's what replaced it — and why the "right" testing strategy depends on what you're building, not what a textbook says.

November 22, 2025 14 min read

We've built testing suites that caught zero real bugs and testing suites that prevented six-figure outages. The difference wasn't the number of tests — it was what we tested and how. After shipping software for clients across fintech, healthcare, and SaaS, here's our honest take on testing strategies that deliver real confidence.

Rethinking the Testing Pyramid

The classic testing pyramid — lots of unit tests at the base, fewer integration tests in the middle, minimal E2E tests at the top — made sense when unit tests were cheap and E2E tests required manual Selenium setups that broke every week. That world doesn't exist anymore.

Modern tools have shifted the economics:

| Test Type | 2015 Reality | 2025 Reality |
| --- | --- | --- |
| Unit tests | Fast, cheap, reliable | Still fast, but many test implementation details that change constantly |
| Integration tests | Slow, needed real databases, flaky | Testcontainers makes them fast and reliable: Docker spins up a real Postgres instance in a few seconds |
| E2E tests | Selenium: slow, brittle, expensive to maintain | Playwright/Cypress: fast, stable, parallel execution; Playwright ships screenshot-based visual regression checks |

We don't follow the pyramid. We follow what Kent C. Dodds calls the "testing trophy" — heavily weighted towards integration tests, with strategic unit tests and minimal but targeted E2E tests. This matches our experience: integration tests catch the most real bugs per hour of engineering time invested.

Unit Tests: Where They Actually Help

Unit tests get a lot of reverence. They also get a lot of waste. We've seen codebases with 2,000 unit tests that mostly test that getFullName() concatenates first and last name. Those tests pass when the system is broken and break when the system is fine (because someone renamed a method).

Test Business Logic, Not Implementation

// BAD: Tests implementation details
test('should call UserRepository.save', () => {
  const mockRepo = { save: jest.fn() };
  const service = new UserService(mockRepo);
  service.createUser({ name: 'Alice' });
  expect(mockRepo.save).toHaveBeenCalledWith({ name: 'Alice' });
});
// This test breaks if you refactor the internal save mechanism
// but doesn't catch if validation is wrong

// GOOD: Tests behaviour
test('rejects user with duplicate email', async () => {
  await createUser({ email: 'alice@test.com' });
  const result = await createUser({ email: 'alice@test.com' });
  expect(result.error).toBe('EMAIL_ALREADY_EXISTS');
});
// Tests what the system DOES, not HOW it does it

When Unit Tests Shine

  • Pure functions with complex logic — pricing calculations, date parsing, data transformations, validation rules. These have clear inputs and outputs with no side effects
  • Edge cases — boundary conditions (empty arrays, null values, max integers) that are hard to hit through integration tests
  • Algorithms — sorting, searching, rate limiting, retry logic. Test the algorithm in isolation, test the integration separately
  • State machines — order lifecycle (pending → paid → shipped → delivered), user status transitions. Each valid and invalid transition is a test case
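
To make the state machine case concrete, here is a minimal sketch of the order lifecycle named above. The `TRANSITIONS` table and both function names are hypothetical, and a real order model would likely include more states (cancellations, refunds) than the four listed here.

```javascript
// Hypothetical order lifecycle: pending -> paid -> shipped -> delivered.
const TRANSITIONS = {
  pending: ['paid'],
  paid: ['shipped'],
  shipped: ['delivered'],
  delivered: [],
};

// Returns the new status, or throws if the move is not allowed.
function transition(current, next) {
  if (!Object.prototype.hasOwnProperty.call(TRANSITIONS, current)) {
    throw new Error(`Unknown status: ${current}`);
  }
  if (!TRANSITIONS[current].includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}

// Enumerates every (from, to) pair so each valid and invalid
// transition becomes one test case.
function transitionCases() {
  const states = Object.keys(TRANSITIONS);
  return states.flatMap((from) =>
    states.map((to) => ({ from, to, valid: TRANSITIONS[from].includes(to) }))
  );
}
```

In Jest, the output of `transitionCases()` could feed `test.each`, asserting that `transition` returns for valid pairs and throws for invalid ones.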

When Unit Tests Are Waste

  • Testing that a controller calls a service (test the HTTP endpoint instead)
  • Testing that a service calls a repository (test with a real database instead)
  • Testing getters/setters/constructors
  • Testing third-party library behaviour (they have their own tests)

Integration Tests: The Highest ROI

Integration tests verify that components work together. They're where we invest most of our testing effort because they catch the bugs that actually reach production — misconfigured database queries, broken API contracts, incorrect middleware chains.

The Testcontainers Revolution

The reason integration tests used to be painful was shared test databases. One developer's test data interfered with another's. Testcontainers changed this — each test run spins up a fresh Docker container with a real database. Your tests run against real PostgreSQL, real Redis, real Elasticsearch.

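// Integration test with Testcontainers (Node.js, Jest)
import { PostgreSqlContainer } from '@testcontainers/postgresql';
import { Pool } from 'pg';
import request from 'supertest';
// Assumed project modules; adjust the paths to your codebase
import { app } from '../src/app';
import { runMigrations } from '../src/db/migrations';

let container;
let db;

beforeAll(async () => {
  // Pin the image to the Postgres version you run in production
  container = await new PostgreSqlContainer('postgres:16-alpine')
    .withDatabase('test_db')
    .start();

  // The app under test must be configured to use this same connection string
  db = new Pool({
    connectionString: container.getConnectionUri()
  });

  await runMigrations(db);  // Same migrations as production
}, 60_000);  // Starting the container can exceed Jest's default 5s hook timeout

afterAll(async () => {
  await db.end();           // Close pooled connections before stopping the container
  await container.stop();
});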

test('creates order with correct total', async () => {
  // Seed data
  const product = await db.query(
    'INSERT INTO products (name, price) VALUES ($1, $2) RETURNING id',
    ['Widget', 29.99]
  );

  // Test the actual API endpoint
  const response = await request(app)
    .post('/api/orders')
    .send({ items: [{ productId: product.rows[0].id, qty: 3 }] })
    .expect(201);

  expect(response.body.total).toBeCloseTo(89.97, 2);  // Avoid exact equality on floating-point money

  // Verify database state
  const order = await db.query('SELECT * FROM orders WHERE id = $1', [response.body.id]);
  expect(order.rows[0].status).toBe('pending');
});

What to Integration Test

| Test This | Why | Common Bug It Catches |
| --- | --- | --- |
| API endpoints | Tests routing, validation, auth, serialization, DB queries in one shot | Missing auth check, wrong status code, incorrect SQL join |
| Database queries | ORM-generated SQL often doesn't match expectations | N+1 queries, incorrect WHERE clauses, missing indexes |
| Message consumers | Deserialization, idempotency, error handling | Message format changes, duplicate processing, poison messages |
| External API clients | Serialization, error handling, retry logic | Changed API response format, timeout handling, rate limit handling |
| Auth flows | Most security bugs are integration bugs | Token expiry not checked, role escalation, missing permission checks |
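
The message consumer row is the one teams most often skip, so here is a rough sketch of the behaviour such a test would pin down: duplicates are processed once and poison messages are set aside instead of crashing the consumer. `createConsumer` and its in-memory `Set` are hypothetical stand-ins; a real consumer would use a durable deduplication store and the broker's own dead-letter mechanism.

```javascript
// Hypothetical consumer: handles each message id at most once and
// routes unparseable payloads to a dead-letter list.
function createConsumer(handler) {
  const seen = new Set();   // stand-in for a durable dedup store
  const deadLetters = [];

  function consume(raw) {
    let message;
    try {
      message = JSON.parse(raw);
    } catch (err) {
      deadLetters.push({ raw, reason: 'INVALID_JSON' });
      return 'dead-lettered';
    }
    const hasId =
      message !== null &&
      typeof message === 'object' &&
      typeof message.id === 'string';
    if (!hasId) {
      deadLetters.push({ raw, reason: 'MISSING_ID' });
      return 'dead-lettered';
    }
    if (seen.has(message.id)) return 'duplicate';
    handler(message);        // if this throws, the id is not marked as seen
    seen.add(message.id);
    return 'processed';
  }

  return { consume, deadLetters };
}
```

An integration test would publish the same message twice through the real broker and assert that the side effect (a row, an email, a charge) happened exactly once.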

E2E Tests: Less Is More

E2E tests simulate real user flows through the full application. They're expensive to write, slow to run, and the first to break. The trick is writing just enough to cover your critical paths — not trying to test everything through the browser.

The Critical Path Strategy

We identify the 5-10 user journeys that matter most (the ones where failure means revenue loss or user trust damage) and write E2E tests only for those.

For a typical SaaS application, our E2E suite covers:

  1. Sign up → onboarding → first action (the conversion funnel)
  2. Login → core workflow → expected outcome (the happy path)
  3. Payment flow → confirmation → receipt (the money path)
  4. Error state → recovery → success (the resilience path)

That's it. 15-20 E2E tests total. Everything else is covered by integration tests.

Playwright: Our Current Choice

// Playwright E2E test — critical payment flow
import { test, expect } from '@playwright/test';

test('complete purchase flow', async ({ page }) => {
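  // Login (relative URLs assume baseURL is set in the Playwright config)
  await page.goto('/login');
  await page.fill('[data-testid="email"]', 'test@example.com');
  await page.fill('[data-testid="password"]', 'testpass123');
  await page.click('[data-testid="login-button"]');
  // Wait for login to finish before navigating away; assumes a redirect off /login
  await expect(page).not.toHaveURL(/\/login/);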

  // Add to cart
  await page.goto('/products/widget-pro');
  await page.click('[data-testid="add-to-cart"]');
  await expect(page.locator('[data-testid="cart-count"]')).toHaveText('1');

  // Checkout
  await page.click('[data-testid="checkout"]');
  await page.fill('[data-testid="card-number"]', '4242424242424242');
  await page.fill('[data-testid="expiry"]', '12/28');
  await page.fill('[data-testid="cvc"]', '123');
  await page.click('[data-testid="pay-button"]');

  // Verify
  await expect(page.locator('[data-testid="order-confirmation"]'))
    .toBeVisible({ timeout: 10000 });
  await expect(page.locator('[data-testid="order-total"]'))
    .toContainText('₹2,999');
});

E2E Anti-Patterns We've Learned the Hard Way

  • Don't test through the UI what you can test through the API. If your integration tests cover the business logic, the E2E test only needs to verify the UI renders correctly and user interactions trigger the right actions
  • Don't share state between tests. Each E2E test should set up its own data. Shared state causes mysterious failures when tests run in a different order
  • Don't sleep. await page.waitForTimeout(3000) is a reliability time bomb: too short and the test flakes, too long and the suite crawls. Wait for specific elements or network requests instead
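
Playwright's locators and assertions already wait for elements on their own, so the browser side rarely needs a helper. For conditions outside the page (a background job finishing, a webhook arriving), the same principle can be sketched as a small polling function. `waitUntil` below is a hypothetical helper, not a Playwright API.

```javascript
// Polls a condition until it returns a truthy value or the timeout elapses.
// Returns as soon as the condition holds instead of sleeping a fixed time.
async function waitUntil(condition, { timeout = 5000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const result = await condition();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeout}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```

Compared with a fixed sleep, this finishes as soon as the system is ready and fails with a clear message when it never becomes ready.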

Contract Testing for Microservices

When you have 10 services communicating via APIs, how do you know a change in Service A won't break Service B? Integration tests help, but running all services together in CI is slow and fragile. Contract testing is the answer.

How It Works

The consumer (Service B) writes a "contract" — a specification of what it expects from Service A's API. Service A's CI verifies that it still satisfies all consumer contracts. If someone changes Service A's response format, the contract test fails before it breaks Service B in production.
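
As a rough illustration of the idea (this is a hand-rolled sketch, not Pact's API), a contract can be as small as a map of the fields the consumer reads and the types it expects. `orderContract` and `verifyContract` are hypothetical names.

```javascript
// Contract written by the consumer (Service B): field -> expected JS type.
// Only fields the consumer actually reads are listed.
const orderContract = {
  id: 'string',
  total: 'number',
  createdAt: 'number', // the consumer expects an epoch timestamp
};

// Run in the provider's (Service A) CI against a sample response.
// Returns a list of violations; an empty list means the contract holds.
function verifyContract(contract, response) {
  const violations = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in response)) {
      violations.push(`${field}: missing`);
    } else if (typeof response[field] !== expectedType) {
      violations.push(
        `${field}: expected ${expectedType}, got ${typeof response[field]}`
      );
    }
  }
  return violations;
}
```

Extra fields in the response are deliberately ignored, so the provider stays free to add data without breaking consumers. Tools like Pact add what this sketch lacks: contract publishing, versioning, and verification against a running provider.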

| Tool | Language Support | Our Take |
| --- | --- | --- |
| Pact | JS, Java, Python, Go, .NET, Ruby | The standard for consumer-driven contract testing. Pact Broker centralizes contracts. We use this for most projects |
| Spring Cloud Contract | Java/Kotlin (Spring ecosystem) | Great if you're all-in on Spring. Usually adopted provider-first, though it also supports consumer-driven workflows |
| Specmatic | Any (uses OpenAPI specs) | Uses your existing OpenAPI spec as the contract. Lower friction to adopt if you already have API docs |

Performance Testing That Matters

Most performance testing we see is either "run a load test once before launch and never again" or "we have no performance testing." Both are problems.

Three Types of Performance Tests

  • Baseline benchmarks (run in CI): Simple tests that verify key endpoints respond within acceptable latency. If your login endpoint usually responds in 150ms and suddenly takes 800ms after a code change, CI should catch that. We use k6 for this because it's scriptable, fast, and integrates with CI pipelines
  • Load tests (run weekly/before releases): Simulate expected peak traffic. For an Indian e-commerce client, we simulate Diwali sale traffic (5x normal) and verify the system stays responsive. Look for: response time degradation, error rate increases, database connection pool exhaustion, memory leaks
  • Stress tests (run quarterly): Push past expected limits to find the breaking point. Where does the system fail first? Is it the database? The API gateway? A specific microservice? Knowing your limits lets you plan capacity

// k6 performance test — baseline benchmark in CI
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95th percentile under 500ms
    http_req_failed: ['rate<0.01'],     // Less than 1% error rate
  },
  stages: [
    { duration: '30s', target: 50 },   // Ramp to 50 users
    { duration: '1m', target: 50 },    // Hold at 50 users
    { duration: '10s', target: 0 },    // Ramp down
  ],
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
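
To show what a threshold like `p(95)<500` is checking, here is a small sketch that computes a 95th percentile from raw durations using the nearest-rank method. The function names are hypothetical, and k6 calculates its own percentiles internally, possibly with a different interpolation, so treat this as an explanation of the idea and not a reimplementation.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('No samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Mirrors the intent of a p95 latency threshold.
function checkBaseline(durationsMs, { p95LimitMs }) {
  const p95 = percentile(durationsMs, 95);
  return { p95, pass: p95 < p95LimitMs };
}
```

A percentile is used instead of an average because one slow outlier barely moves the mean, while a consistently slow tail shows up clearly in the p95.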

Building Your Testing Strategy

Here's how we approach testing for different project types. The strategy isn't universal — it depends on what you're building.

| Project Type | Unit Tests | Integration Tests | E2E Tests | Contract Tests |
| --- | --- | --- | --- | --- |
| Monolith API | Business logic, validators | Every endpoint, DB queries | Critical UI flows only | Not needed (one service) |
| Microservices | Domain logic per service | Each service's APIs + DB | 5-10 user journeys | All service boundaries |
| Frontend SPA | Utility functions, hooks | Component rendering, API mocks | Full user flows + visual regression | API expectations |
| Data pipeline | Transforms, parsers | Pipeline stages with sample data | End-to-end data flow | Schema validation |
| Mobile app | Business logic, state management | API client, local storage | Detox/Appium for critical flows | Backend API contracts |

Our Testing Checklist for New Projects

  1. CI runs tests on every PR. Non-negotiable. If tests don't run automatically, they won't run at all
  2. Tests complete in under 10 minutes. Longer than that and developers stop waiting. Parallelize, use Testcontainers, split into fast/slow suites
  3. No flaky tests. A flaky test that fails 5% of the time wastes more engineering time than having no test at all. Fix it or delete it
  4. Test data is self-contained. Every test creates its own data and cleans up after itself. No shared test databases, no seed files that drift
  5. Coverage targets are sensible. We target 80% line coverage on business logic, 0% on boilerplate. 100% coverage is a vanity metric

The metric we actually track: Not code coverage, but "time to confidence." How long after a code change do you feel confident it's safe to deploy? If the answer is "I run the tests and if they pass, I deploy", your testing strategy is working. If the answer involves manual testing, checking logs, or "crossing fingers", your tests aren't catching what matters.

What We've Learned From Production Failures

Every testing strategy improves after a production incident. Here are patterns from our post-mortems:

  • 90% of our production bugs were at integration boundaries. Service A sent a date string, Service B expected a timestamp. The unit tests for both services passed. An integration test would have caught it instantly
  • Visual regression testing prevented 3 client escalations in one quarter. CSS changes that looked fine in Chrome broke the layout in Safari. Playwright's screenshot comparison caught them in CI
  • Our best-performing test suite has 40% integration, 30% unit, 20% E2E, 10% contract. The pyramid inverters are right — integration tests catch the most bugs per test written
  • The hardest tests to write are always the most valuable. Testing payment webhooks, testing email delivery, testing file upload processing — they're complex to set up but protect against the scenarios with highest business impact

Frequently Asked Questions

What code coverage percentage should we target?

We target 80% coverage on business logic and critical paths, with no target on boilerplate code. A focused 80% is better than a padded 95% full of trivial tests. More importantly, track your mutation score from mutation testing: it measures whether your tests actually catch deliberately injected bugs, not just whether they execute lines of code.
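
A toy example shows why coverage and mutation score can disagree. The shipping rule and both suites below are hypothetical; real mutation testing tools generate and run mutants like this automatically.

```javascript
// Original business rule (hypothetical): orders of 100 or more ship free.
const qualifiesForFreeShipping = (total) => total >= 100;

// A typical mutant: the boundary operator flipped from >= to >.
const boundaryMutant = (total) => total > 100;

// Each "test" takes an implementation and returns true if it passes.
const weakSuite = [
  (fn) => fn(150) === true,
  (fn) => fn(50) === false,
];
// The strong suite adds the boundary case.
const strongSuite = [...weakSuite, (fn) => fn(100) === true];

const passes = (suite, fn) => suite.every((check) => check(fn));
// A mutant is "killed" when at least one test fails against it.
const kills = (suite, mutantFn) => !passes(suite, mutantFn);
```

Both suites execute every line of the rule, so line coverage is identical. Only the strong suite kills the mutant, and that difference is what a mutation score captures.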

How do we handle flaky tests?

Quarantine immediately — move flaky tests to a separate suite that doesn't block CI. Then fix or delete within a week. Common causes: timing dependencies (use explicit waits), shared state (isolate test data), external service calls (use test doubles). A flaky test erodes trust in the entire suite.

Should we write tests before or after the code?

TDD works well for business logic with clear requirements — write the test, see it fail, implement, see it pass. For exploratory work or UI, we write tests after the code stabilizes. The important thing isn't when you write the test, it's that you write the right test. A good test written after is better than a bad test written before.

How do we test legacy code with no existing tests?

Start with "characterization tests" — tests that document what the code currently does, not what it should do. Then add integration tests around the most critical and most-changed modules. Don't try to get to 80% coverage at once. Every bug fix gets a regression test. Over 6 months, coverage grows naturally around the code that matters most.
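
A characterization test can be sketched as record-then-replay. Everything below is hypothetical, including the deliberately quirky `legacyFormatName`; the point is that the recording captures what the code does today, quirks included.

```javascript
// Hypothetical legacy function with behaviour nobody remembers the reason for.
function legacyFormatName(first, last) {
  if (!last) return String(first).toUpperCase();
  return last + ', ' + first;
}

// Step 1: run the current code over representative inputs and record outputs.
function record(fn, inputs) {
  return inputs.map((args) => ({ args, output: fn(...args) }));
}

// Step 2: after a refactor, replay the inputs and list every mismatch.
function diffAgainstRecording(fn, recording) {
  return recording.filter(
    ({ args, output }) =>
      JSON.stringify(fn(...args)) !== JSON.stringify(output)
  );
}
```

In Jest, snapshot assertions such as `toMatchSnapshot()` can play the role of the recording. Any mismatch after a refactor is a prompt to decide whether the old behaviour was a bug or something callers depend on.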

Pillai Infotech Engineering Team

We've built testing strategies for startups shipping weekly and enterprises with regulated release cycles. Our philosophy: test what breaks, not what's easy to test.
