We've built test suites that caught zero real bugs and test suites that prevented six-figure outages. The difference wasn't the number of tests; it was what we tested and how. After shipping software for clients across fintech, healthcare, and SaaS, here's our honest take on testing strategies that deliver real confidence.
Rethinking the Testing Pyramid
The classic testing pyramid — lots of unit tests at the base, fewer integration tests in the middle, minimal E2E tests at the top — made sense when unit tests were cheap and E2E tests required manual Selenium setups that broke every week. That world doesn't exist anymore.
Modern tools have shifted the economics:
| Test Type | 2015 Reality | 2025 Reality |
|---|---|---|
| Unit tests | Fast, cheap, reliable | Still fast, but many test implementation details that change constantly |
| Integration tests | Slow, needed real databases, flaky | Testcontainers makes them fast and reliable. Docker spins up real Postgres in 2 seconds |
| E2E tests | Selenium: slow, brittle, expensive to maintain | Playwright/Cypress: fast, stable, parallel execution, visual regression for free |
We don't follow the pyramid. We follow what Kent C. Dodds calls the "testing trophy" — heavily weighted towards integration tests, with strategic unit tests and minimal but targeted E2E tests. This matches our experience: integration tests catch the most real bugs per hour of engineering time invested.
Unit Tests: Where They Actually Help
Unit tests get a lot of reverence. They also get a lot of waste. We've seen codebases with 2,000 unit tests that mostly test that getFullName() concatenates first and last name. Those tests pass when the system is broken and break when the system is fine (because someone renamed a method).
Test Business Logic, Not Implementation
// BAD: Tests implementation details
test('should call UserRepository.save', async () => {
  const mockRepo = { save: jest.fn() };
  const service = new UserService(mockRepo);
  await service.createUser({ name: 'Alice' });
  expect(mockRepo.save).toHaveBeenCalledWith({ name: 'Alice' });
});
// This test breaks if you refactor the internal save mechanism
// but doesn't catch if validation is wrong

// GOOD: Tests behaviour
test('rejects user with duplicate email', async () => {
  await createUser({ email: 'alice@test.com' });
  const result = await createUser({ email: 'alice@test.com' });
  expect(result.error).toBe('EMAIL_ALREADY_EXISTS');
});
// Tests what the system DOES, not HOW it does it
When Unit Tests Shine
- Pure functions with complex logic — pricing calculations, date parsing, data transformations, validation rules. These have clear inputs and outputs with no side effects
- Edge cases — boundary conditions (empty arrays, null values, max integers) that are hard to hit through integration tests
- Algorithms — sorting, searching, rate limiting, retry logic. Test the algorithm in isolation, test the integration separately
- State machines — order lifecycle (pending → paid → shipped → delivered), user status transitions. Each valid and invalid transition is a test case
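As a minimal sketch of the state machine case, the order lifecycle above can be written as a transition table plus a helper that lists every pair of states. The names transition and allTransitionCases are illustrative, not from any library.

```javascript
// Allowed transitions for the order lifecycle described above.
// The Map and the function names are illustrative assumptions.
const TRANSITIONS = new Map([
  ['pending', ['paid']],
  ['paid', ['shipped']],
  ['shipped', ['delivered']],
  ['delivered', []],
]);

// Returns the next state, or throws if the move is not allowed.
function transition(current, next) {
  const allowed = TRANSITIONS.get(current);
  if (!allowed) {
    throw new Error(`Unknown state: ${current}`);
  }
  if (!allowed.includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}

// Enumerates every (from, to) pair so that each valid and each invalid
// transition can become its own test case.
function allTransitionCases() {
  const states = [...TRANSITIONS.keys()];
  const cases = [];
  for (const from of states) {
    for (const to of states) {
      cases.push({ from, to, valid: TRANSITIONS.get(from).includes(to) });
    }
  }
  return cases;
}
```

With four states this yields 16 cases, 3 of them valid. In a Jest suite, the list could feed test.each so that every pair is reported as a separate test.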
When Unit Tests Are Waste
- Testing that a controller calls a service (test the HTTP endpoint instead)
- Testing that a service calls a repository (test with a real database instead)
- Testing getters/setters/constructors
- Testing third-party library behaviour (they have their own tests)
Integration Tests: The Highest ROI
Integration tests verify that components work together. They're where we invest most of our testing effort because they catch the bugs that actually reach production — misconfigured database queries, broken API contracts, incorrect middleware chains.
The Testcontainers Revolution
The reason integration tests used to be painful was shared test databases. One developer's test data interfered with another's. Testcontainers changed this — each test run spins up a fresh Docker container with a real database. Your tests run against real PostgreSQL, real Redis, real Elasticsearch.
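// Integration test with Testcontainers (Node.js)
import { PostgreSqlContainer } from '@testcontainers/postgresql';
import { Pool } from 'pg'; // Stands in for the original, undefined createPool helper
import request from 'supertest';
// Assumed project modules: adjust the paths to your codebase
import { app } from '../src/app';
import { runMigrations } from '../src/db/migrations';

let container;
let db;

beforeAll(async () => {
  container = await new PostgreSqlContainer('postgres:16-alpine')
    .withDatabase('test_db')
    .start();
  db = new Pool({
    connectionString: container.getConnectionUri()
  });
  await runMigrations(db); // Same migrations as production
  // Assumes the app is configured to connect to this same database
}, 60000); // Starting the container can exceed Jest's default timeout

afterAll(async () => {
  await db.end(); // Close connections before stopping the container
  await container.stop();
});

test('creates order with correct total', async () => {
  // Seed data
  const product = await db.query(
    'INSERT INTO products (name, price) VALUES ($1, $2) RETURNING id',
    ['Widget', 29.99]
  );
  // Test the actual API endpoint
  const response = await request(app)
    .post('/api/orders')
    .send({ items: [{ productId: product.rows[0].id, qty: 3 }] })
    .expect(201);
  expect(response.body.total).toBeCloseTo(89.97, 2); // Avoid exact float equality
  // Verify database state
  const order = await db.query('SELECT * FROM orders WHERE id = $1', [response.body.id]);
  expect(order.rows[0].status).toBe('pending');
});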
What to Integration Test
| Test This | Why | Common Bug It Catches |
|---|---|---|
| API endpoints | Tests routing, validation, auth, serialization, DB queries in one shot | Missing auth check, wrong status code, incorrect SQL join |
| Database queries | ORM-generated SQL often doesn't match expectations | N+1 queries, incorrect WHERE clauses, missing indexes |
| Message consumers | Deserialization, idempotency, error handling | Message format changes, duplicate processing, poison messages |
| External API clients | Serialization, error handling, retry logic | Changed API response format, timeout handling, rate limit handling |
| Auth flows | Most security bugs are integration bugs | Token expiry not checked, role escalation, missing permission checks |
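To make the message consumer row concrete, here is a hedged, in-memory sketch of the three behaviours worth asserting: a valid message is applied once, a redelivered message is ignored, and a malformed (poison) message is set aside instead of crashing the consumer. The consumer, its message shape, and all names are hypothetical. A real integration test would deliver the messages through an actual broker.

```javascript
// Hypothetical "payment received" message: { id, accountId, amountCents }.
function isValidPayment(message) {
  return (
    message !== null &&
    typeof message === 'object' &&
    typeof message.id === 'string' &&
    typeof message.accountId === 'string' &&
    Number.isInteger(message.amountCents)
  );
}

// In-memory stand-in for a consumer with a deduplication store and a
// dead-letter queue. Illustrative only.
function createPaymentConsumer() {
  const processedIds = new Set();
  const balances = new Map();
  const deadLetters = [];

  function handle(rawMessage) {
    let message;
    try {
      message = JSON.parse(rawMessage);
    } catch (err) {
      message = null; // Not JSON at all: treat as a poison message
    }
    if (!isValidPayment(message)) {
      deadLetters.push(rawMessage);
      return 'dead-lettered';
    }
    if (processedIds.has(message.id)) {
      return 'duplicate'; // Redelivery: do not apply the payment twice
    }
    const current = balances.get(message.accountId) ?? 0;
    balances.set(message.accountId, current + message.amountCents);
    processedIds.add(message.id);
    return 'processed';
  }

  return { handle, balances, deadLetters };
}
```

Delivering the same valid message twice should leave the balance credited once, and a message in an outdated format should land in deadLetters rather than throw.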
E2E Tests: Less Is More
E2E tests simulate real user flows through the full application. They're expensive to write, slow to run, and the first to break. The trick is writing just enough to cover your critical paths — not trying to test everything through the browser.
The Critical Path Strategy
We identify the 5-10 user journeys that matter most (the ones where failure means revenue loss or user trust damage) and write E2E tests only for those.
For a typical SaaS application, our E2E suite covers:
- Sign up → onboarding → first action (the conversion funnel)
- Login → core workflow → expected outcome (the happy path)
- Payment flow → confirmation → receipt (the money path)
- Error state → recovery → success (the resilience path)
That's it. 15-20 E2E tests total. Everything else is covered by integration tests.
Playwright: Our Current Choice
// Playwright E2E test: critical payment flow
import { test, expect } from '@playwright/test';

test('complete purchase flow', async ({ page }) => {
  // Login
  await page.goto('/login');
  await page.fill('[data-testid="email"]', 'test@example.com');
  await page.fill('[data-testid="password"]', 'testpass123');
  await page.click('[data-testid="login-button"]');

  // Add to cart
  await page.goto('/products/widget-pro');
  await page.click('[data-testid="add-to-cart"]');
  await expect(page.locator('[data-testid="cart-count"]')).toHaveText('1');

  // Checkout
  await page.click('[data-testid="checkout"]');
  await page.fill('[data-testid="card-number"]', '4242424242424242');
  await page.fill('[data-testid="expiry"]', '12/28');
  await page.fill('[data-testid="cvc"]', '123');
  await page.click('[data-testid="pay-button"]');

  // Verify
  await expect(page.locator('[data-testid="order-confirmation"]'))
    .toBeVisible({ timeout: 10000 });
  await expect(page.locator('[data-testid="order-total"]'))
    .toContainText('₹2,999');
});
E2E Anti-Patterns We've Learned the Hard Way
- Don't test through the UI what you can test through the API. If your integration tests cover the business logic, the E2E test only needs to verify the UI renders correctly and user interactions trigger the right actions
- Don't share state between tests. Each E2E test should set up its own data. Shared state causes mysterious failures when tests run in a different order
- Don't sleep. await page.waitForTimeout(3000) is a reliability time bomb. Wait for specific elements or network requests instead
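The idea behind "don't sleep" can be shown without any browser API: poll for the condition you care about and stop as soon as it holds, rather than pausing for a fixed time and hoping. The helper below is a library-free sketch, and its name and options are assumptions. Playwright's own assertions, such as the toBeVisible call in the test above, retry in a comparable way, so in a real suite you would normally rely on those instead of writing this yourself.

```javascript
// Polls `condition` until it returns a truthy value or the timeout passes.
// Illustrative helper, not part of Playwright.
async function waitFor(condition, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) {
      return; // Done as soon as the condition holds, no wasted waiting
    }
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A fixed three-second sleep is both too long when the page is fast and too short when CI is slow. A condition-based wait finishes early in the first case and fails with a clear message in the second.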
Contract Testing for Microservices
When you have 10 services communicating via APIs, how do you know a change in Service A won't break Service B? Integration tests help, but running all services together in CI is slow and fragile. Contract testing is the answer.
How It Works
The consumer (Service B) writes a "contract" — a specification of what it expects from Service A's API. Service A's CI verifies that it still satisfies all consumer contracts. If someone changes Service A's response format, the contract test fails before it breaks Service B in production.
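A stripped-down sketch of that flow, under the assumption that a contract is just a map of field names to expected types. This is not Pact's format or API, and real tools also cover requests, status codes, and nested structures. The endpoint and field names are hypothetical.

```javascript
// Consumer side (Service B): the fields it reads from Service A's
// hypothetical GET /orders/:id response, with the type it expects.
const orderContract = {
  id: 'string',
  total: 'number',
  createdAt: 'number', // Unix timestamp in milliseconds
};

// Provider side (Service A's CI): check a sample response against every
// field the consumer depends on. Extra fields are allowed.
function verifyContract(contract, response) {
  const violations = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in response)) {
      violations.push(`${field}: missing`);
    } else if (typeof response[field] !== expectedType) {
      violations.push(
        `${field}: expected ${expectedType}, got ${typeof response[field]}`
      );
    }
  }
  return violations;
}
```

If Service A starts returning createdAt as the string '2025-01-15T10:00:00Z', verifyContract reports 'createdAt: expected number, got string' in Service A's own pipeline, before Service B ever sees the change.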
| Tool | Language Support | Our Take |
|---|---|---|
| Pact | JS, Java, Python, Go, .NET, Ruby | The standard for consumer-driven contract testing. Pact Broker centralizes contracts. We use this for most projects |
| Spring Cloud Contract | Java/Kotlin (Spring ecosystem) | Great if you're all-in on Spring. Provider-driven by default, although consumer-driven workflows are also supported |
| Specmatic | Any (uses OpenAPI specs) | Uses your existing OpenAPI spec as the contract. Lower friction to adopt if you already have API docs |
Performance Testing That Matters
Most performance testing we see is either "run a load test once before launch and never again" or "we have no performance testing." Both are problems.
Three Types of Performance Tests
- Baseline benchmarks (run in CI) — Simple tests that verify key endpoints respond within acceptable latency. If your login endpoint usually responds in 150ms and suddenly takes 800ms after a code change, the CI should catch that. We use k6 for this — it's scriptable, fast, and integrates with CI pipelines
- Load tests (run weekly/before releases) — Simulate expected peak traffic. For an Indian e-commerce client, we simulate Diwali sale traffic (5x normal) and verify the system stays responsive. Look for: response time degradation, error rate increases, database connection pool exhaustion, memory leaks
- Stress tests (run quarterly) — Push past expected limits to find the breaking point. Where does the system fail first? Is it the database? The API gateway? A specific microservice? Knowing your limits lets you plan capacity
// k6 performance test: baseline benchmark in CI
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95th percentile under 500ms
    http_req_failed: ['rate<0.01'], // Less than 1% error rate
  },
  stages: [
    { duration: '30s', target: 50 }, // Ramp to 50 users
    { duration: '1m', target: 50 }, // Hold at 50 users
    { duration: '10s', target: 0 }, // Ramp down
  ],
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
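For readers unfamiliar with percentile thresholds, this plain JavaScript sketch shows what a rule like p(95)<500 is checking. It uses the nearest-rank method as an assumption. k6 computes its own percentiles and may use a different method, so treat the numbers as illustrative.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error('No samples');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Mirrors the shape of a "p(95)<500" rule for a list of durations in ms.
function passesThreshold(samples, p, limitMs) {
  return percentile(samples, p) < limitMs;
}
```

With twenty requests, nineteen at 150ms and one at 800ms, the 95th percentile is still 150ms and the threshold passes. Once two of the twenty take 800ms, the 95th percentile becomes 800ms and the check fails. A percentile threshold tolerates a rare outlier but catches a real shift in latency.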
Building Your Testing Strategy
Here's how we approach testing for different project types. The strategy isn't universal — it depends on what you're building.
| Project Type | Unit Tests | Integration Tests | E2E Tests | Contract Tests |
|---|---|---|---|---|
| Monolith API | Business logic, validators | Every endpoint, DB queries | Critical UI flows only | Not needed (one service) |
| Microservices | Domain logic per service | Each service's APIs + DB | 5-10 user journeys | All service boundaries |
| Frontend SPA | Utility functions, hooks | Component rendering, API mocks | Full user flows + visual regression | API expectations |
| Data pipeline | Transforms, parsers | Pipeline stages with sample data | End-to-end data flow | Schema validation |
| Mobile app | Business logic, state management | API client, local storage | Detox/Appium for critical flows | Backend API contracts |
Our Testing Checklist for New Projects
- CI runs tests on every PR. Non-negotiable. If tests don't run automatically, they won't run at all
- Tests complete in under 10 minutes. Longer than that and developers stop waiting. Parallelize, use Testcontainers, split into fast/slow suites
- No flaky tests. A flaky test that fails 5% of the time wastes more engineering time than having no test at all. Fix it or delete it
- Test data is self-contained. Every test creates its own data and cleans up after itself. No shared test databases, no seed files that drift
- Coverage targets are sensible. We target 80% line coverage on business logic and set no target for boilerplate. 100% coverage is a vanity metric
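One common way to keep test data self-contained is a small factory that gives every test its own uniquely named records. The sketch below is an assumption about how that might look: buildUser, its fields, and the example.test domain are illustrative, and persisting or cleaning up the record is left to the test.

```javascript
// Counter scoped to this module, so each call in a process gets a new value.
let sequence = 0;

// Builds a user object with a unique email. Including the process id keeps
// parallel test workers from generating the same address.
function buildUser(overrides = {}) {
  sequence += 1;
  const suffix = `${process.pid}-${sequence}`;
  return {
    email: `user-${suffix}@example.test`,
    name: `Test User ${sequence}`,
    role: 'member',
    ...overrides, // Tests state only the fields they care about
  };
}
```

A test that needs an admin calls buildUser({ role: 'admin' }) and gets a fresh email every time, so it never depends on a seed file or on data left behind by another test.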
What We've Learned From Production Failures
Every testing strategy improves after a production incident. Here are patterns from our post-mortems:
- 90% of our production bugs were at integration boundaries. Service A sent a date string, Service B expected a timestamp. The unit tests for both services passed. An integration test would have caught it instantly
- Visual regression testing prevented 3 client escalations in one quarter. CSS changes that looked fine in Chrome broke the layout in Safari. Playwright's screenshot comparison caught them in CI
- Our best-performing test suite has 40% integration, 30% unit, 20% E2E, 10% contract. The pyramid inverters are right — integration tests catch the most bugs per test written
- The hardest tests to write are always the most valuable. Testing payment webhooks, testing email delivery, testing file upload processing — they're complex to set up but protect against the scenarios with highest business impact
Frequently Asked Questions
What code coverage percentage should we target?
We target 80% coverage on business logic and critical paths, with no target on boilerplate code. A focused 80% is better than a padded 95% full of trivial tests. More importantly, track your mutation testing score. Mutation testing makes small deliberate changes to your code and checks whether at least one test fails, so the score measures whether your tests actually catch bugs, not just whether they execute lines of code.
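A toy illustration of why mutation score says more than line coverage. The mutants here are written by hand and the suites are modelled as plain functions, which is an assumption made to keep the sketch self-contained. Tools such as Stryker generate mutants and run your real test suite automatically.

```javascript
// Function under test.
const isAdult = (age) => age >= 18;

// Hand-written mutants: each is the original with one small change.
const mutants = [
  (age) => age > 18, // Boundary changed
  (age) => age <= 18, // Comparison flipped
  (age) => true, // Logic replaced with a constant
];

// A suite is modelled as a function that returns true when every assertion
// passes for the given implementation.
const weakSuite = (fn) => fn(30) === true;
const strongSuite = (fn) =>
  fn(30) === true && fn(18) === true && fn(17) === false;

// Mutation score = mutants killed (suite fails on them) / total mutants.
function mutationScore(suite, candidates) {
  const killed = candidates.filter((mutant) => !suite(mutant)).length;
  return killed / candidates.length;
}
```

Both suites pass against the original and both execute its only line, so line coverage is 100% either way. The weak suite kills just one of the three mutants, while the suite that probes the boundary at 17 and 18 kills all three.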
How do we handle flaky tests?
Quarantine immediately — move flaky tests to a separate suite that doesn't block CI. Then fix or delete within a week. Common causes: timing dependencies (use explicit waits), shared state (isolate test data), external service calls (use test doubles). A flaky test erodes trust in the entire suite.
Should we write tests before or after the code?
TDD works well for business logic with clear requirements — write the test, see it fail, implement, see it pass. For exploratory work or UI, we write tests after the code stabilizes. The important thing isn't when you write the test, it's that you write the right test. A good test written after is better than a bad test written before.
How do we test legacy code with no existing tests?
Start with "characterization tests" — tests that document what the code currently does, not what it should do. Then add integration tests around the most critical and most-changed modules. Don't try to get to 80% coverage at once. Every bug fix gets a regression test. Over 6 months, coverage grows naturally around the code that matters most.
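A characterization test can be as simple as recording what the code returns today and failing when that changes. In the sketch below, legacyShippingFee is an invented stand-in for real legacy code, and the comparison assumes primitive return values. With Jest, toMatchSnapshot offers a similar record-and-compare workflow.

```javascript
// Invented stand-in for legacy code whose exact rules nobody remembers.
function legacyShippingFee(weightKg, express) {
  let fee = weightKg <= 1 ? 50 : 50 + Math.ceil(weightKg - 1) * 20;
  if (express) {
    fee = fee * 2 - 10;
  }
  return fee;
}

// Step 1: run the code over representative inputs and record what it
// returns today, whether or not that behaviour is "right".
function characterize(fn, inputs) {
  return inputs.map((args) => ({ args, output: fn(...args) }));
}

// Step 2: after a change, list every recorded case whose output differs.
// Uses strict equality, so it only suits primitive outputs.
function findBehaviourChanges(fn, snapshot) {
  return snapshot.filter(({ args, output }) => fn(...args) !== output);
}
```

The recorded snapshot becomes the safety net for refactoring: an empty result from findBehaviourChanges means the rewrite preserved current behaviour for those inputs, and any entry it returns points at the exact arguments where behaviour drifted.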