Google found that code review is the single most effective practice for finding bugs — more effective than testing, static analysis, or pair programming. But that stat only holds when reviews are done well. A rubber-stamp "LGTM" in 30 seconds? That's not a review. A 47-comment thread arguing about brace placement? That's not useful either.
What We'll Cover
Why Code Reviews Fail
We've worked with dozens of engineering teams. The same failure patterns show up everywhere.
| Failure Pattern | What It Looks Like | Root Cause | Fix |
|---|---|---|---|
| Rubber stamping | "LGTM" within 2 minutes on a 500-line PR | Reviewer is overloaded or doesn't feel accountable | Limit PR size, assign explicit reviewer responsibility, track review thoroughness |
| Nitpick wars | 47 comments about naming, formatting, import order | No linter/formatter automation, unclear style guide | Automate style enforcement. Save human attention for logic and design |
| Gatekeeper syndrome | One senior dev blocks every PR with "I would have done it differently" | Ego, lack of trust, no clear review criteria | Define "blocking" vs "suggestion" comments. Reviewers don't own the code — authors do |
| Review bottleneck | PRs sit for 2-3 days waiting for review | No review SLA, reviewers treat reviews as low priority | Set a 4-hour SLA for first response. Rotate reviewer duty |
| Architecture arguments | Fundamental design disagreements surface in PR comments | No upfront design discussion before coding started | Design reviews before implementation. PRs should be about implementation, not direction |
PR Size: The Single Most Important Factor
Google's research on code review effectiveness found that review quality drops dramatically as PR size increases. The numbers are stark:
| PR Size (Lines Changed) | Review Time | Bug Detection Rate | Our Verdict |
|---|---|---|---|
| 1-100 lines | 15-30 minutes | High — reviewer can hold full context | Ideal size. Most PRs should be here |
| 100-300 lines | 30-60 minutes | Good — reviewer needs to focus but can follow | Acceptable for feature PRs |
| 300-500 lines | 1-2 hours | Declining — reviewer starts skimming | Consider splitting. If it can't be split, call it out in the PR description |
| 500+ lines | "I'll look at it later" | Low — rubber stamp territory | Must split. No exceptions |
Our rule: if a PR is over 300 lines (excluding generated code, migrations, and lock files), we split it. The author, not the reviewer, is responsible for making PRs reviewable.
How to Split Large Changes
- Infrastructure → Logic → UI: Database migration in one PR, business logic in the next, frontend changes in the third
- Refactor → Feature: Refactor the existing code to make the new feature easier, merge that, then add the feature in a clean PR
- Feature flag: Ship the feature behind a flag in one PR. Enable the flag in a separate, trivial PR. This also gives you a kill switch
What to Actually Review (A Checklist)
Not everything deserves equal attention. Here's what we focus on, in priority order:
Priority 1: Correctness and Security
- Does the code do what the ticket/spec says? (Not what you think it should do — what was requested)
- Are there security vulnerabilities? SQL injection, XSS, auth bypass, exposed secrets, insecure deserialization
- Are error cases handled? What happens when the API returns 500? When the database times out? When the input is malformed?
- Are race conditions possible? Especially in concurrent code, shared state, and database operations without proper locking
Priority 2: Design and Maintainability
- Is the code in the right place? Does it belong in this service/module/file?
- Is it testable? If you can't write a test for it without mocking 15 things, the design might be wrong
- Will the next developer understand it? (The next developer might be you in 6 months)
- Does it duplicate existing functionality? Check if a utility or pattern already exists in the codebase
Priority 3: Performance (When It Matters)
- N+1 database queries (most common performance bug in code reviews)
- Missing pagination on list endpoints
- Unbounded loops or recursive calls
- Large objects in memory when streaming would work
Don't Review (Automate Instead)
- Formatting, whitespace, import ordering → Prettier, ESLint, Black
- Type errors → TypeScript, mypy
- Unused variables, dead code → Linters
- Dependency vulnerabilities → Dependabot, Snyk
Giving Feedback That Works
How you say something matters as much as what you say. We've seen team morale destroyed by harsh review comments and team quality destroyed by reviews that are too polite to point out real problems.
The Comment Taxonomy
We prefix every review comment with a label. It eliminates ambiguity about whether the comment is blocking or optional.
| Prefix | Meaning | Example |
|---|---|---|
| 🔴 blocker: | Must fix before merge. Bug, security issue, or correctness problem | "blocker: This SQL query is vulnerable to injection. Use parameterized queries." |
| 🟡 suggestion: | Recommended improvement. Author decides whether to adopt | "suggestion: Consider extracting this into a helper function — it's used in 3 places." |
| ❓ question: | Genuine question — reviewer doesn't understand something | "question: Why do we retry 5 times here? Is there a specific failure mode that needs this?" |
| 💡 nit: | Minor style/preference. Not worth blocking | "nit: I'd name this userCount instead of cnt, but not a hill I'll die on." |
| 👍 praise: | Something done well. Reinforces good patterns | "praise: Great error handling here. The retry with exponential backoff is exactly right." |
Feedback Anti-Patterns
- "Why didn't you...?" → Instead: "Have you considered X? It might handle the edge case where..."
- "This is wrong." → Instead: "This will return null when the user hasn't set their email. We need to handle that case."
- "I would have done it differently." → Instead: only say this if your way is measurably better. Different isn't better
- Commenting on 20 things at once. → If a PR needs 20 comments, it's probably too large. Say that instead
Automate the Boring Parts
Every minute a reviewer spends on formatting or lint violations is a minute not spent on logic and design. Automate everything that can be automated.
| What to Automate | Tool | When It Runs |
|---|---|---|
| Code formatting | Prettier (JS/TS), Black (Python), gofmt (Go) | Pre-commit hook + CI check |
| Linting | ESLint, pylint, golangci-lint | Pre-commit hook + CI |
| Type checking | TypeScript, mypy, Flow | CI (blocks merge if fails) |
| Security scanning | Snyk, Semgrep, GitHub Advanced Security | CI on every PR |
| Test coverage | Codecov, Coveralls | CI — reports coverage change on PR |
| PR size check | GitHub Actions (custom), Danger.js | CI — warns if PR exceeds 300 lines |
Review Process That Scales
For Small Teams (2-5 Developers)
Everyone reviews everything. Round-robin assignment. Review within 4 hours. Keep it informal — Slack messages and quick discussions are fine.
For Medium Teams (5-15 Developers)
Use CODEOWNERS to auto-assign reviewers based on file paths. Require 1 approval for standard PRs, 2 for critical paths (auth, payments, infrastructure). Set a review SLA: first response within 4 hours, full review within 1 business day.
For Large Teams (15+ Developers)
Designated review rotation (each person has 1-2 hours/day for reviews). Split ownership by domain — the payments team reviews payment code, the platform team reviews infrastructure code. Use PR templates to standardize descriptions and checklists. Track review metrics: time to first review, review cycle time, PR rejection rate.
Our PR Template
## What
[One sentence describing what this PR does]
## Why
[Link to ticket/issue. Context on why this change is needed]
## How
[Brief description of the approach. Call out anything unusual]
## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manually tested locally
## Checklist
- [ ] PR is under 300 lines (excluding generated code)
- [ ] No secrets or credentials committed
- [ ] Database migrations are backward-compatible
- [ ] API changes are backward-compatible (or versioned)
Frequently Asked Questions
How long should a code review take?
For a well-sized PR (under 300 lines), 15-30 minutes. If you're spending more than an hour, the PR is probably too large. Google's data shows that review effectiveness drops after 60 minutes — take a break and come back if the PR legitimately requires more time.
Should junior developers review senior developers' code?
Absolutely. Junior reviewers catch different things — unclear naming, missing documentation, confusing logic. They also learn the codebase faster. The review isn't about seniority, it's about a fresh pair of eyes. Seniors should welcome questions from juniors — it reveals where the code isn't clear enough.
How do we handle disagreements in code reviews?
If it's a style preference, the author wins (as long as it follows the team's style guide). If it's a design disagreement, escalate to a 15-minute call — text comments are terrible for nuanced discussions. If it's a correctness concern, the reviewer should explain the specific failure case. Never merge with unresolved blockers.
Should we use AI for code reviews?
AI reviewers (like GitHub Copilot code review) are good at catching patterns: unused imports, potential null dereferences, and suggesting simpler implementations. They're terrible at design feedback, domain logic validation, and understanding business context. Use them as a first pass to catch the obvious stuff, then human review for the things that matter.