Our worst estimate was a "2-week" authentication system that took 11 weeks. Our best was a "6-month" platform rewrite that landed in 5.5 months. The difference wasn't talent — it was technique. The auth system was estimated by one person in 10 minutes. The platform rewrite was broken into 47 tasks, estimated by the team, and tracked against velocity. Better estimation isn't about better guessing; it's about better process.
What We'll Cover
Why Software Estimation Is Hard
Software estimation fails for predictable, well-studied reasons. Understanding them is half the battle.
| Bias | What Happens | Real Example | Mitigation |
|---|---|---|---|
| Optimism bias | We imagine the happy path. "If nothing goes wrong, it'll take 3 days" | Estimate: 3 days. Actual: 8 days (auth edge cases, testing, code review) | Three-point estimation forces you to think about worst case |
| Anchoring | First number mentioned becomes the anchor. Hard to deviate from it | PM says "this should take about a week, right?" Team says "yeah, a week" | Blind estimation (Planning Poker). Everyone reveals simultaneously |
| Planning fallacy | We plan for the individual task but forget overhead: meetings, context switching, code review, deployment | Coding: 2 days. Total: 4 days (with review, testing, deployment, fixes) | Track actual vs estimated consistently. Apply a multiplier based on history |
| Scope creep | "While we're at it, let's also add..." The scope grows but the estimate doesn't | "Add a search feature" becomes "add search with filters, autocomplete, and analytics" | Re-estimate when scope changes. Make the cost of additions visible |
| Unknown unknowns | You can't estimate what you don't know you don't know | "Integrate with their API" → API docs are wrong, auth flow is custom, rate limits undocumented | Add a "discovery spike" before estimating unfamiliar work |
Estimation Techniques Compared
| Technique | Effort to Use | Accuracy | Best For |
|---|---|---|---|
| T-shirt sizing (S/M/L/XL) | Very low | Low (good for rough prioritization) | Roadmap planning, backlog grooming. "Is this a week or a quarter?" |
| Story points (Planning Poker) | Medium | Medium (improves over time with velocity) | Sprint planning. Team estimates relative complexity |
| Three-point estimation | Medium | High (accounts for uncertainty) | Client proposals, project timelines |
| Reference class forecasting | Low (once you have data) | High (uses actual historical data) | Recurring work types (API endpoint, new page, migration) |
| Task decomposition + bottom-up | High | Highest | Large projects, fixed-price proposals. Worth the effort for high-stakes estimates |
Three-Point Estimation
For every task, provide three estimates instead of one:
- Optimistic (O): Everything goes right. No surprises. This almost never happens, but it's your floor
- Most Likely (M): The realistic case. Some issues crop up, some things are easier than expected
- Pessimistic (P): Murphy's Law applies. The API is broken, the requirement changes, you hit a framework bug
Formula: Expected = (O + 4M + P) / 6
Example: "Add payment integration"
- Optimistic: 3 days (Stripe docs are great, straightforward integration)
- Most likely: 5 days (some webhook edge cases, testing with real payment flows)
- Pessimistic: 12 days (payment provider API issues, PCI compliance complications, testing complications)
- Expected: (3 + 20 + 12) / 6 = 5.8 days
The power of this approach is that it forces you to think about what can go wrong — which is the part most single-point estimates skip entirely.
Reference Class Forecasting
Instead of estimating from scratch, look at how long similar work took in the past. This is the single most effective technique for teams that have been working together for a while.
How to Build Your Reference Database
After every sprint or project, record:
- Task type (new API endpoint, frontend page, database migration, etc.)
- Estimated time
- Actual time
- What caused the difference
After 3 months, you'll have patterns like:
- "New API endpoint with CRUD" → average 2.5 days (range: 1-5 days)
- "Third-party API integration" → average 6 days (range: 3-15 days)
- "Database migration with data transformation" → average 4 days (range: 2-10 days)
Our Estimation Multiplier
After tracking estimates vs actuals for a year across our team, we found a consistent pattern:
| Task Familiarity | Multiplier | Explanation |
|---|---|---|
| Done it before, same tech | 1.2x | Even familiar work takes 20% longer than you think (meetings, context switching) |
| Similar work, some unknowns | 1.5x | The unknowns add up. "It's like the other feature, but..." means 50% more |
| New domain or technology | 2.0x | Learning curve is real. Add time for research, experimentation, and rework |
| Never done anything like it | 3.0x | Don't estimate. Do a time-boxed spike (2-3 days) first, then estimate |
Tracking and Improving Over Time
The Estimation Accuracy Score
Track this metric per sprint: Accuracy = Estimated / Actual
- 1.0 = perfect estimate (rare)
- 0.5 = you estimated half the actual time (under-estimated by 2x)
- 2.0 = you estimated double the actual time (over-estimated by 2x)
Over time, your accuracy should trend toward 1.0. If your average is 0.6, you have a systematic under-estimation bias — apply a 1.7x multiplier until it corrects.
Sprint Velocity: The Reality Check
After 5-6 sprints, your team's velocity (story points completed per sprint) stabilizes. This becomes your most reliable planning tool. If velocity is 30 points/sprint and the backlog has 120 points, it's approximately 4 sprints of work. Velocity doesn't lie — it accounts for meetings, interruptions, bugs, and all the overhead that individual task estimates miss.
Frequently Asked Questions
Should we estimate in hours or story points?
Story points for sprint planning (relative complexity is more accurate than absolute time). Hours for client-facing proposals and project timelines (clients don't understand story points). Internally, track the correlation between points and actual hours — this gives you both.
How do we handle tasks that seem impossible to estimate?
If you can't estimate it, you don't understand it well enough. Schedule a 2-3 day time-boxed spike: research, prototype, document what you learned. After the spike, you'll be able to estimate the actual work. The spike itself is the estimatable task.
How much buffer should we add to estimates?
Don't add a flat buffer — that's guessing on top of guessing. Instead, use three-point estimation to account for uncertainty mathematically. If you must add a buffer for client proposals, 20-30% is common. But track your accuracy over time and use your actual multiplier instead of an arbitrary percentage.