Estimation: Why We're All Bad at It (And How to Be Less Bad)
Summary
Let’s be honest: if you’re a software engineer, you’ve probably been burned by estimation. Maybe you promised a feature in a week and it took a month. Maybe you padded your estimate, only to finish early and feel guilty. This guide is for every engineer who’s ever stared at a Jira ticket and thought, “How hard can it be?”—and then found out.
We’ll walk through real-world stories, anti-patterns, frameworks, and checklists that will help you (and your team) get better at estimation—without losing your sanity or your credibility.
Why Engineers Struggle with Estimation
Story:
Picture this: You’re in sprint planning. The PM asks, “How long will this take?” You want to give a perfect answer, but you only know half the requirements. You start researching, hoping to find certainty. Hours pass. The team is waiting. You finally blurt out a number, secretly hoping you’re not wildly off.
The Engineer's Dilemma: Engineers are trained to solve problems optimally, but estimation is about managing uncertainty and making decisions with incomplete information. The best estimators are not the most precise, but the most adaptable.
After years of working in agile development, I’ve learned that the core issue isn’t mathematical but psychological. Engineers crave certainty, but estimation is about making the best call with what you know now, and adjusting as you learn.
Common Engineering Estimation Anti-Patterns
Story:
I once worked with a team that spent two days researching a new API before estimating a single story. Another team always estimated for “perfect code,” then missed deadlines when reality hit. And almost every team I’ve seen forgets to add time for meetings, reviews, or that one teammate who always has questions.
Anti-Pattern | Why Engineers Do This | Better Approach |
---|---|---|
Over-researching before estimation | Desire for certainty and technical depth | Time-box research to 30 minutes, create spike stories for unknowns |
Estimating perfect code solutions | Pride in clean implementation | Estimate for 'good enough' MVP, schedule refactoring separately |
Ignoring context switching overhead | Focus on pure coding time | Add 25-40% buffer for meetings, reviews, and interruptions |
Assuming ideal conditions | Optimism bias | Account for support, bugs, and team dependencies |
Story Points vs Hours: The Engineer's Perspective
Story:
Early in my career, I insisted on estimating everything in hours. “I’ll just add up the time for each task,” I thought. But I was always wrong—sometimes by a little, sometimes by a lot. Then I joined a team that used story points. At first, it felt imprecise. But over time, I realized it was about relative complexity, not time. And suddenly, our sprint planning got a lot less stressful.
Story Points | Example Work |
---|---|
1 | Config change, one-line fix, CSS tweak |
2 | Simple function, basic validation, unit test addition |
3 | CRUD endpoint, form component, DB migration |
5 | Complex algorithm, integration work, performance optimization |
8 | New service, major refactoring, complex feature |
13 | Needs breakdown: epic-level work, high uncertainty |
Memorable Tip: Story points are like T-shirt sizes—focus on fit, not exact measurements. If you’re debating between two sizes, pick the larger and break it down later.
Converting Your Engineering Intuition
Story:
A teammate once said, “This API will take me 2 days.” I asked, “What’s it similar to?” He replied, “The OTP authentication from last sprint.” That’s when it clicked: estimation isn’t about guessing hours—it’s about comparing to what you’ve done before.
// Instead of: "This API endpoint will take me 2 days"
// Think: "This is similar complexity to the OTP authentication we did last sprint"
// Story Point Calibration:
1 point = Hotfix or config change
2 points = Single responsibility function
3 points = Standard feature (follows patterns)
5 points = Complex feature or new pattern
8 points = Cross-system integration or major refactor
13 points = Too big—break it down!
The Hidden Complexity Engineers Miss
Story:
We once estimated a “simple” feature—just add a field to a form. But it touched legacy code and required major library updates because the DevOps libraries were no longer supported, which blocked new deployments until the upgrade was done. The “quick win” became a two-week saga.
For Every Story, Ask Yourself:
- Does this touch legacy code or require library updates?
- Will this require database schema changes?
- Are there performance implications at scale?
- Does this need to work across multiple environments?
- Will this change affect existing APIs or integrations?
- Is there sufficient observability (logging, metrics, alerts)?
- Does this require coordination with other teams?
- Are there security considerations (auth, data privacy)?
- Will this work with our current CI/CD pipeline?
- Is there adequate error handling and rollback strategy?
Memorable Tip: If you answer "yes" to more than three checklist items, double your estimate or break the story down.
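To make that rule concrete, here is a minimal Python sketch of the “more than three yes answers” heuristic; the function name and output wording are mine, not part of any existing tool:

```python
# Illustrative helper: apply the "more than three yes answers" rule from the
# checklist above to a base story-point estimate.

def adjust_for_hidden_complexity(base_points: int, yes_answers: int) -> str:
    """Suggest what to do with an estimate given the number of 'yes' checklist answers."""
    if yes_answers > 3:
        # Rule of thumb from this guide: double the estimate or break the story down.
        return f"{yes_answers} risk flags: double to {base_points * 2} points or split the story"
    return f"{yes_answers} risk flags: keep the estimate at {base_points} points"

print(adjust_for_hidden_complexity(3, 4))  # suggests doubling to 6 points
print(adjust_for_hidden_complexity(3, 1))  # keeps 3 points
```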
Complexity Multipliers by Code Category
Story:
Not all code is created equal. Adding a feature to a greenfield project is nothing like changing a legacy system or integrating with a third-party API. I’ve seen teams treat them the same—and pay the price.
Code Type | Base Complexity | Common Multipliers | Final Estimate |
---|---|---|---|
Greenfield feature | 3 points | +1 for tests, +1 for docs | 5 points |
Legacy system modification | 3 points | +2 for reverse engineering, +2 for testing edge cases | 8 points |
Third-party integration | 5 points | +3 for API limitations, +2 for error handling | 13 points (break down!) |
Performance optimization | 5 points | +2 for profiling, +1 for benchmarking | 8 points |
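The multipliers are additive, and the final estimate is rounded up to the next value on the story-point scale (that rounding rule is my reading of the table, e.g. 3 + 2 + 2 = 7 becomes 8). A minimal Python sketch:

```python
# Sketch: add complexity multipliers to a base estimate, then round up to the
# standard story-point scale used in this guide (1, 2, 3, 5, 8, 13).
SCALE = [1, 2, 3, 5, 8, 13]

def final_estimate(base: int, multipliers: list[int]) -> str:
    raw = base + sum(multipliers)
    for point in SCALE:
        if raw <= point:
            note = " (break it down!)" if point >= 13 else ""
            return f"{point} points{note}"
    return "more than 13 points: break it down!"

print(final_estimate(3, [2, 2]))  # legacy system modification -> "8 points"
print(final_estimate(5, [3, 2]))  # third-party integration -> "13 points (break it down!)"
```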
Technical Debt and Legacy Code Estimation
Story:
I once inherited a codebase where every change felt like defusing a bomb. No docs, no tests, and the original authors were long gone. We started scoring technical debt separately—and our estimates finally matched reality.
Technical Debt Reality: 40% of engineering time is spent working with legacy code, yet most teams don't factor this into estimation. Always score technical debt separately and add a buffer.
Legacy Code Complexity Assessment
Metric | Score 1 | Score 5 |
---|---|---|
Code Quality | Well-tested, documented, follows patterns | No tests, unclear business logic, multiple code smells |
Documentation | Comprehensive docs and comments | No documentation, requires archaeology |
Test Coverage | >80% coverage with good tests | No tests, manual testing only |
Team Knowledge | Multiple team members understand code | Only one person (or nobody) knows how it works |
// Technical Debt Estimation Formula
Legacy Score = (Code Quality + Documentation + Test Coverage + Knowledge) / 4
Adjusted Story Points = Base Story Points × (1 + (Legacy Score / 2))
Examples:
- Simple feature in well-maintained code (Legacy Score 1.0): 3 × (1 + 0.5) = 4.5 → 5 points
- Simple feature in legacy system (Legacy Score 3.6): 3 × (1 + 1.8) = 8.4 → 8 points
- Complex feature in legacy system (Legacy Score 4.0): 8 × (1 + 2.0) = 24 → Break down!
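A minimal Python sketch of that calculation (the function name and the example scores are mine; rounding to the story-point scale follows the rest of this guide):

```python
# Sketch: adjust a base estimate for legacy-code risk. Each assessment is the
# 1 (healthy) to 5 (painful) score from the table above.

def legacy_adjusted_points(base_points: float, code_quality: int, documentation: int,
                           test_coverage: int, team_knowledge: int) -> float:
    legacy_score = (code_quality + documentation + test_coverage + team_knowledge) / 4
    return base_points * (1 + legacy_score / 2)

# Simple 3-point feature in an undocumented, lightly tested legacy module:
print(legacy_adjusted_points(3, 4, 4, 3, 3))  # 8.25 -> call it 8 points
```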
Technical Debt Story Types
Story:
A team I coached started tracking “debt stories” as first-class citizens. Suddenly, refactoring, adding tests, and performance work were visible—and estimable.
Debt Type | Estimation Approach | Typical Range |
---|---|---|
Refactoring for maintainability | Original implementation × 0.5-0.8 | 5-13 points |
Adding tests to untested code | 1 point per 100 lines + complexity | 3-8 points |
Performance improvement | Profiling (2-3) + Fix (3-8) + Verification (1-2) | 8-13 points |
Security vulnerability fix | Research (2-5) + Fix (2-8) + Testing (2-5) | 8-13+ points |
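For example, the “1 point per 100 lines” rule for adding tests can be sketched as follows (rounding up and the 8-point cap are my assumptions, based on the table’s typical range):

```python
# Sketch: estimate "adding tests to untested code" as 1 point per 100 lines
# plus a complexity adder, capped near the table's typical upper bound.
import math

def test_addition_estimate(lines_of_code: int, complexity_adder: int = 0) -> int:
    points = math.ceil(lines_of_code / 100) + complexity_adder
    return min(points, 8)

print(test_addition_estimate(450, complexity_adder=2))  # 5 + 2 = 7 points
```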
Practical Estimation Frameworks for Daily Use
The 3-Layer Estimation Approach
Story:
When we started breaking stories into “core implementation,” “integration,” and “production readiness,” our estimates got more accurate—and our releases got smoother.
Layer 1 (Core implementation): Pure feature development without edge cases
Layer 2 (Integration): APIs, database changes, external dependencies
Layer 3 (Production readiness): Testing, monitoring, documentation, deployment
Example: Story: "User can upload profile pictures"
- Layer 1: File upload UI + basic storage (3 points)
- Layer 2: Image resizing, CDN integration (3 points)
- Layer 3: Error handling, tests (2 points)
- Total: 8 points
Spike Story Criteria for Engineers
Story:
We once spent a week spinning our wheels on a new technology. If only we’d called it a spike, set a time box, and moved on! Now, we use a checklist to decide when to spike.
Spike Story Triggers:
- Technology you haven’t used before (new framework, library, service)
- Performance requirements without baseline metrics
- Integration with undocumented or poorly documented APIs
- Architecture decisions affecting multiple components
- Uncertain data migration complexity
- Security requirements without clear implementation path
Spike Story Checklist
- Technology is new or unfamiliar
- Performance requirements are unclear
- Integration with unknown APIs
- Major architectural decisions
- Uncertain data migration
- Security requirements are ambiguous
# Spike Story Template
Title: "Research [Technology/Approach] for [Feature]"
Acceptance Criteria:
- Document 2-3 implementation approaches with pros/cons
- Identify major technical risks and mitigation strategies
- Provide complexity estimate for actual implementation
- Time-box to 1-2 days maximum
Example: "Research WebRTC implementation for video calling feature"
- Evaluate solutions: PeerJS, Socket.io, native WebRTC
- Test with 2-4 concurrent connections
- Document browser compatibility and mobile considerations
- Estimate complexity for MVP implementation
DevOps and CI/CD Considerations
Story:
A “simple” feature once ballooned because we forgot to estimate the database migration and new monitoring dashboards. Now, we always ask: what’s needed to get this into production?
Deployment Complexity Matrix
Deployment Scenario | Additional Points | Key Considerations |
---|---|---|
Standard web app deployment | +0 to +1 | Existing CI/CD pipeline handles it |
Database migration required | +1 to +3 | Data volume, downtime requirements, rollback strategy |
New service deployment | +2 to +5 | Infrastructure provisioning, monitoring setup, service discovery |
Multi-service coordination | +3 to +8 | Feature flags, gradual rollout, service dependencies |
Breaking API changes | +3 to +8 | Backward compatibility, client coordination, versioning strategy |
Infrastructure as Code Estimation
Story:
We once underestimated a “simple” config change—until it had to be rolled out to five environments, each with its own quirks. Now, we factor in every environment and compliance requirement.
Task Type | Point Range | Description |
---|---|---|
Configuration Changes | 1-2 points | Environment variables, feature flags, basic config updates |
New Service Provisioning | 3-5 points | Terraform modules, network policies, security groups |
Pipeline Optimization | 2-5 points | Parallel jobs, caching strategies, test optimization |
Monitoring Integration | 3-8 points | Metric collection, alert thresholds, dashboard creation |
// Infrastructure Estimation Formula
Base Points × Environment Factor + Compliance Factor
Where:
- Environment Factor = 1 + (0.2 × number of environments)
- Compliance Factor = 2 if regulated data is involved (HIPAA, GDPR, PCI, etc.)
Example:
- New database cluster across two environments, needing PCI compliance
- Base (3) × (1 + 0.4) + 2 = 3 × 1.4 + 2 = 6.2 → 8 points
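A minimal Python version of that formula (the function name is mine; rounding up to the story-point scale follows the example above):

```python
# Sketch: infrastructure estimate = base points × environment factor + compliance factor.

def infra_estimate(base_points: float, environments: int, regulated: bool) -> float:
    environment_factor = 1 + 0.2 * environments
    compliance_factor = 2 if regulated else 0
    return base_points * environment_factor + compliance_factor

# New database cluster, two environments, PCI compliance required:
print(infra_estimate(3, environments=2, regulated=True))  # 6.2 -> round up to 8 points
```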
CI/CD Pipeline Impact Assessment
Story:
A team I worked with added a new test type to the pipeline—then watched build times double. Now, we always estimate the impact of pipeline changes.
Pipeline Change | Estimation Impact | Engineering Considerations |
---|---|---|
Add new test type | +1-2 points | Flakiness risk, runtime increase, parallelization |
Security scanning | +2-3 points | False positives, triage time, baseline exceptions |
Multi-stage deployments | +3-5 points | Rollback strategies, verification checks |
Artifact management | +2-4 points | Versioning, storage costs, cleanup policies |
Handling Uncertainty and Spikes
Story:
If you ever find yourself saying, “I just need to look up…” more than three times, you’re in spike territory. Don’t guess—research, document, and move forward.
Rule of Thumb: If you say "I just need to look up..." more than three times, convert it to a spike story. Research is work—track it!
Spike Story Template
Title: [Technology/Problem] Research Spike
Goal: Answer [specific technical question] to enable estimation
Deliverables:
- Proof of concept code snippet
- Architecture diagram options
- Risk assessment matrix
- Recommended implementation approach
Time Box: 4-16 hours (max 2 days)
Acceptance Criteria:
- Team can estimate implementation stories
- Major technical risks identified
Uncertainty Assessment Matrix
Story:
One team doubled their estimates for “unknown unknowns”—and finally started hitting their sprint goals. Another built prototypes for every new technology. Both improved their accuracy.
Uncertainty Level | Estimation Approach | Engineer Action |
---|---|---|
Known Unknowns | Add 20% buffer | Document assumptions |
Unknown Unknowns | Requires spike | Create research story |
Complex Dependencies | Double estimates | Map dependency graph |
New Technology | 3× initial estimate | Build prototype |
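One way to encode those adjustments, as a rough sketch (the 1.2 multiplier reflects the 20% buffer above; treating “unknown unknowns” as un-estimable mirrors the spike rule):

```python
# Sketch: translate the uncertainty matrix into estimate adjustments.
UNCERTAINTY_MULTIPLIERS = {
    "known_unknowns": 1.2,        # add a 20% buffer and document assumptions
    "complex_dependencies": 2.0,  # double the estimate and map the dependency graph
    "new_technology": 3.0,        # triple the initial estimate and build a prototype
}

def adjusted_estimate(base_points: float, uncertainty: str) -> float | None:
    if uncertainty == "unknown_unknowns":
        return None  # don't estimate yet; create a spike (research) story instead
    return base_points * UNCERTAINTY_MULTIPLIERS[uncertainty]

print(adjusted_estimate(5, "new_technology"))    # 15.0 -> probably break it down first
print(adjusted_estimate(5, "unknown_unknowns"))  # None -> spike first
```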
Team Velocity and Performance Metrics
Story:
I’ve seen managers weaponize velocity, comparing teams and individuals. It never ends well. Velocity is a planning tool, not a performance metric.
Engineer Warning: Velocity is a planning tool, not a performance metric. Never compare velocities between teams or use it for individual evaluations.
Calculating Engineering Velocity
// Rolling Velocity Formula
velocity = (Σ last 3 sprints points) / 3
// Context Factors
adjusted_velocity = velocity × context_factor
// Common Context Factors:
- 0.8: Key team member on vacation
- 0.7: Major production incident
- 1.2: Working on familiar codebase
- 0.6: New technology stack
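Spelled out as a small Python sketch (the sprint numbers are made up; the context factors are the examples above):

```python
# Sketch: rolling velocity over the last three sprints, adjusted for team context.

def rolling_velocity(completed_points: list[int]) -> float:
    last_three = completed_points[-3:]
    return sum(last_three) / len(last_three)

def adjusted_velocity(velocity: float, context_factor: float) -> float:
    return velocity * context_factor

history = [21, 18, 24]               # points completed in the last three sprints
base = rolling_velocity(history)     # 21.0
print(adjusted_velocity(base, 0.8))  # 16.8 -> plan ~17 points with a key member on vacation
```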
Engineering Metrics That Matter
Story:
A team I led started tracking cycle time and PR size. Suddenly, our reviews sped up, bugs dropped, and our estimates got sharper.
Core Metrics
- Cycle Time < 4 days (commit → production)
- PR Size < 400 lines (smaller changes = faster reviews)
- Defect Rate < 5% (bugs in first 24h post-deploy)
Tooling and Automation for Better Estimates
Story:
We built a CLI tool that searched past tickets, calculated complexity, and suggested story point ranges. It didn’t make us perfect—but it made us better, faster.
Tool Type | Examples | Engineering Benefit |
---|---|---|
Complexity Analysis | SonarQube, CodeClimate | Identify high-risk code areas pre-estimation |
Historical Data | Jira, Linear, Shortcut | Compare similar past stories |
CI Analytics | Buildkite, GitHub Actions | Track pipeline bottlenecks |
Architecture Diagrams | Mermaid, Diagrams.net | Visualize system interactions |
Pro Tip: Build an "Estimation Helper" CLI tool that:
- Searches past similar tickets
- Calculates complexity scores
- Generates a checklist from code analysis
- Suggests story point ranges
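A skeleton of such a tool might look like the sketch below. It is purely illustrative: the ticket search is stubbed out, and the complexity-to-points mapping is a made-up heuristic you would replace with your own calibration.

```python
#!/usr/bin/env python3
# Illustrative skeleton of an "Estimation Helper" CLI. The ticket search is a
# stub; a real version would query your issue tracker's API (Jira, Linear, etc.).
import argparse

SCALE = [1, 2, 3, 5, 8, 13]

def suggest_points(complexity_score: float) -> int:
    """Map a 1-5 complexity score onto the story-point scale (rough heuristic)."""
    raw = complexity_score * 2.5
    return next((p for p in SCALE if raw <= p), 13)

def search_similar_tickets(keywords: str) -> list[str]:
    """Stub: return past tickets whose summaries match the keywords."""
    return []  # replace with a call to your tracker's search endpoint

def main() -> None:
    parser = argparse.ArgumentParser(description="Suggest a story-point range for a story")
    parser.add_argument("summary", help="one-line story summary")
    parser.add_argument("--complexity", type=float, default=3.0,
                        help="team-assessed complexity, 1 (trivial) to 5 (gnarly)")
    args = parser.parse_args()

    similar = search_similar_tickets(args.summary)
    points = suggest_points(args.complexity)
    print(f"Similar past tickets: {', '.join(similar) or 'none found'}")
    print(f"Suggested starting estimate: {points} points (confirm in planning poker)")

if __name__ == "__main__":
    main()
```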
Team Dynamics and Continuous Improvement
Story:
The best estimation meetings I’ve seen are safe spaces. People admit what they don’t know, challenge each other’s assumptions, and learn from every “bad” estimate.
Psychological Safety: The best teams create a safe space for honest estimation. Encourage everyone to speak up if they’re unsure, and never punish “bad” estimates—use them as learning opportunities.
- Use planning poker or silent estimation to avoid anchoring bias.
- When estimates differ, discuss assumptions and risks, not just numbers.
- Regularly review estimation accuracy in retrospectives and adjust your process.
Conclusion: Dos and Don’ts of Software Estimation
Story:
If you only remember one thing: estimation is a team sport. The best teams break work into small chunks, surface hidden complexity, and treat every “miss” as a chance to learn.
Estimation Dos
- Break work into the smallest testable chunks
- Use checklists to surface hidden complexity
- Time-box research and create spike stories for unknowns
- Add buffers for meetings, reviews, and context switching
- Review estimation accuracy regularly and adjust
- Encourage open discussion and psychological safety
- Document patterns and share knowledge across the team
Estimation Don’ts
- Don’t estimate in hours—use relative complexity (story points)
- Don’t ignore technical debt or legacy code
- Don’t assume ideal conditions—plan for interruptions
- Don’t compare velocities between teams or individuals
- Don’t punish “bad” estimates—use them to improve
- Don’t skip retrospectives on estimation accuracy
Final Insight: Estimation is a journey, not a destination. The best teams treat it as an engineering discipline—measured, reviewed, and improved over time.