Estimation: Why We're All Bad at It (And How to Be Less Bad)

13 min read
Tanzim Shahriar
@tshahriar

Summary

Let’s be honest: if you’re a software engineer, you’ve probably been burned by estimation. Maybe you promised a feature in a week and it took a month. Maybe you padded your estimate, only to finish early and feel guilty. This guide is for every engineer who’s ever stared at a Jira ticket and thought, “How hard can it be?”—and then found out.

We’ll walk through real-world stories, anti-patterns, frameworks, and checklists that will help you (and your team) get better at estimation—without losing your sanity or your credibility.


Why Engineers Struggle with Estimation

Story:
Picture this: You’re in sprint planning. The PM asks, “How long will this take?” You want to give a perfect answer, but you only know half the requirements. You start researching, hoping to find certainty. Hours pass. The team is waiting. You finally blurt out a number, secretly hoping you’re not wildly off.

The Engineer's Dilemma: Engineers are trained to solve problems optimally, but estimation is about managing uncertainty and making decisions with incomplete information. The best estimators are not the most precise, but the most adaptable.

After years of working in agile development, I’ve learned the core issue isn’t mathematical—it’s psychological. Engineers crave certainty, but estimation is about making the best call with what you know now, and adjusting as you learn.


Common Engineering Estimation Anti-Patterns

Story:
I once worked with a team that spent two days researching a new API before estimating a single story. Another team always estimated for “perfect code,” then missed deadlines when reality hit. And almost every team I’ve seen forgets to add time for meetings, reviews, or that one teammate who always has questions.

| Anti-Pattern | Why Engineers Do This | Better Approach |
| --- | --- | --- |
| Over-researching before estimation | Desire for certainty and technical depth | Time-box research to 30 minutes, create spike stories for unknowns |
| Estimating perfect code solutions | Pride in clean implementation | Estimate for 'good enough' MVP, schedule refactoring separately |
| Ignoring context switching overhead | Focus on pure coding time | Add 25-40% buffer for meetings, reviews, and interruptions |
| Assuming ideal conditions | Optimism bias | Account for support, bugs, and team dependencies |
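
To make the context-switching row concrete, here is a minimal sketch of applying that buffer. The 30% default is just an illustrative midpoint of the 25-40% range, not a measured value:

// Minimal sketch: applying a context-switching buffer to a raw estimate.
// The 25-40% range comes from the table above; the 30% default is an
// assumption you should calibrate against your own team's calendar.
function withOverheadBuffer(pureCodingHours: number, bufferRate = 0.3): number {
  if (bufferRate < 0.25 || bufferRate > 0.4) {
    console.warn("bufferRate is outside the suggested 25-40% range");
  }
  return pureCodingHours * (1 + bufferRate);
}

// "Two days of pure coding" (16h) becomes roughly 2.6 days of real time.
console.log(withOverheadBuffer(16)); // 20.8 hours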

Story Points vs Hours: The Engineer's Perspective

Story:
Early in my career, I insisted on estimating everything in hours. “I’ll just add up the time for each task,” I thought. But I was always wrong—sometimes by a little, sometimes by a lot. Then I joined a team that used story points. At first, it felt imprecise. But over time, I realized it was about relative complexity, not time. And suddenly, our sprint planning got a lot less stressful.

| Points | Typical Examples |
| --- | --- |
| 1 | Config change, one-line fix, CSS tweak |
| 2 | Simple function, basic validation, unit test addition |
| 3 | CRUD endpoint, form component, DB migration |
| 5 | Complex algorithm, integration work, performance optimization |
| 8 | New service, major refactoring, complex feature |
| 13 | Needs breakdown: epic-level work, high uncertainty |

Memorable Tip: Story points are like T-shirt sizes—focus on fit, not exact measurements. If you’re debating between two sizes, pick the larger and break it down later.


Converting Your Engineering Intuition

Story:
A teammate once said, “This API will take me 2 days.” I asked, “What’s it similar to?” He replied, “The OTP authentication from last sprint.” That’s when it clicked: estimation isn’t about guessing hours—it’s about comparing to what you’ve done before.

// Instead of: "This API endpoint will take me 2 days"
// Think: "This is similar complexity to the OTP authentication we did last sprint"

// Story Point Calibration:
1 point = Hotfix or config change
2 points = Single responsibility function
3 points = Standard feature (follows patterns)
5 points = Complex feature or new pattern
8 points = Cross-system integration or major refactor
13 points = Too big—break it down!

The Hidden Complexity Engineers Miss

Story:
We once estimated a “simple” feature—just add a field to a form. But it touched legacy code and required major library upgrades, because the DevOps libraries it depended on were no longer supported and no new deployments were possible until they were replaced. The “quick win” became a two-week saga.

For Every Story, Ask Yourself:

  • Does this touch legacy code or require library updates?
  • Will this require database schema changes?
  • Are there performance implications at scale?
  • Does this need to work across multiple environments?
  • Will this change affect existing APIs or integrations?
  • Is there sufficient observability (logging, metrics, alerts)?
  • Does this require coordination with other teams?
  • Are there security considerations (auth, data privacy)?
  • Will this work with our current CI/CD pipeline?
  • Is there adequate error handling and rollback strategy?

Memorable Tip: If you answer "yes" to more than three checklist items, double your estimate or break the story down.
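
Here is a sketch of that rule of thumb in code. The cutoff of 13 points for "break it down" is my assumption, borrowed from the calibration scale earlier:

// Sketch of the rule above: more than three "yes" answers doubles the
// estimate; the 13-point "break it down" ceiling is an assumption taken
// from the story point calibration scale.
function adjustForHiddenComplexity(
  basePoints: number,
  checklistAnswers: boolean[]
): number | "break it down" {
  const yesCount = checklistAnswers.filter(Boolean).length;
  if (yesCount <= 3) return basePoints;
  const doubled = basePoints * 2;
  return doubled > 13 ? "break it down" : doubled;
}

// A 5-point story with four "yes" answers becomes a 10-point story.
console.log(adjustForHiddenComplexity(5, [true, true, true, true, false]));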


Complexity Multipliers by Code Category

Story:
Not all code is created equal. Adding a feature to a greenfield project is nothing like changing a legacy system or integrating with a third-party API. I’ve seen teams treat them the same—and pay the price.

| Code Type | Base Complexity | Common Multipliers | Final Estimate |
| --- | --- | --- | --- |
| Greenfield feature | 3 points | +1 for tests, +1 for docs | 5 points |
| Legacy system modification | 3 points | +2 for reverse engineering, +2 for testing edge cases | 8 points |
| Third-party integration | 5 points | +3 for API limitations, +2 for error handling | 13 points (break down!) |
| Performance optimization | 5 points | +2 for profiling, +1 for benchmarking | 8 points |

Technical Debt and Legacy Code Estimation

Story:
I once inherited a codebase where every change felt like defusing a bomb. No docs, no tests, and the original authors were long gone. We started scoring technical debt separately—and our estimates finally matched reality.

Technical Debt Reality: 40% of engineering time is spent working with legacy code, yet most teams don't factor this into estimation. Always score technical debt separately and add a buffer.

Legacy Code Complexity Assessment

| Metric | Score 1 | Score 5 |
| --- | --- | --- |
| Code Quality | Well-tested, documented, follows patterns | No tests, unclear business logic, multiple code smells |
| Documentation | Comprehensive docs and comments | No documentation, requires archaeology |
| Test Coverage | >80% coverage with good tests | No tests, manual testing only |
| Team Knowledge | Multiple team members understand code | Only one person (or nobody) knows how it works |
// Technical Debt Estimation Formula
Legacy Score = Code Quality + Documentation + Test Coverage + Team Knowledge
(each metric scored 1-5, so the Legacy Score ranges from 4 to 20)

Adjusted Points = Base Story Points × (1 + (Legacy Score / 10))

Examples:
- Simple feature in well-maintained code (Legacy Score 5): 3 × (1 + 0.5) = 4.5 → 5 points
- Simple feature in legacy system (Legacy Score 18): 3 × (1 + 1.8) = 8.4 → 8 points
- Complex feature in legacy system (Legacy Score 20): 8 × (1 + 2.0) = 24 → break it down!
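
If you want this formula in code, here is a minimal sketch, assuming the four metrics are each scored 1-5 as in the assessment table above:

// Sketch of the technical-debt formula. Assumes each metric is scored
// 1-5, so the summed Legacy Score ranges from 4 (healthy) to 20 (scary).
interface LegacyAssessment {
  codeQuality: number;
  documentation: number;
  testCoverage: number;
  teamKnowledge: number;
}

function debtAdjustedPoints(basePoints: number, a: LegacyAssessment): number {
  const legacyScore =
    a.codeQuality + a.documentation + a.testCoverage + a.teamKnowledge;
  return basePoints * (1 + legacyScore / 10);
}

// Simple feature in a legacy system: 3 × (1 + 18/10) = 8.4 → estimate as 8
console.log(
  debtAdjustedPoints(3, { codeQuality: 5, documentation: 5, testCoverage: 4, teamKnowledge: 4 })
);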

Technical Debt Story Types

Story:
A team I coached started tracking “debt stories” as first-class citizens. Suddenly, refactoring, adding tests, and performance work were visible—and estimable.

| Debt Type | Estimation Approach | Typical Range |
| --- | --- | --- |
| Refactoring for maintainability | Original implementation × 0.5-0.8 | 5-13 points |
| Adding tests to untested code | 1 point per 100 lines + complexity | 3-8 points |
| Performance improvement | Profiling (2-3) + Fix (3-8) + Verification (1-2) | 8-13 points |
| Security vulnerability fix | Research (2-5) + Fix (2-8) + Testing (2-5) | 8-13+ points |
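
As one worked example from the table, the “1 point per 100 lines + complexity” heuristic for adding tests might look like this; clamping to the 3-8 typical range is my own assumption:

// Sketch of the "1 point per 100 lines + complexity" heuristic from the
// table above. Clamping to the table's 3-8 typical range is an assumption.
function testAdditionPoints(untestedLines: number, complexityBonus: number): number {
  const rawPoints = untestedLines / 100 + complexityBonus;
  return Math.min(8, Math.max(3, Math.round(rawPoints)));
}

// 450 untested lines with a moderate complexity bonus of 2:
console.log(testAdditionPoints(450, 2)); // 7 points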

Practical Estimation Frameworks for Daily Use

The 3-Layer Estimation Approach

Story:
When we started breaking stories into “core implementation,” “integration,” and “production readiness,” our estimates got more accurate—and our releases got smoother.

1. Core Implementation: pure feature development without edge cases
2. Integration Layer: APIs, database changes, external dependencies
3. Production Readiness: testing, monitoring, documentation, deployment

Example story: “User can upload profile pictures”
- Layer 1: File upload UI + basic storage (3 points)
- Layer 2: Image resizing, CDN integration (3 points)
- Layer 3: Error handling, tests (2 points)
- Total: 8 points
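
If your team tracks the layers in planning tooling, a minimal sketch might look like this; the field names are mine, not a standard:

// Sketch of a three-layer estimate as a simple record; the layer names
// mirror the framework above, the field names are illustrative.
interface LayeredEstimate {
  coreImplementation: number;
  integrationLayer: number;
  productionReadiness: number;
}

function totalPoints(e: LayeredEstimate): number {
  return e.coreImplementation + e.integrationLayer + e.productionReadiness;
}

// Profile picture upload: 3 + 3 + 2 = 8 points
console.log(totalPoints({ coreImplementation: 3, integrationLayer: 3, productionReadiness: 2 }));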


Spike Story Criteria for Engineers

Story:
We once spent a week spinning our wheels on a new technology. If only we’d called it a spike, set a time box, and moved on! Now, we use a checklist to decide when to spike.

Spike Story Triggers:
  • Technology you haven’t used before (new framework, library, service)
  • Performance requirements without baseline metrics
  • Integration with undocumented or poorly documented APIs
  • Architecture decisions affecting multiple components
  • Uncertain data migration complexity
  • Security requirements without clear implementation path

Spike Story Checklist

  • Technology is new or unfamiliar
  • Performance requirements are unclear
  • Integration with unknown APIs
  • Major architectural decisions
  • Uncertain data migration
  • Security requirements are ambiguous
# Spike Story Template

Title: "Research [Technology/Approach] for [Feature]"
Acceptance Criteria:

- Document 2-3 implementation approaches with pros/cons
- Identify major technical risks and mitigation strategies
- Provide complexity estimate for actual implementation
- Time-box to 1-2 days maximum

Example: "Research WebRTC implementation for video calling feature"

- Evaluate solutions: PeerJS, Socket.io, native WebRTC
- Test with 2-4 concurrent connections
- Document browser compatibility and mobile considerations
- Estimate complexity for MVP implementation

DevOps and CI/CD Considerations

Story:
A “simple” feature once ballooned because we forgot to estimate the database migration and new monitoring dashboards. Now, we always ask: what’s needed to get this into production?

Deployment Complexity Matrix

| Deployment Scenario | Additional Points | Key Considerations |
| --- | --- | --- |
| Standard web app deployment | +0 to +1 | Existing CI/CD pipeline handles it |
| Database migration required | +1 to +3 | Data volume, downtime requirements, rollback strategy |
| New service deployment | +2 to +5 | Infrastructure provisioning, monitoring setup, service discovery |
| Multi-service coordination | +3 to +8 | Feature flags, gradual rollout, service dependencies |
| Breaking API changes | +3 to +8 | Backward compatibility, client coordination, versioning strategy |

Infrastructure as Code Estimation

Story:
We once underestimated a “simple” config change—until it had to be rolled out to five environments, each with its own quirks. Now, we factor in every environment and compliance requirement.

| Task Type | Point Range | Description |
| --- | --- | --- |
| Configuration Changes | 1-2 points | Environment variables, feature flags, basic config updates |
| New Service Provisioning | 3-5 points | Terraform modules, network policies, security groups |
| Pipeline Optimization | 2-5 points | Parallel jobs, caching strategies, test optimization |
| Monitoring Integration | 3-8 points | Metric collection, alert thresholds, dashboard creation |
// Infrastructure Estimation Formula
Adjusted Points = Base Points × Environment Factor + Compliance Factor

Where:
- Environment Factor = 1 + (0.2 × number of environments)
- Compliance Factor = 2 if compliance requirements apply (PCI, HIPAA, GDPR), else 0

Example:
- New database cluster (base 3) deployed to 2 environments, needing PCI compliance
- 3 × (1 + 0.4) + 2 = 3 × 1.4 + 2 = 6.2 → 8 points
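
A small sketch of the same formula in code; the rounding-up-to-the-next-point-value step is my assumption, mirroring how the example rounds 6.2 up to 8:

// Sketch of the infrastructure formula above. Rounding up to the next
// Fibonacci point value is an assumption matching the worked example.
function infraPoints(basePoints: number, environments: number, regulated: boolean): number {
  const environmentFactor = 1 + 0.2 * environments;
  const complianceFactor = regulated ? 2 : 0;
  const raw = basePoints * environmentFactor + complianceFactor;
  const pointScale = [1, 2, 3, 5, 8, 13];
  return pointScale.find((p) => p >= raw) ?? 13; // anything bigger: break it down
}

// Database cluster, 2 environments, PCI required: 6.2 → 8 points
console.log(infraPoints(3, 2, true));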

CI/CD Pipeline Impact Assessment

Story:
A team I worked with added a new test type to the pipeline—then watched build times double. Now, we always estimate the impact of pipeline changes.

| Pipeline Change | Estimation Impact | Engineering Considerations |
| --- | --- | --- |
| Add new test type | +1-2 points | Flakiness risk, runtime increase, parallelization |
| Security scanning | +2-3 points | False positives, triage time, baseline exceptions |
| Multi-stage deployments | +3-5 points | Rollback strategies, verification checks |
| Artifact management | +2-4 points | Versioning, storage costs, cleanup policies |

Handling Uncertainty and Spikes

Story:
If you ever find yourself saying, “I just need to look up…” more than three times, you’re in spike territory. Don’t guess—research, document, and move forward.

Rule of Thumb: If you say "I just need to look up..." more than three times, convert it to a spike story. Research is work—track it!

Spike Story Template

Title: [Technology/Problem] Research Spike
Goal: Answer [specific technical question] to enable estimation
Deliverables:

- Proof of concept code snippet
- Architecture diagram options
- Risk assessment matrix
- Recommended implementation approach
Time Box: 4-16 hours (max 2 days)
Acceptance Criteria:
- Team can estimate implementation stories
- Major technical risks identified

Uncertainty Assessment Matrix

Story:
One team doubled their estimates for “unknown unknowns”—and finally started hitting their sprint goals. Another built prototypes for every new technology. Both improved their accuracy.

| Uncertainty Level | Estimation Approach | Engineer Action |
| --- | --- | --- |
| Known Unknowns | Add 20% buffer | Document assumptions |
| Unknown Unknowns | Requires spike | Create research story |
| Complex Dependencies | Double estimates | Map dependency graph |
| New Technology | 3× initial estimate | Build prototype |
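
Here is one way the matrix could be expressed in code; the type names are mine, and the multipliers come straight from the table. Unknown unknowns are deliberately excluded, since they need a spike, not a number:

// Sketch mapping the uncertainty matrix to estimate adjustments. The
// multipliers come from the table above; the type names are illustrative.
type Uncertainty = "known-unknowns" | "complex-dependencies" | "new-technology";

function uncertaintyAdjusted(basePoints: number, level: Uncertainty): number {
  switch (level) {
    case "known-unknowns":
      return basePoints * 1.2; // add 20% buffer, document assumptions
    case "complex-dependencies":
      return basePoints * 2; // double the estimate, map the dependency graph
    case "new-technology":
      return basePoints * 3; // 3x initial estimate, build a prototype
  }
}
// Unknown unknowns get no multiplier: create a research story instead.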

Team Velocity and Performance Metrics

Story:
I’ve seen managers weaponize velocity, comparing teams and individuals. It never ends well. Velocity is a planning tool, not a performance metric.

Engineer Warning: Velocity is a planning tool, not a performance metric. Never compare velocities between teams or use it for individual evaluations.

Calculating Engineering Velocity

// Rolling Velocity Formula
velocity = (Σ last 3 sprints points) / 3

// Context Factors
adjusted_velocity = velocity × context_factor

// Common Context Factors:
- 0.8: Key team member on vacation
- 0.7: Major production incident
- 1.2: Working on familiar codebase
- 0.6: New technology stack
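
A runnable sketch of the same calculation; the sprint numbers and the 0.8 vacation factor are illustrative:

// Sketch of the rolling velocity formula above. The sprint history and
// context factor values are illustrative, not prescriptive.
function rollingVelocity(lastThreeSprintPoints: number[]): number {
  const sum = lastThreeSprintPoints.reduce((total, points) => total + points, 0);
  return sum / lastThreeSprintPoints.length;
}

function adjustedVelocity(velocity: number, contextFactor: number): number {
  return velocity * contextFactor;
}

// Team averaged 30 points, but a key member is on vacation (factor 0.8):
console.log(adjustedVelocity(rollingVelocity([28, 32, 30]), 0.8)); // 24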

Engineering Metrics That Matter

Story:
A team I led started tracking cycle time and PR size. Suddenly, our reviews sped up, bugs dropped, and our estimates got sharper.

Core Metrics

  • Cycle Time < 4 days (commit → production)
  • PR Size < 400 lines (smaller changes = faster reviews)
  • Defect Rate < 5% (bugs in first 24h post-deploy)
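
As a sketch of how the first metric is computed (commit to production, measured in days; the 4-day threshold comes from the list above):

// Sketch of the cycle-time metric: elapsed days from first commit to
// production deploy. The 4-day target comes from the list above.
function cycleTimeDays(firstCommit: Date, deployedToProd: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return (deployedToProd.getTime() - firstCommit.getTime()) / msPerDay;
}

const withinTarget = cycleTimeDays(new Date("2024-03-01"), new Date("2024-03-04")) < 4;
console.log(withinTarget); // true: 3 days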

Tooling and Automation for Better Estimates

Story:
We built a CLI tool that searched past tickets, calculated complexity, and suggested story point ranges. It didn’t make us perfect—but it made us better, faster.

| Tool Type | Examples | Engineering Benefit |
| --- | --- | --- |
| Complexity Analysis | SonarQube, CodeClimate | Identify high-risk code areas pre-estimation |
| Historical Data | Jira, Linear, Shortcut | Compare similar past stories |
| CI Analytics | Buildkite, GitHub Actions | Track pipeline bottlenecks |
| Architecture Diagrams | Mermaid, Diagrams.net | Visualize system interactions |

Pro Tip: Build an "Estimation Helper" CLI tool that:
  • Searches past similar tickets
  • Calculates complexity scores
  • Generates checklist from code analysis
  • Suggests story point ranges
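
Here is a deliberately tiny skeleton of such a helper. It assumes past tickets have been exported into a local array rather than fetched live from Jira or Linear, and the matching heuristic is illustrative only:

// Tiny "Estimation Helper" skeleton. Assumes past tickets live in a
// local array (in reality you would export them from your tracker);
// the keyword-matching heuristic is illustrative only.
interface PastTicket {
  title: string;
  points: number;
}

const history: PastTicket[] = [
  { title: "OTP authentication endpoint", points: 5 },
  { title: "Profile picture upload", points: 8 },
  { title: "Config flag for dark mode", points: 1 },
];

function suggestRange(query: string): string {
  const words = query.toLowerCase().split(/\s+/);
  const matches = history.filter((ticket) =>
    words.some((word) => ticket.title.toLowerCase().includes(word))
  );
  if (matches.length === 0) return "No similar tickets found; consider a spike.";
  const points = matches.map((ticket) => ticket.points);
  return `Similar tickets scored ${Math.min(...points)}-${Math.max(...points)} points`;
}

console.log(suggestRange("upload avatar picture")); // matches the upload ticket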


Team Dynamics and Continuous Improvement

Story:
The best estimation meetings I’ve seen are safe spaces. People admit what they don’t know, challenge each other’s assumptions, and learn from every “bad” estimate.

Psychological Safety: The best teams create a safe space for honest estimation. Encourage everyone to speak up if they’re unsure, and never punish “bad” estimates—use them as learning opportunities.

  • Use planning poker or silent estimation to avoid anchoring bias.
  • When estimates differ, discuss assumptions and risks, not just numbers.
  • Regularly review estimation accuracy in retrospectives and adjust your process.

Conclusion: Dos and Don’ts of Software Estimation

Story:
If you only remember one thing: estimation is a team sport. The best teams break work into small chunks, surface hidden complexity, and treat every “miss” as a chance to learn.

Estimation Dos

  • Break work into the smallest testable chunks
  • Use checklists to surface hidden complexity
  • Time-box research and create spike stories for unknowns
  • Add buffers for meetings, reviews, and context switching
  • Review estimation accuracy regularly and adjust
  • Encourage open discussion and psychological safety
  • Document patterns and share knowledge across the team

Estimation Don’ts

  • Don’t estimate in hours—use relative complexity (story points)
  • Don’t ignore technical debt or legacy code
  • Don’t assume ideal conditions—plan for interruptions
  • Don’t compare velocities between teams or individuals
  • Don’t punish “bad” estimates—use them to improve
  • Don’t skip retrospectives on estimation accuracy

Final Insight: Estimation is a journey, not a destination. The best teams treat it as an engineering discipline—measured, reviewed, and improved over time.