Story Points vs. Hours: Why the Distinction Matters
The most common question about story points is: "Why not just estimate in hours?" The answer comes down to what we're actually trying to measure.
Hour-based estimates try to answer: "How long will this take me, with my current skill level, in my current context, with no interruptions?" That question is nearly impossible to answer accurately, because the answer depends on who is doing the work, what interruptions they actually face, how familiar they are with the codebase, and dozens of other factors that vary from sprint to sprint.
Story points answer a different question: "How hard is this story relative to other stories our team has completed?" That question is far more answerable. The whole team — frontend, backend, QA, design — can agree that a story is "about as complex as the payment integration we did last sprint" without needing to predict exactly who will pick it up or what their calendar looks like.
The practical result is that story point estimates are more stable over time. When a team becomes more efficient, its story point estimates don't change; its velocity goes up instead. This keeps historical comparisons valid and makes sprint forecasting more reliable.
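The forecasting arithmetic behind this is simple: divide the points remaining in the backlog by the team's average velocity. The following is a minimal Python sketch, assuming velocity is the plain mean of recent sprints (many teams use a rolling window or a best/worst range instead). The function names and the sprint figures are hypothetical.

```python
import math
from statistics import mean


def average_velocity(points_per_sprint: list[int]) -> float:
    """Mean story points completed per sprint over the supplied history."""
    if not points_per_sprint:
        raise ValueError("need at least one completed sprint")
    return mean(points_per_sprint)


def sprints_to_finish(backlog_points: int, points_per_sprint: list[int]) -> int:
    """Whole sprints needed to complete the backlog at the average velocity."""
    velocity = average_velocity(points_per_sprint)
    if velocity <= 0:
        raise ValueError("velocity must be positive to forecast")
    # Round up: a partially used sprint still has to be scheduled.
    return math.ceil(backlog_points / velocity)


# Hypothetical numbers: the same 120-point backlog, before and after the team
# became more efficient. The estimates are untouched; only velocity changes.
print(sprints_to_finish(120, [20, 23, 26]))  # average 23 -> 6 sprints
print(sprints_to_finish(120, [28, 30, 32]))  # average 30 -> 4 sprints
```

The backlog is still 120 points in both forecasts; the shorter forecast comes entirely from the higher velocity, which is why the old estimates remain comparable with the new ones.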
The Three Dimensions of a Story Point
A story point estimate should capture three things at once:
- Effort — how much work is involved? Touching five services is more effort than touching one, even if none of them is particularly complex.
- Complexity — how difficult is the problem to reason about? A straightforward CRUD feature is less complex than implementing a new caching strategy, even if both involve similar amounts of code.
- Uncertainty — how well do we understand the work? A story in a familiar part of the codebase with clear acceptance criteria has low uncertainty. A story that requires integrating with an undocumented third-party API has high uncertainty, and that should be reflected in the estimate.
This is why the ? card in planning poker exists — sometimes a story has so much uncertainty that any estimate would be misleading, and the right answer is to do a spike or clarify requirements before sizing.
How to Estimate Story Points in Practice
The most effective way to calibrate story point estimates across a team is to establish reference stories: past stories the whole team agrees on as benchmarks for specific point values.
For example: "A 1-point story is like updating a field label in the UI. A 3-point story is like adding a new filter to an existing API endpoint. An 8-point story is like building the password reset flow we completed in Sprint 4."
With anchors like these, estimation becomes comparison rather than prediction. The team asks "is this story more like the filter work (3 points) or more like the password reset (8 points)?" rather than trying to predict hours from scratch.
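Planning poker, in which every team member chooses an estimate privately and all cards are revealed at the same moment, is a widely used technique for reaching agreement on story points. Because nobody hears anyone else's number before committing to their own, the simultaneous reveal limits anchoring bias. When the cards disagree, the people who played the highest and lowest values explain their reasoning, and the team votes again.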
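One round of planning poker can be sketched as a small function that only looks at the cards once all of them are on the table. This is a minimal Python sketch under stated assumptions: the deck, the `RoundResult` type and the `reveal` function are hypothetical, and the rule that any "?" card halts the round is one reasonable team policy rather than part of a standard. Identical cards mean agreement; divergent cards identify the lowest and highest voters, who explain before the next vote.

```python
from dataclasses import dataclass
from typing import Optional, Union

Card = Union[int, str]

# Hypothetical deck: a Fibonacci-style scale plus the "?" card.
DECK: list[Card] = [1, 2, 3, 5, 8, 13, 21, "?"]


@dataclass
class RoundResult:
    consensus: Optional[int]             # agreed value if every card matched
    needs_clarification: bool            # someone played "?"
    outliers: Optional[tuple[str, str]]  # (lowest voter, highest voter)


def reveal(votes: dict[str, Card]) -> RoundResult:
    """Evaluate one round after every card has been revealed at the same time."""
    if not votes:
        raise ValueError("no cards were played")
    for name, card in votes.items():
        if card not in DECK:
            raise ValueError(f"{name} played {card!r}, which is not in the deck")

    # Assumed policy: any "?" stops the round so the team can clarify or spike.
    if "?" in votes.values():
        return RoundResult(None, True, None)

    if len(set(votes.values())) == 1:
        return RoundResult(next(iter(votes.values())), False, None)

    # Divergent estimates: the lowest and highest voters explain, then re-vote.
    lowest = min(votes, key=lambda name: votes[name])
    highest = max(votes, key=lambda name: votes[name])
    return RoundResult(None, False, (lowest, highest))


print(reveal({"ana": 3, "ben": 5, "cy": 13}).outliers)  # ('ana', 'cy')
```

Keeping the evaluation in one place after all votes are collected mirrors the point of the technique: no individual card influences the others before it is played.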
The Fibonacci Scale for Story Points
The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21…) is the most widely used scale for story points for one important reason: the gaps get larger as the numbers grow. Choosing between 1 and 2 is a meaningful distinction. Choosing between 13 and 15 is not — our ability to estimate precisely degrades with task size, and the Fibonacci scale reflects that by not offering 14 or 15 as options.
Stories estimated at 13 or 21 are usually candidates for decomposition. Most teams treat any story above 8 as a signal to ask: "Can we split this into two or three smaller stories before committing to a sprint?"
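The scale and the "above 8" rule of thumb are easy to express in code. This is a minimal Python sketch; the function names, the 21-point ceiling and the threshold of 8 are illustrative assumptions drawn from the ranges mentioned above, not a standard.

```python
def story_point_scale(max_value: int = 21) -> list[int]:
    """Fibonacci-style story point scale: 1, 2, 3, 5, 8, ... up to max_value."""
    if max_value < 2:
        raise ValueError("max_value must be at least 2")
    scale = [1, 2]
    while scale[-1] + scale[-2] <= max_value:
        scale.append(scale[-1] + scale[-2])
    return scale


def gaps(scale: list[int]) -> list[int]:
    """Absolute distance between neighbouring cards."""
    return [b - a for a, b in zip(scale, scale[1:])]


def relative_steps(scale: list[int]) -> list[float]:
    """Each card's size relative to the previous card (e.g. 8 / 5 = 1.6)."""
    return [b / a for a, b in zip(scale, scale[1:])]


def split_candidates(stories: dict[str, int], threshold: int = 8) -> list[str]:
    """Stories estimated above the threshold: consider splitting them first."""
    return [name for name, points in stories.items() if points > threshold]


scale = story_point_scale()
print(scale)        # [1, 2, 3, 5, 8, 13, 21]
print(gaps(scale))  # [1, 1, 2, 3, 5, 8]
print(split_candidates({"label": 1, "export": 5, "sso": 13, "search": 21}))
# ['sso', 'search']
```

The absolute gaps grow with the numbers, while from 3 upward each card is roughly 1.6 to 1.7 times the previous one. A hypothetical 15 next to 13 would be a step of only about 15%, finer than the team can reliably distinguish at that size.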
Common Story Point Mistakes to Avoid
- Converting story points to hours. Story points and hours are different units measuring different things. Converting them — "5 points = 10 hours" — destroys the benefits of relative estimation and reintroduces all the problems of hour-based forecasting.
- Comparing velocity across teams. A team that averages 60 points per sprint is not necessarily more productive than a team that averages 30; the difference may simply reflect how each team calibrated its scale. Velocity is only meaningful within a single team over time.
- Estimating too granularly. On a linear scale, the difference between a 5 and a 6 is not a real distinction. If your team keeps deliberating over values that close together, the scale is too fine. Fibonacci exists precisely to prevent this.
- One person estimating for the team. Story points should reflect the collective understanding of everyone who will touch the work: developers, testers, and sometimes designers. Solo estimates miss the complexity that others would catch.