How to set baseline story points for a new team?
I'm starting with a brand new development team next week. We're all experienced developers but none of us have worked together before on this codebase.
We want to use planning poker with the Fibonacci sequence (1, 2, 3, 5, 8, 13, 21). But how do we establish what those numbers actually mean when we have no shared context or reference points?
Do we just pick an arbitrary story and call it a 5, then estimate everything relative to that? Or is there a better way to calibrate the team from day one?
The best way to establish a baseline is to use recently completed work. Here's the process I've used with dozens of new teams:
Step 1: Gather Completed Stories
Take 3-5 user stories that were recently finished (even if by a previous team or in a prototype phase). You want actual completed work, not hypotheticals.
Step 2: Retrospective Estimation
Use planning poker to estimate these completed stories. Everyone votes based on what they know NOW about the complexity. This is easier because:
- You can look at the actual code
- You know what challenges came up
- There's no uncertainty about requirements
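If it helps to make the voting step concrete, here is a minimal Python sketch of how one round could be summarized. The function name, the sample names, and the rule that "consensus" means all votes land on the same or adjacent cards are my own assumptions for illustration, not part of any planning poker standard.

```python
FIBONACCI_DECK = (1, 2, 3, 5, 8, 13, 21)

def summarize_round(votes):
    """Summarize one planning poker round.

    votes: dict mapping a team member's name to the card they played.
    Returns (consensus, low, high). Consensus here is an assumed rule:
    every vote falls on the same card or on two adjacent cards in the deck.
    """
    if not votes:
        raise ValueError("no votes were cast")
    for name, card in votes.items():
        if card not in FIBONACCI_DECK:
            raise ValueError(f"{name} played {card}, which is not in the deck")
    # Compare positions in the deck, not raw values, so 8 and 13 count as adjacent.
    positions = [FIBONACCI_DECK.index(card) for card in votes.values()]
    low, high = min(positions), max(positions)
    return (high - low) <= 1, FIBONACCI_DECK[low], FIBONACCI_DECK[high]

# Hypothetical votes on one completed story.
votes = {"Ana": 3, "Ben": 5, "Chen": 13}
consensus, low, high = summarize_round(votes)
if not consensus:
    print(f"Spread from {low} to {high}: discuss and vote again")
```

When the spread is wide, have the lowest and highest voters explain their reasoning before re-voting. That conversation is where the shared context actually gets built.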
Step 3: Find Your Baseline Story
The smallest story becomes your baseline, usually a 1, 2, or 3 depending on your scale preference. It should be something simple, like:
- "Add email validation to a form field"
- "Update button text and color"
- "Add a new column to a report"
Step 4: Create Reference Stories
From your completed stories, pick examples for different sizes:
- 1-2 points: "Email validation" (30 min - 2 hours)
- 3-5 points: "User profile page" (half day - full day)
- 8 points: "Payment integration" (2-3 days)
- 13 points: "Complete search feature" (3-5 days)
Step 5: Document and Display
Write these reference stories on your team's wall or wiki. During planning poker, people can refer back to them: "Is this more or less complex than the email validation story?"
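If you keep the anchors somewhere machine-readable, the "more or less complex than X?" question can be sketched like this. This is only an illustration: the story names come from Step 4, but pinning each range to a single value (2 for "1-2 points", 5 for "3-5 points") and the helper name are assumptions I made for the example.

```python
# Reference stories from Step 4. Where the list gives a range, the single
# value used here is an assumption made for this sketch.
REFERENCE_STORIES = {
    2: "Email validation",
    5: "User profile page",
    8: "Payment integration",
    13: "Complete search feature",
}

def nearest_anchors(proposed_points):
    """Return the anchors at-or-below and above a proposed estimate.

    Each anchor is a (points, title) tuple, or None when no anchor exists
    on that side.
    """
    sizes = sorted(REFERENCE_STORIES)
    at_or_below = [s for s in sizes if s <= proposed_points]
    above = [s for s in sizes if s > proposed_points]
    lower = (at_or_below[-1], REFERENCE_STORIES[at_or_below[-1]]) if at_or_below else None
    upper = (above[0], REFERENCE_STORIES[above[0]]) if above else None
    return lower, upper

lower, upper = nearest_anchors(3)
print(f"Bigger than {lower[1]}? Smaller than {upper[1]}?")
```

A wall chart does the same job. The point is only that every proposed number gets compared against two named stories rather than debated in the abstract.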
Step 6: Recalibrate
Every 2-3 sprints, review your reference stories. As the team learns the codebase, what felt like an 8 might now be a 5. Update your baselines accordingly.
This approach gives you concrete anchors instead of abstract numbers. Way better than guessing blindly on day one.
We did something similar but used T-shirt sizes first (S, M, L, XL), then mapped them to numbers after a sprint. Less pressure on getting the numbers "right" from the beginning.
After sprint 1, we converted: S=2, M=5, L=8, XL=13. Worked really well for us.
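For anyone tracking the backlog in a script or spreadsheet export, the conversion is a plain lookup. A minimal Python sketch using the mapping above; the function name and the sample backlog are made up for illustration.

```python
# Mapping quoted above: S=2, M=5, L=8, XL=13.
SIZE_TO_POINTS = {"S": 2, "M": 5, "L": 8, "XL": 13}

def convert_backlog(sized_stories):
    """Convert {story title: T-shirt size} into {story title: points}."""
    return {
        title: SIZE_TO_POINTS[size.upper()]
        for title, size in sized_stories.items()
    }

# Hypothetical backlog sized during the first sprint.
backlog = {"Email validation": "S", "User profile page": "M", "Payment integration": "L"}
print(convert_backlog(backlog))
```

An unknown size raises a KeyError, which is probably what you want: it flags stories that were never sized instead of silently giving them a number.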
This is exactly what I needed! We do have some completed stories from the prototype phase that we can use as anchors.
I'll run a calibration session in our first sprint planning to establish the baseline. Thanks for the detailed breakdown!
One warning: don't overthink the baseline. Your first sprint estimates will be rough no matter what. Just pick something reasonable and adjust after you have real velocity data.
I've seen teams spend 4 hours debating what a story point means. Just start, measure, adjust. The baseline will find itself.
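To show what "measure" can look like in practice, here is a rough Python sketch of a rolling average velocity. The three-sprint window and the sample numbers are assumptions for the example, not a recommendation from this thread.

```python
from statistics import mean

def average_velocity(completed_points_per_sprint, window=3):
    """Mean of points completed over the most recent `window` sprints."""
    recent = completed_points_per_sprint[-window:]
    if not recent:
        raise ValueError("need at least one completed sprint")
    return mean(recent)

# Made-up history: points completed in sprints 1 through 4.
history = [10, 18, 22, 26]
print(average_velocity(history))  # averages the last three sprints: 22
```

Early sprints tend to be noisy, so a short rolling window lets the first rough sprint drop out of the average once you have a few more data points.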
Also consider creating a "reference story catalog" in Confluence or Notion. Keep it updated as you complete sprints. New team members can read through it and immediately understand your team's estimation culture.
We have about 20 reference stories covering different complexity types: database work, UI work, API integrations, etc.
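If the catalog ever outgrows a wiki page, one way to organize it by work type looks like the Python sketch below. The class name, field names, and the points and categories assigned to each sample story are hypothetical; our real catalog is just a Confluence table.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ReferenceStory:
    title: str
    points: int
    work_type: str  # e.g. "database", "ui", "api" (labels assumed for this sketch)

def group_by_work_type(stories):
    """Group reference stories by work type, smallest first within each group."""
    grouped = defaultdict(list)
    for story in stories:
        grouped[story.work_type].append(story)
    for entries in grouped.values():
        entries.sort(key=lambda s: s.points)
    return dict(grouped)

# Sample entries; the point values and categories are illustrative only.
catalog = [
    ReferenceStory("Payment integration", 8, "api"),
    ReferenceStory("User profile page", 5, "ui"),
    ReferenceStory("Update button text and color", 1, "ui"),
]
for work_type, entries in group_by_work_type(catalog).items():
    print(work_type, [(s.points, s.title) for s in entries])
```

Grouping by type matters because a 5-point UI story and a 5-point database story feel very different, and new team members calibrate faster when they compare like with like.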