Why and how to choose reference user stories

Create a shared understanding of complexity.

For many agile software engineering teams, it is common practice to estimate the complexity of the user stories they will be working on. Estimation is not an exact science, so teams choose t-shirt sizes, story points, or other non-time-based scales to account for the uncertainty inherent in software development. Over time, the individual developers on a team gradually arrive at an implicitly shared understanding of the degree of complexity that each value on their chosen scale represents.

This implicitly shared understanding is easily challenged, though, when a new joiner to the engineering team or a colleague from another discipline (e.g., product management or product design) poses the supposedly simple question: “What do 5 story points mean in your team?” Translating the implicit agreement into words is, in fact, quite challenging. Most likely, every developer on the team will phrase her or his answer differently, especially in a cross-functional team. (For the remainder of this blog post, we use story points to describe complexity, but everything said here applies to other complexity scales, too.)

One approach we take at Babbel to make this shared understanding explicit is to choose reference user stories. While we acknowledge that every new piece of software will be at least partly “terra incognita” for the engineering team and that no two user stories are alike, it nevertheless helps to refer back to an agreed-upon reference when estimating new stories. By discussing the similarities and differences between a new user story and the reference stories, the engineering team engages in relative estimation and is less likely to fall into the trap of estimating each story in isolation against an absolute scale.

One of our teams recently faced the need to make the implicit explicit. It was a rather young team of three software engineers who had been working together for less than three months. When two new joiners arrived, we took on the challenge of choosing appropriate reference user stories. To pick suitable stories, we played a slightly modified version of the Team Estimation Game, originally developed by Steve Bockman. As this worked well for us, we are sharing the modified instructions with you here on Babbel Bytes. The game (not counting its preparation) took a little less than 1.5 hours.

Preparation

  • Look back at the work your team has done over the last 4–6 weeks. Don’t go back too far; the work should still be fresh in everyone’s memory.
  • Select 20–25 estimated and completed user stories that, at first glance, appear to be good candidates. When in doubt, select more stories rather than fewer, as not all of them will make it into the final selection.
  • Try to make this preselection well balanced. For example, cover a wide range of complexity based on the original estimates, and include a mix of frontend-heavy and backend-heavy stories.
  • Write the stories on index cards (one story per card), but do not include the original estimates.
  • Clear a table and make enough room for all team members to stand or sit around it.
  • Place a sticky note labeled “smallest” on the far left side of the table and another sticky note labeled “biggest” on the far right side.
  • Place all user story cards face down in a pile somewhere on the table.
  • Keep one set of planning poker cards ready for the second part of the game. Our set is labeled with the Fibonacci numbers from 1 to 21.
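If your team would rather pull candidates from an issue-tracker export than pick them by hand, the preselection rules above can be sketched as a small script. This is a minimal illustration under our own assumptions: the `Story` record, its `area` field, the story titles, and the round-robin balancing strategy are all hypothetical and not part of the game itself. The script keeps stories completed in the last six weeks, buckets them by original estimate and area, takes one story from each bucket per pass, and returns card texts without the estimates.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Tuple


@dataclass
class Story:
    # Hypothetical shape of a completed story exported from an issue tracker.
    title: str
    original_estimate: int  # story points given before implementation
    area: str  # hypothetical field: "frontend" or "backend"
    completed_on: date


def preselect(stories: List[Story], today: date,
              weeks: int = 6, limit: int = 25) -> List[str]:
    """Return up to `limit` card texts, balanced across estimates and areas."""
    cutoff = today - timedelta(weeks=weeks)
    recent = [s for s in stories if s.completed_on >= cutoff]

    # Bucket stories by (original estimate, area) so that each combination
    # is represented before any bucket contributes a second story.
    buckets: Dict[Tuple[int, str], List[Story]] = {}
    for s in recent:
        buckets.setdefault((s.original_estimate, s.area), []).append(s)

    picked: List[str] = []
    while len(picked) < limit and any(buckets.values()):
        for key in sorted(buckets):  # one story per bucket per pass
            if buckets[key] and len(picked) < limit:
                # Only the title goes on the card; the estimate stays off it.
                picked.append(buckets[key].pop(0).title)
    return picked


# Usage with made-up stories:
stories = [
    Story("Add password reset link", 1, "frontend", date(2024, 5, 20)),
    Story("Migrate user API", 8, "backend", date(2024, 5, 25)),
    Story("Fix button alignment", 1, "frontend", date(2024, 5, 10)),
    Story("Old cleanup task", 3, "backend", date(2023, 1, 1)),
]
print(preselect(stories, today=date(2024, 6, 1), limit=2))
# ['Add password reset link', 'Migrate user API']
```

The script only narrows the field; the final choice of candidates still belongs to the team.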

First Part

Ask your team members to disregard the estimates they originally gave these user stories before implementation. Instead, they should act on the knowledge they have now, that is, on the complexity they actually experienced while implementing the stories.

Players take turns:

  • One team member starts the game by picking the first user story from the pile, reading it out loud, and placing it in the middle of the table.
  • The next team member pulls the second story from the pile, reads it out loud, and places it relative to the first: to the left if they deem it smaller than or equal to the first in terms of complexity, to the right if they deem it bigger.
  • From then on, on each turn, a team member can either
    • pull another story from the pile and place it on the table relative to the ones laid out already,
    • change the position of one story on the table by moving it, or
    • pass.

On every turn, players are asked to explain why they placed a story where they did. The first part of the game ends when no stories are left on the pile and all team members pass in the same round. In other words, it ends when the team has reached consensus on the stories’ complexity relative to one another.

User stories ordered by their complexity
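The placement rule of the first part works much like an insertion sort driven by human judgment. The sketch below models it under a simplifying assumption of ours, not the game’s: each story carries a single made-up score in a hypothetical `experienced` mapping, standing in for the team’s sense of how complex the story turned out to be. In the real game, that comparison is exactly what players debate, and the move and pass options let them correct misplacements. The story names are invented for illustration.

```python
from typing import Dict, List, Sequence


def place(table: List[str], story: str, experienced: Dict[str, int]) -> None:
    """Insert `story` into the table, ordered smallest (left) to biggest (right).

    Mirrors the rule for the second card: a story goes to the left of a card
    it is smaller than or equal to, and to the right of cards it is bigger than.
    """
    for i, placed in enumerate(table):
        if experienced[story] <= experienced[placed]:
            table.insert(i, story)
            return
    table.append(story)  # bigger than every card on the table so far


def first_part_over(pile: Sequence[str], last_round: Sequence[str]) -> bool:
    """True once the pile is empty and every player passed in the same round."""
    return len(pile) == 0 and all(action == "pass" for action in last_round)


# Usage with made-up stories and made-up "experienced complexity" scores:
experienced = {"login form": 3, "payment API": 13, "typo fix": 1}
table: List[str] = []
for story in ["login form", "payment API", "typo fix"]:
    place(table, story, experienced)
print(table)  # ['typo fix', 'login form', 'payment API']
```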

Second Part

Provide the players with the planning poker cards.

Players take turns again:

  • The player who starts the second part of the game looks at the smallest user story on the far left of the table and assesses whether stories of this complexity are likely to be the smallest the engineering team will encounter while working on the project. If so, the player places the lowest-numbered poker card, the “1”, above the smallest story. If not, she or he can place a higher-numbered card instead.
  • The second player looks for the story above which to place the next higher poker card. For example, if the previous player placed the “1”, the second player places the “2” above the story where she or he thinks stories start to be twice as complex.
  • The game continues with players placing the poker cards in increasing order wherever they feel a complexity break occurs. A complexity break occurs when a player considers a story notably more complex than the story below the last poker card to its left. Note that with Fibonacci-numbered poker cards, the gaps between values grow with each card to account for the increasing uncertainty in estimation. For example, the difference in complexity between a 5-point story and an 8-point story (3 points) is meant to be much smaller than the difference between a 13-point story and a 21-point story (8 points).
  • Instead of placing a new poker card, players may use their turn to move a story card or a poker card (i.e., change the assignment of complexity) or they may decide to pass.
  • A poker card may not fit above any story because no user story of that size exists in the preselected set. In that case, leave a gap for this poker card and continue with the next higher one.

The second part of the game ends when all poker cards have been placed and all players pass in the same round. At that point, the team has reached consensus on the complexity values assigned to its user stories.

User stories with assigned complexity levels
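Once the poker cards lie above the ordered stories, each story takes the value of the nearest card at or to its left. The sketch below is only an illustration: it assumes the table is a list of story titles from smallest to biggest, and that the placed cards form a mapping from card value to the position of the story each card sits above. Both representations, the helper name `assign_points`, and the story names are hypothetical. Cards skipped because no story of that size exists (the gaps) simply do not appear in the mapping.

```python
from typing import Dict, List

FIBONACCI_DECK = [1, 2, 3, 5, 8, 13, 21]  # the deck described above


def assign_points(ordered: List[str], cards: Dict[int, int]) -> Dict[str, int]:
    """Give each story the value of the last poker card at or to its left.

    `ordered` lists the stories from smallest (left) to biggest (right).
    `cards` maps a card value to the index of the story it was placed above;
    skipped cards (gaps) are simply absent.
    """
    if any(value not in FIBONACCI_DECK for value in cards):
        raise ValueError("unknown card value")
    by_position = sorted(cards.items(), key=lambda item: item[1])
    values = [value for value, _ in by_position]
    if values != sorted(values):
        raise ValueError("cards must increase from left to right")
    if not by_position or by_position[0][1] != 0:
        raise ValueError("the smallest story needs a card above it")

    breaks = {position: value for value, position in by_position}
    points: Dict[str, int] = {}
    current = breaks[0]
    for i, story in enumerate(ordered):
        current = breaks.get(i, current)  # a complexity break starts here
        points[story] = current
    return points


# Usage: no story fits the "2" or the "5", so those cards leave gaps.
ordered = ["typo fix", "copy change", "login form", "search filter", "payment API"]
points = assign_points(ordered, {1: 0, 3: 2, 8: 4})
print(points["copy change"], points["search filter"])  # 1 3
```

The validation mirrors the rules of the second part: cards only increase from left to right, and the smallest story always gets a card.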

Third Part

The objective of the third and final part of the game is to select your reference user stories.

To achieve this, discard any stories that “don’t feel right”. If you have paid close attention to your team’s discussions during the first two parts, identifying them should be fairly easy. You have probably listened to heated debates about particular stories and watched cards move back and forth, and back again. Statements like the following indicate that a story would not serve well as a reference:

  • “This story is somewhere between a ‘5’ and an ‘8’.”
  • “This story is definitely bigger than an ‘8’, but ‘13’ sounds too much.”

Remember, the stories that you keep in your final selection are meant to be references for the degree of complexity they represent. From now on, when your team estimates new stories, they will refer back to these stories. The reference stories will also help new joiners to your team during onboarding, and they will facilitate discussions with other colleagues.

That’s why you want to boil the preselected stories down to a smaller set that the entire team firmly agrees on.

If this elimination leaves your final selection very small, we suggest simply repeating the game later with a fresh batch of stories. You can always add stories to your set of references.

If, however, you have multiple suitable candidates for a complexity level, we recommend keeping more than one of them as references.

Final selection of reference user stories
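The third part is a judgment call, but the filter it produces is simple to state. In the sketch below, the `disputed` set is a hypothetical record of stories that prompted statements like the ones quoted above, and the point values and story names are made up. Everything else is kept and grouped by complexity level, so levels with several firm candidates keep more than one reference, and empty levels show where a later round of the game could fill gaps.

```python
from typing import Dict, List, Set


def select_references(points: Dict[str, int], disputed: Set[str],
                      deck: List[int]) -> Dict[int, List[str]]:
    """Drop disputed stories and group the remaining ones by story points.

    Every card value in `deck` gets an entry, so empty levels stay visible
    as candidates for a later round of the game.
    """
    references: Dict[int, List[str]] = {value: [] for value in deck}
    for story, value in points.items():
        if story not in disputed:
            references[value].append(story)
    return references


# Usage with made-up results from the second part:
points = {"typo fix": 1, "copy change": 1, "login form": 3,
          "search filter": 3, "payment API": 8}
refs = select_references(points, disputed={"search filter"},
                         deck=[1, 2, 3, 5, 8, 13, 21])
print(refs[1], refs[3], refs[5])  # ['typo fix', 'copy change'] ['login form'] []
```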

In our team, we have documented the reference stories we picked in a place that is easily accessible not only to us engineers but also to the colleagues we work closely with. We pull the document up in every planning poker session and occasionally in discussions with product management.

From the first planning poker session onwards, we have felt more confident when estimating new stories, for two reasons. First, we rely less on gut feelings because we can refer back to previously completed stories in our technical discussions to point out similarities or differences. Second, and perhaps more importantly, we no longer rely on an implicitly shared understanding of complexity. By making this agreement explicit, we removed the nagging doubt about whether we truly share the same understanding.

Photo by Mark Duffel on Unsplash
