What do numerical software development estimates actually mean?
What do they mean for you?
(I’m talking especially about team-based estimation, such as that in ‘planning poker’, but I’m guessing whatever conclusions we reach would hold for other methodologies.)
I see the general objective here as coming up with a number for each task, and a number for how much your team can typically do in an amount of time, such that these numbers are reasonably fungible.
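To make ‘fungible’ concrete, here is a minimal sketch of the arithmetic, assuming the team's capacity is taken as the average number of points finished in past sprints. The function names and the sample numbers are made up for illustration; they don't come from any particular tool or team.

```python
from statistics import mean

def velocity(completed_points_per_sprint):
    """Average points the team finished per sprint (an assumed definition)."""
    return mean(completed_points_per_sprint)

def fits(task_points, team_velocity):
    """True if the summed task estimates do not exceed the team's velocity."""
    return sum(task_points) <= team_velocity

# Hypothetical history and candidate sprint, purely illustrative.
past_sprints = [21, 18, 24]
candidate_tasks = [5, 8, 3]

print(fits(candidate_tasks, velocity(past_sprints)))
```

The whole scheme rests on that one comparison being meaningful, which is why the choice of unit matters so much.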
Traditionally, estimates would be given in ‘programmer days’ or ‘wall clock time’, depending on whether or not you had read ‘The Mythical Man-Month’ [1].
More recently, there has been a back-and-forth between ‘amounts of time’ and some sort of dimensionless unit called ‘complexity points’.
Various teams that I have been a part of struggled with ‘complexity points’. Under the strictest definition, something simple and repetitive would be worth few ‘complexity points’, even though finishing it would take many hours of attention or nursemaiding.
Strict ‘amounts of time’ are no better, because each person does each task at a different rate.
We had the most success with ‘relative complexity’: taking some small and large tasks, assigning them numbers, then rating each of the other tasks with respect to these goalposts.
Even this has its limitations, though. Fundamentally, the question you’re asking when you’re deciding to put something into a sprint is ‘can we still accomplish everything if we include this?’. Because of limiting reagents (specific people who are bottlenecks for many tasks) and interdependence between tasks, a single total can’t always answer that. The standard way of getting around this is to insist that all tasks are independent and small.
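The limiting-reagent problem can be sketched in a few lines, assuming each task has a single owner and each person has their own point capacity per sprint. Both of those are simplifying assumptions, and the names and numbers are hypothetical.

```python
from collections import defaultdict

def overloaded_people(tasks, capacity):
    """Return {person: load} for everyone whose assigned points exceed capacity.

    tasks: list of (points, owner) pairs; capacity: {person: points per sprint}.
    """
    load = defaultdict(int)
    for points, owner in tasks:
        load[owner] += points
    return {p: total for p, total in load.items() if total > capacity.get(p, 0)}

# Hypothetical sprint: the total is 16 points against a team capacity of 20.
tasks = [(5, "alice"), (8, "alice"), (3, "bob")]
capacity = {"alice": 10, "bob": 10}

print(overloaded_people(tasks, capacity))
```

In this example the sprint fits on paper, yet one person is carrying more than they can finish. The sketch also ignores dependencies between tasks, which make the picture worse.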
This worked reasonably well, but sometimes you need to rewrite or refactor an entire application, and that doesn’t break down into small, independent tasks.
What are your experiences? How have you dealt with this question? How many points would it be worth to research and present on this topic?
(This post came out of a fb conversation with D about what estimation numbers mean, and have meant at various times.)
[1] One of the upshots of this is an observation made by someone at work (I think F): Gantt charts are excellent for deriving dependencies, but terrible for estimation.
We think that the following tasks are important, but we may be wrong. We do not know how long they should take. Although you also do not, honestly, know how long they will take either, we would like you to commit to your guess as to how long that is. We won’t hold you to that time period. We will, however, construct a measurable framework around how long you guessed the tasks would take and how long those tasks actually took. We’re not judging you, and if your estimates prove incorrect, no one will be blamed at all in any of the meetings you’re invited to. If it turns out that these were not the important tasks, well, shh.