As more tools become available for making data-driven decisions, many people are tempted to invent their own metrics or KPIs to measure success and progress. This is especially true at software companies, where many leaders have an engineering mindset and enjoy coming up with metrics of their own.
While many people are aware of fallacies to watch out for when analyzing data, few pay attention to them when designing new metrics to guide their teams. Here are three I've learned to watch out for.
Wikipedia defines survivorship bias as “the logic error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility”. The canonical example comes from WWII: examining the damage on warplanes that made it back to base might lead one to conclude, falsely, that the most heavily damaged parts of the planes are the areas that need fortification. In fact, the planes hit in the undamaged areas were the ones that never made it back at all.
In software engineering, we might be tempted to measure the number of bugs reported by customers alone as an indication of product quality. However, there’s a survivorship bias built in. Only customers who were not frustrated by the initial experience and found sufficient value in the product will file bugs. Many users may abandon whatever workflows they had in mind, or worse, switch to another product before they ever create a bug report. The areas of the product with the most reported bugs may not actually be the areas that require the most focus.
According to Wikipedia, the cobra effect “occurs when incentives designed to solve a problem end up rewarding people for making it worse.”
If engineering leaders define metrics to track their team’s performance without the appropriate counterbalancing metrics, they can easily introduce the cobra effect into their teams.
A common mistake is to measure an individual engineer’s performance by how many bugs they fixed. Worse still is to reward the individual who fixed the most bugs. While this may boost the number of bugs fixed in a particular release in the short term, it is detrimental to the overall quality of the product in the long term. A much better approach is to define a ratio-based metric or include a counterbalancing metric. For example, in addition to measuring the number of bugs an individual fixed, also measure the number of bugs associated with the features that person worked on. Features can be normalized by story points or another complexity measure, so that individuals are not penalized for working on complex features.
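As a rough sketch of this counterbalancing idea (the function name and normalization scheme here are illustrative choices, not from any standard tool), one might report a pair of metrics instead of collapsing everything into a single reward:

```python
def engineer_metrics(bugs_fixed, bugs_in_features, story_points_delivered):
    """Return counterbalancing metrics rather than a single score.

    bugs_fixed rewards fixing bugs; defect_density (bugs found in the
    features the engineer worked on, per story point) counterbalances it.
    Normalizing by story points means engineers working on complex
    features are not penalized for taking on harder work.
    """
    if story_points_delivered <= 0:
        raise ValueError("story_points_delivered must be positive")
    return {
        "bugs_fixed": bugs_fixed,
        "defect_density": bugs_in_features / story_points_delivered,
    }

# Example: 10 bugs fixed, 4 bugs found across 8 story points of delivered work
metrics = engineer_metrics(10, 4, 8)
# metrics["defect_density"] is 0.5 bugs per story point
```

Reviewing the two numbers side by side makes the cobra-effect failure mode visible: an engineer who fixes many bugs but also ships a high defect density is no longer rewarded unconditionally.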
Another common example is when engineering leaders ask for 100% accurate project estimates. This results in engineers padding their estimates, and over the long term everyone’s ability to estimate accurately actually decreases. A simple fix can be borrowed from the literature on setting OKR goals: instead of aiming for 100%, which leaves no room for stretch goals and healthy challenges, aim for 80%.
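One way to operationalize that 80% target (the symmetric accuracy formula below is an illustrative choice, not a standard): score each estimate so that padding is penalized exactly like underestimating, then aim for an average score around 0.8 rather than 1.0:

```python
def estimate_accuracy(estimated, actual):
    """Accuracy in (0, 1]; 1.0 means the estimate matched reality exactly.

    The ratio is symmetric: a padded estimate scores the same as an
    equally wrong underestimate, which removes the incentive to inflate.
    """
    if estimated <= 0 or actual <= 0:
        raise ValueError("durations must be positive")
    return min(estimated, actual) / max(estimated, actual)

TARGET = 0.8  # aim for roughly 80%, as with OKR grading, not 100%

# Example: estimated 8 days, actually took 10 days
score = estimate_accuracy(8, 10)  # 0.8, right at the target
```

A team averaging well above the target is probably sandbagging; well below, and estimates need work. Either signal is more useful than demanding perfection.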
The McNamara fallacy is less widely studied, but it deserves attention here. Robert McNamara was the US Secretary of Defense during the Vietnam War, and he believed decisions should be based solely on quantitative observations (i.e. metrics), ignoring all others. In essence, he advocated the following:
- Measure whatever can be easily measured.
- Disregard that which cannot be measured easily.
- Presume that which cannot be measured easily is not important.
- Presume that which cannot be measured easily does not exist.
In the context of the Vietnam War, he measured progress by the number of casualties on both sides alone, without considering the terrain, territory gained, or the sentiments of the local population.
While engineering leaders rarely presume outright that things which cannot be measured easily do not exist, they often focus only on things that are easily measured. Combine that with a keen interest in making all decisions data-driven, and they end up victims of the McNamara fallacy.
For example, understanding engineering productivity is hard. It might be tempting to count the number of features developed and the months they took, and use that to judge whether the engineering team is high performing. However, other important factors are much more difficult to measure, such as whether the resulting architecture is sustainable, and whether it is easy to maintain and extend. A victim of the McNamara fallacy would keep slapping together systems that are fast to build initially but soon slow to a crawl.
Knowing these fallacies alone will not help you write effective metrics, but devising your own metrics without paying attention to them will almost certainly backfire and end up hurting your team.