A predecessor to Amazon’s successful system of measuring input metrics was the attempted “fitness function.” The goal was to create a composite metric for business performance, and it was a disaster. Here’s why:
The fitness function was an idea formed by senior leadership, and it came alongside the creation of two-pizza teams—autonomous, single-threaded teams. The intent was to measure these newly independent teams using a well-intentioned, tailored metric. Each team would define about six key, controllable input metrics to be agreed upon by both the team and by Jeff and the S-Team.
For example, the search team metrics would include items like the percent of page views where the customer clicked on the first result, or the Top Percentile 90 page load time in milliseconds.
The good part of this model was that the metrics were hyper-specific to the team’s domain. The idea of the fitness function was to roll those into a single number by weighting each metric and creating an aggregate index. Leadership would be able to glance at a dashboard and know immediately how 30 teams were doing.
On paper, this seemed elegant. Progress visualized as a single number would be clean and simple.
However, in practice, this was not effective.
First, teams spent a lot of time debating the weights of each metric. These weights were arbitrary, and the resulting number often lacked meaning. Second, munging all the metrics into a single index obscured the details required to understand the complexity of each function or system.
You couldn’t tell which specific part of a function or business was working or not by looking at the line on a chart. If one metric went up and another went down, the blended score didn’t tell you which issue to solve.
This generalization washed out the value of the hyper-specific metrics that were being aggregated to build it.
The good news is that the pivot to a more effective system was easy. The team simply removed the aggregated number and continued individually tracking all the metrics that went into it. This model ultimately evolved into Amazon’s input metrics approach.
The lesson learned was that compound metrics hide the truth while granular metrics expose it. A single number might seem sophisticated, but it cannot guide decisions in a complex business. Today, when advising teams, we suggest tracking hundreds of metrics for each function, business, and operation. These must be metrics that are specific, controllable, and, in many cases, directly measure the customer’s experience.
What’s more, this example highlights something rare in company culture: a willingness to recognize when something isn’t working and fix it without abandoning the valuable parts. Many companies either defend a bad system too long or throw out everything, including the parts that were working.
The fitness function failed, but the way Amazon responded to it helped pave the way for something effective that is in use to this day.
