Curious Productivity

The problem with metrics: Goodhart’s Law and setting better goals

Metrics are critical for success because they enable us to measure progress toward our goals. But what happens when we confuse metrics and goals? Goodhart’s Law provides an answer. It says “when a measure becomes a target, it ceases to be a good measure.”

Goodhart’s Law in action

Metrics are powerful because they create strong behavioral incentives. But if we’re not careful, they can undermine our efforts and produce unintended outcomes.

The classic and probably apocryphal example of Goodhart’s Law involves nail factories in the Soviet Union.

The Soviet government set production quotas based on the number of nails produced. The workers responded by producing hundreds of thousands of tiny nails. Moscow then changed the quotas so that they were based on the weight of nails produced. So the workers made a few massive nails. In both cases, the nails were useless.

For a more contemporary example, look no further than the American public education system. The goal is for students to learn, and most school systems measure learning using test scores. As a result, many teachers spend more time instructing students on how to take multiple-choice tests (e.g., solving a math problem by plugging in each of the multiple-choice answers) than they do teaching the underlying concepts being assessed.


Critical thinking, curiosity, and the arts are hard to measure and therefore have less practical value in the American education system. They are often the first things sacrificed at schools keen to raise test scores (I’m not suggesting that teachers want to do this; they’re responding to the rules that are set for them).

Here are more examples of Goodhart’s Law:

Software Company

A company sets a goal to release software with as few bugs as possible. Engineers are incentivized to work on easy problems, which are less likely to introduce bugs. Over time, complex problems go unsolved, and the unintended consequence is more bugs in the long run.

Promotions

A company wants to promote the workers with the highest performance ratings. Workers figure out how to game performance ratings and prioritize the activities that will maximize their ratings, so the ratings stop reflecting actual performance.

Graduation rates

A school district wants to increase high school graduation rates. Administrators and teachers focus more on boosting the graduation rate than on teaching the skills and knowledge students need to be successful after graduation. Graduation rates no longer measure educational attainment because the metric has become the goal.

Why does this happen?

“Tell me how you will measure me, and then I will tell you how I will behave”

Eli Goldratt

Goodhart’s Law is connected to the upsides and the risks of using metrics. Let’s talk about the upsides first.

Over time, humans have become aware that we’re easily fooled. We’ve learned we can’t always trust what we see, what we feel, and even each other. We’ve learned that complexity slows us down. And we’ve learned how difficult it is to coordinate the actions of large groups of people.

Measurement seems like the silver bullet that solves all of these problems.


As David Manheim writes:

“Measurement replaces intuition, which is often fallible. It replaces trust, which is often misplaced. It finesses complexity, which is frequently irreducible. So faulty intuition, untrusted partners, and complex systems can be understood via intuitive, trustworthy, simple metrics. If this seems reductive, it’s worth noting how successful the strategy has been, historically. Wherever and whenever metrics proliferated, overall, the world seems to have improved.”

But it also has significant downsides.

Why is measurement hard?

Measuring is not natural. Animals don’t do it; children must be taught to do it. Yet measurement is so pervasive, and we have become so accustomed to it, that we take it for granted. And it comes with its own problems:

  1. We over-measure
  2. We’re too quick to trust data
  3. It can obscure complexity

We over-measure

We humans are fallible, but this doesn’t mean that we should overcorrect by ignoring our intuition. Intuition is the product of millions of years of evolution. It’s an invaluable source of information and is associated with our creativity and innovation. To dismiss it out of hand is a mistake; we can be both rational and intuitive.

We should remember:

  • Some decisions are best made intuitively
  • We don’t always need to measure
  • Just because something is hard to measure doesn’t mean you shouldn’t do it, or that it can’t be effective.

To this last point, I think that many companies today overvalue what they can measure and undervalue what they can’t.

Consider the case of employee engagement, which Gallup defines as the “involvement and enthusiasm of employees in their work and workplace.” In 2022, Gallup found that 32% of employees were engaged, while 17% were actively disengaged. I think that one of the reasons employee engagement is broadly lacking is that it’s difficult to measure, and therefore it’s difficult to convince decision-makers to dedicate resources to improving it.

I don’t mean to oversimplify. The decision of when to measure vs. when not to is complicated and requires assessing the value of information. The bottom line is that metrics should not be used dogmatically.

We trust but don’t verify

Using metrics requires trusting (1) the data that the metrics are based on, and (2) the methods used for collecting and processing the data.


As any researcher, especially in the social sciences, will tell you, collecting and applying data is very difficult, and the results are often much less precise than we imagine.

There are three pitfalls here:

  1. We use metrics without verifying them. It’s the equivalent of building a house on a weak foundation.
  2. There are often incentives to lie or to fudge the numbers that the metrics are based on, sometimes in ways that are impossible to detect.
  3. We tend toward easy-to-verify metrics, which may or may not be useful.

The education examples above illustrate this: graduation rates and test scores are easy to measure and verify, but the extent to which they reflect actual learning, which is far harder to measure and verify, is questionable.

Metrics obscure complexity

One of the great things about metrics is that they seem to reduce complex systems to single, digestible numbers. The problem is that a metric can never substitute for a deep and holistic understanding of a system.

Imagine that you’re the leader of a large sales organization that is underperforming. You need to understand what is going on. However, getting into the weeds and truly understanding a system takes a lot of time and energy. It’s much easier and faster for you to consult the dashboard that the data analytics team made for you and draw your primary conclusions from that.

But metrics don’t actually reduce complex systems; they just summarize them. In doing so, they can hide problems that, if they go unnoticed, create more problems.

Avoiding Goodhart’s Law

What do we do to avoid Goodhart’s Law when using metrics? To an extent, we are damned if we do, damned if we don’t. But it is possible to moderate their downsides.

Fully operationalize goals

Operationalizing a goal means defining it so that it is “distinguishable, measurable, and understandable.” Simply put, your goal must be broken down into pieces that individuals can contribute to.

Failing to sufficiently define a goal leads to what David Manheim calls “the soft bias of underspecified goals.” In the absence of clear goals, people will simply optimize for the available metrics, which significantly increases Goodhart’s Law risks. Manheim writes:

If you don’t have clear goal, the target you’re aiming for is implicitly underspecified, and could lead to many outcomes. That means your metrics don’t particularly align agents, and you’re very susceptible to Goodhart’s law pulling you towards a system that aligns more with the agents interest than the interest of the regulator… the key point I want to make is that without specific and well-specified goals, metrics take over.

I previously wrote about better defining goals here.

Have a strong sense of mission

But hold on: avoiding the soft bias of underspecified goals doesn’t mean we should overcorrect by relying solely on well-defined metrics. This can create, again in David Manheim’s words, “overpowered metrics” that distort behavior and warp the system.

The goal is balance, which is achieved through an alignment of mission, goals, and strategy. James Q. Wilson writes, “the great advantage of mission is that… operators will act… in ways that the head would have acted had he or she been in their shoes.” In other words, when people in your organization are invested in your mission, they will feel a sense of ownership and act to advance it.

Mission can compensate for metrics. I think this explains how organizations with unclear metrics and chaotic environments can succeed. Ideally, though, the mission complements metrics and helps to counterbalance their potential downsides.

Never use a single number

It’s tempting to create only one metric. Sweet simplicity! But remember: people will optimize their behavior to maximize the measurements that you choose. Using one metric increases the chances of unintended consequences. You might achieve your key metric, but if success comes at the expense of other important objectives, then it may not be worth it.

We’re better off using sets of metrics to avoid Goodhart’s Law. A good way to create sets of metrics is to use “counter-metrics.” This concept comes from Julie Zhou, a former Product Design VP at Facebook:

For each success metric, come up with a good counter metric that would convince you that you’re not simply plugging one hole with another. (For example, a common counter metric for measuring an increase in production is also measuring the quality of each thing produced.)
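To make the counter-metric idea concrete, here is a minimal Python sketch of how a team might check a change against a metric pair. Everything in it is a hypothetical illustration: the metric names, the 5% tolerance, and the numbers are made up (loosely echoing the nail factory), and it assumes the counter-metric is one where higher is worse, such as a defect rate.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MetricPair:
    """A success metric paired with a counter-metric that guards against gaming it."""
    success: str                 # hypothetical metric we want to increase
    counter: str                 # hypothetical metric where higher is worse (e.g. defect rate)
    max_counter_increase: float  # tolerated relative worsening of the counter-metric


def relative_change(before: float, after: float) -> float:
    """Relative change from before to after (0.10 means +10%)."""
    return (after - before) / before


def evaluate(pair: MetricPair, before: Dict[str, float], after: Dict[str, float]) -> str:
    """Classify a change by looking at the success metric and its counter-metric together."""
    success_delta = relative_change(before[pair.success], after[pair.success])
    counter_delta = relative_change(before[pair.counter], after[pair.counter])
    if success_delta <= 0:
        return "no improvement"
    if counter_delta > pair.max_counter_increase:
        # The success metric rose, but the counter-metric got worse:
        # we may have plugged one hole by opening another.
        return "suspect: counter-metric worsened"
    return "genuine improvement"


pair = MetricPair(success="nails_produced", counter="defect_rate", max_counter_increase=0.05)
before = {"nails_produced": 10_000, "defect_rate": 0.02}
print(evaluate(pair, before, {"nails_produced": 12_000, "defect_rate": 0.06}))
# prints "suspect: counter-metric worsened"
print(evaluate(pair, before, {"nails_produced": 12_000, "defect_rate": 0.02}))
# prints "genuine improvement"
```

Looking at either number alone, the first change would count as a 20% gain in output. Only the pairing reveals that quality collapsed along the way.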

Read more

  • Scott Garrabrant describes a taxonomy of 4 distinct “flavors” of Goodhart’s Law and strategies for mitigating each type. It’s more technical, but great if you want to get into the weeds
  • David Manheim’s pieces on Goodhart’s Law here and here are incisive and thought-provoking for anyone interested in going deeper on this topic
