Author: Jerry Z. Muller
Publisher: Princeton University Press
Publishing Date: February 6, 2018
Pages: 240 pages (Hardcover)
“Any measure used for control is unreliable.” This is author Jerry Muller’s summary of Goodhart’s Law, and it captures the spirit of this excellent cautionary book. Muller’s book deals with what he calls “metric fixation,” an all-too-common tendency today in which the close association we make between measurement and improvement leads us to substitute metrics for judgment, and we end up gaming the very systems that we set up. For virtually anyone working in a management role, this book offers important warnings.
How to Misuse Metrics
Muller spends most of his book laying out the contours of his concerns with metric fixation and demonstrating the dangers of the improper use of metrics in a wide variety of different public and private sector settings. Muller identifies the primary features of this misuse of metrics: 1) the desire to replace personal judgment with numerical metrics; 2) a belief that making metrics public will assure that institutions are doing their job; and 3) a belief that rewarding or penalizing these institutions (or their employees) based on these metrics will motivate further improvement.
The first feature can cause problems because not everything worth considering can be counted, and metrics tend to have an inherently reductionist effect that limits the scope of decision-making to far fewer factors than individual judgment. The second and third features work together to create incentives to game metrics by doing things like lowering standards, selecting only cases that make the institution look good, or even outright cheating or distorting data.
Muller demonstrates the reality of these dangers with an abundance of case studies showing the good, the bad, and the ugly of using metrics. I will address the good examples at the end of this review; however, the bad to ugly cases are more fun, so I will mention a few of them here. On the bad end are examples like the rising number of students seeking college education even though SAT/ACT scores suggest that no more students are graduating from high school ready for college than in the past. Because colleges are ranked and monitored in part based on graduation rates, there is pressure to lower standards for graduation, which devalues college education and partly explains the growth of graduate program enrollment. Among the uglier, more tragic examples cited is a situation in the U.K. where a government pledge to treat all emergency patients within four hours led to ambulances being kept from delivering patients until the hospital was sure it could hit the target.
So what explains the rise of metric fixation? Muller lays the blame at the feet of movements that brought norms of the business sphere to the public sector. He points to the importation of criteria from market economics to judge performance in education in England of the late 1800s, the movement toward “scientific management” under engineer Frederick Winslow Taylor in America, and Secretary of Defense Robert McNamara’s move toward “managerialism” during the Vietnam War.
Although Muller is right that these movements often sought to bring a business sensibility to other institutions, he does strip his narrative of some context, making this sound like more of a malign or ill-conceived enterprise than it was. The reality is that the government was growing massively in complexity in the early 20th century, hugely increasing the scope of what was being managed. Further, nepotism, cronyism, and general corruption were at epidemic levels in many public institutions at the time. Part of the movement to favor metrics over judgment was driven by personal judgment being poorly used or simply overwhelmed. Muller at times makes it sound like it was solely hubris that drove this movement, but his insight in chapter four is probably more apt: metrics are something favored when social trust is low. The piece missing from some of his historical analysis is that sometimes low social trust in a particular institution is earned.
How to Use Metrics
Muller is not entirely opposed to metrics, however. He includes several examples of the good use of metrics, and he endeavors to provide some guidelines for using metrics well. The “checklist” at the end is a little long and convoluted for my taste, but the simplest sense of how to use metrics is borne out in the positive examples. Metrics work best when they are being used internally by a mix of professionals and managers in service of common goals, not as performance standards to mete out rewards and punishments.
Muller’s discussion of Compstat provides a perfect example of what this means and what consequences can come with crossing from one use to the other. Compstat is a comprehensive mapping of criminal activity in a city. Pioneered in New York in the 1990s, the program is credited with contributing to the significant drop in violent crime in that city. Used well, Compstat created a detailed set of data that police could use to pinpoint their response to areas that needed it most. Over time, however, rather than just guiding the deployment of resources, Compstat led many commanders to push the use of metrics to lower crime numbers in their jurisdictions. Under this pressure, many commanders resorted to pushing quotas on their subordinates, and officers responded by skewing charges (to keep violent crime rates lower) or mechanically conducting things like stop and frisks with little regard for the legal requirements for doing so. In this way, an NYPD that stood as a model of proactive, evidence-based policing slipped into a culture that mistook hitting frisk quotas for doing its job, and it paid a high price for that error.
So what do we take away from this book? It would be a mistake to have a total aversion to statistics, metrics, or assessment, although the book can feel like it is encouraging that at times. At the same time, we should heed the author’s warning that transparent metrics and scorecards are rarely going to be effective substitutes for institutional trust. Almost counterintuitively, it is the uses of metrics for internal purposes (often not made public) that seem most effective. When a team uses them in self-evaluation to better pursue less measurable ultimate goals (supplementing judgment born of real experience), they can be a powerful tool for pursuing excellence, hopefully inspiring the sort of public institutional trust that is so sorely lacking today. I highly recommend this book to anyone involved in the management of any public or private institution and to any students who are seeking training or entrance into a field like public administration. (Oh, and if there are any of those students out there, I can think of at least one good program to look into.)