The Measurement Twilight Zone

Measurement is everywhere. Our symbol for justice is a balance scale. We have technical standards for air, water, and food quality. Trade and commerce, from local to global markets, depend on quick and easy ways of knowing what and how much is for sale.

We all depend on measurement, but hardly anyone knows anything about how instruments are calibrated or how meaningful expressions of quantity are created and maintained.

So measurement exists in a kind of twilight zone between the clearest and most rigorous mathematics, on the one hand, and the darkest and most obscure ignorance, on the other. Take temperature, for instance. Virtually everyone over the age of five or so knows how to read a thermometer. But very few people can correctly describe the thermodynamic relationships that make a thermometer work.

We can rely on thermometer manufacturers to do the work of calibrating temperature measures for us. But what happens when we need to measure something for which there are no commercially available solutions?

As demand increases for measures of human and organizational performance, of social capital, and of environmental impact, more and more managers, executives, entrepreneurs, accountants, philanthropists, and researchers unknowingly enter the measurement twilight zone.

In the measurement twilight zone, things are not as they seem. Numbers add up the way they always do, but they no longer stand for constant amounts. We manage what we measure, and so we ask customers, employees, or patients to rate performances, we count right answers on tests, and we compute the percentage of time that some event happens.

But none of these numbers are measures. None of them add up. This is a very serious situation. It is not a rare, academic technicality of no practical consequence. Improving the quality of our measures is an urgent matter that ought to be the focus of a great deal more attention and interest than it currently is.

For instance, did you know that a 15% difference can sometimes stand for as much as, or even a lot more than, a 39% difference? Did you know that three markedly different percentage values (values that differ by more than a standard error, or even five) might actually stand for the same measured amount? Did you know that the difference between 1 percent and 2 percent can represent 4 to 8 times the difference between 49 percent and 50 percent?

Scores, ratings, and percentages are termed “ordinal” because, at best, they stand for a rank order of less and more. They do not stand for equal-interval amounts, though they can be a good start at creating real measures.
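One way to see why equal percentage gaps need not be equal amounts is to re-express percentages as log-odds. The sketch below is only an illustration under that assumption: it uses the plain log-odds transformation, not any particular calibrated instrument, and the function names are invented here. The specific ratios quoted above depend on the instrument and the data, so this sketch is not expected to reproduce them. It only shows that the same one-point gap covers far more distance near the extremes than near the middle.

```python
import math

def log_odds(p):
    """Convert a proportion (0 < p < 1) to log-odds, also called logits."""
    return math.log(p / (1.0 - p))

def logit_gap(p_low, p_high):
    """Distance between two proportions on the log-odds scale."""
    return log_odds(p_high) - log_odds(p_low)

# The same one-percentage-point gap, taken at two places on the scale.
near_floor = logit_gap(0.01, 0.02)  # 1 percent to 2 percent
mid_range = logit_gap(0.49, 0.50)   # 49 percent to 50 percent

print(f"1% to 2%:   {near_floor:.3f} logits")
print(f"49% to 50%: {mid_range:.3f} logits")
print(f"ratio:      {near_floor / mid_range:.1f}")
```

The percentages differ by one point in both cases, but on the log-odds scale the two gaps are very different in size.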

The general public doesn’t know much about all of this because the math is pretty intense, the software is hard to use, and we have an ingrained cultural prejudice that says all we have to do is come up with numbers of some kind and, voilà, we have measurement. Nothing could be further from the truth.

My goal in all of this is to figure out how to put tools that work in the hands of the people who need them. You don’t need a PhD in thermodynamics to read a thermometer, so we ought to be able to calibrate similar instruments for other things we want to measure. And the way transparency and accountability demands are converging with economics and technology, I think the time is ripe for new ideas properly presented.

In my 25 years of experience in measurement, I have found that people often do not understand what they think they understand. They are then amazed at what they learn when they take the time and care to craft an instrument that really measures what they’re after.

For instance, did you know that there are mathematical ways of reducing data volume that not only involve no loss of information but that actually increase the amount of actionable value? We are swimming in seas of data that do not usually mean what we think they mean, so being able to ensure things add up properly at the same time we reduce the volume of numbers we have to deal with is an eminently practical aid to understanding and manageability.
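One reading of this claim, offered here as an assumption rather than as the only mechanism intended, is the sufficiency property of the Rasch model (see the links at the end of the post): once item difficulties are known, a respondent's total score carries all the information the responses hold about that respondent's measure. The sketch below checks this numerically with made-up item difficulties. The probability of a particular response pattern, given its total score, comes out the same for a low-ability and a high-ability respondent.

```python
import math
from itertools import product

def pattern_probability(pattern, ability, difficulties):
    """Probability of a 0/1 response pattern under the dichotomous Rasch model."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p = 1.0 / (1.0 + math.exp(-(ability - b)))
        prob *= p if x == 1 else (1.0 - p)
    return prob

def conditional_on_score(pattern, ability, difficulties):
    """Probability of the pattern given its total score."""
    score = sum(pattern)
    total = sum(
        pattern_probability(other, ability, difficulties)
        for other in product((0, 1), repeat=len(difficulties))
        if sum(other) == score
    )
    return pattern_probability(pattern, ability, difficulties) / total

# Hypothetical item difficulties in logits, invented for illustration.
difficulties = [-1.0, 0.0, 1.0]
pattern = (1, 0, 0)  # only the easiest item answered correctly

low = conditional_on_score(pattern, -2.0, difficulties)
high = conditional_on_score(pattern, 3.0, difficulties)
print(low, high)  # the two values agree up to rounding error
```

Because the conditional probability does not depend on ability, the pattern adds nothing about the person beyond the score. That is the sense in which many responses can be reduced to one number without losing information, provided the data actually fit the model.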

Did you know that different sets of indicators or items can measure in a common metric? Or that a large bank of items can be adaptively administered, with the instrument individually tailored and customized for each respondent, organization, or situation, all without compromising the comparability of the measures?
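A minimal sketch of how this can work, assuming the dichotomous Rasch model described at the links at the end of the post: every item in the bank is calibrated in the same unit (logits), so a measure estimated from any subset of items is expressed in the same metric. The item names, difficulty values, and the simple "closest difficulty" selection rule below are hypothetical and chosen for illustration. Real adaptive systems also re-estimate the measure after each response and apply a stopping rule.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: chance of success, with both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_item(ability_estimate, item_bank, already_used):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = {
        name: b for name, b in item_bank.items() if name not in already_used
    }
    return min(candidates, key=lambda name: abs(candidates[name] - ability_estimate))

# Hypothetical calibrated item bank (difficulties in logits).
item_bank = {"item_a": -2.0, "item_b": -0.5, "item_c": 0.4, "item_d": 1.8}

# A respondent currently estimated at 0.5 logits who has already seen item_c.
chosen = next_item(0.5, item_bank, already_used={"item_c"})
print(chosen, round(rasch_probability(0.5, item_bank[chosen]), 3))
```

Two respondents may answer entirely different items, yet both estimates sit on the same logit scale because the item difficulties do.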

These are highly practical things to be able to do. Markets live and die on shared product definitions and shared metrics. Innovation almost never happens as a result of one person’s efforts; it is almost always a result of activities coordinated through a network structured by a common language of reference standards. We are very far from having the markets and levels of innovation we need in large part because the quality of measurement in so many business applications is so poor.

There’s a lot more where that came from, but I’ll stop here. You can learn more about these topics from many sources; I’ll list a few below.

http://www.rasch.org
http://www.rasch.org/rmt
http://en.wikipedia.org/wiki/Rasch_model
http://www.lexile.com
http://www.winsteps.com
http://www.livingcapitalmetrics.com

William P. Fisher, Jr., Ph.D.

We are what we measure.
It’s time we measured what we want to be.
