We all make and use dozens of measurements every day, reading everything from clocks to speedometers to rulers to weight scales to thermometers. We also deal with numbers that are often called measures but are not expressed in a meaningful unit of comparison: test scores, ratings, survey percentages, counts of how many times something happened, and so on. These numbers do not stand for a constant amount that adds up the way hours, distances, weights, and temperatures do. Such lower-quality ordinal numbers from tests and ratings are in wide use and are commonly called measurements, but they are also generally understood not to attain the rigor and precision associated with physical measurement.
This widespread failure to produce meaningful measurements of abilities, attitudes, knowledge, and behaviors does not, however, mean that the things we assess with tests, surveys, and rating scales cannot be scientifically quantified. On the contrary, meaningful, useful, reliable, and validated high-quality measurement has been available, and in use at mass scales, for decades. Perhaps the public remains unaware of these developments because of the technical mathematics and laborious analytic detail involved. There are also a great many unexamined cultural presuppositions holding that human attributes cannot be measured meaningfully, and should not be measured even if they can be.
The problem here, as is so often the case when uninformed opinions hold sway, is that the predominance of meaningless numbers masquerading as measures leaves us far worse off than we would be if we invested the time and resources needed to create the Intangible Assets Metric System I have referred to elsewhere in this blog.
Many of the other posts here contrast meaningless and meaningful approaches to measurement, so I won’t repeat any of that here. What I’ll do instead is provide something that’s been suggested by many friends and colleagues over the years: a simple checklist of basic features that ought to be readily available in any measurement system worthy of the name. To find out more about any of these features, search the terms as they are listed.
An interval unit
Individual measures in that unit
Individual item locations in that unit
Rating scale transition thresholds in that unit
An uncertainty or error term for each individual measure
Data quality, internal consistency, and model fit statistics for each individual measure
Experimental evidence supporting the claim to an interval unit
A mathematical model of the interval unit
References to mathematical proofs that (a) the observed data are necessary and sufficient to the estimation of the model parameters, and (b) the model satisfies the requirements of conjoint additivity
Cronbach’s alpha, a KR-20, a separation index, or a separation reliability coefficient expressing the ratio of explained variance to uncertainty/error variance, for both persons and items
A map of the construct measured illustrating how the items are supposed to work together
Interpretive guidelines showing what measures mean as functions of item scale locations
A Wright map illustrating the conjoint relation of the measure and item distributions
A kidmap or other map of individual ordered responses useful for informing instruction or treatment
A theory explaining variation in item scale locations
Evidence of traceability to a unit standard, if available
Evidence that items are not biased for or against any identifiable groups
Information on the calibration sample and results (responses per item, etc)
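As a concrete illustration of the mathematical model and the conjoint additivity mentioned in the checklist: in Rasch measurement, the simplest (dichotomous) case models the probability of a correct or affirmed response as a function of the difference between the person measure and the item location, both expressed in the same interval (logit) unit:

```latex
\[
  \Pr\{X_{ni} = 1\} \;=\; \frac{e^{\,\beta_n - \delta_i}}{1 + e^{\,\beta_n - \delta_i}}
  \qquad\Longleftrightarrow\qquad
  \ln\!\frac{P_{ni}}{1 - P_{ni}} \;=\; \beta_n - \delta_i ,
\]
```

where \(\beta_n\) is the measure of person \(n\) and \(\delta_i\) is the location of item \(i\). Because the log-odds depend only on the additive difference \(\beta_n - \delta_i\), the structure satisfies the conjoint additivity requirement, and the raw score can be shown to be a sufficient statistic for estimating the parameters.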
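The separation reliability in the checklist, the ratio of explained variance to total observed variance, can be sketched in a few lines of Python. The person measures and standard errors below are invented for illustration only, not taken from any real analysis:

```python
# Hypothetical person measures (in logits) with their standard errors,
# as a Rasch analysis would typically report them. Illustrative values only.
measures = [-1.2, -0.4, 0.1, 0.7, 1.5, 2.0]
errors = [0.35, 0.30, 0.28, 0.29, 0.31, 0.38]

n = len(measures)
mean = sum(measures) / n

# Observed variance of the measures, and the average error variance.
observed_var = sum((m - mean) ** 2 for m in measures) / n
error_var = sum(e ** 2 for e in errors) / n

# "True" (error-adjusted) variance: observed variance minus error variance.
true_var = observed_var - error_var

# Separation reliability: the proportion of observed variance not
# attributable to measurement error (analogous to Cronbach's alpha).
reliability = true_var / observed_var

# Separation index: a signal-to-noise ratio on the same quantities.
separation = (true_var / error_var) ** 0.5

print(f"reliability = {reliability:.3f}, separation = {separation:.2f}")
```

The same calculation applies to item locations and their standard errors, which is why the checklist asks for these coefficients for both persons and items.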
For a commercially successful scientific reading ability measurement framework, see http://www.lexile.com and Fisher & Stenner, 2016. For articles co-authored by metrology engineers and psychometricians, see Mari & Wilson, 2014, and Pendrill & Fisher, 2015. To see thousands of articles and books on related measurement topics dating back 50 years, search Rasch Measurement in Google Scholar. For consulting and advice on measurement, fill in a comment below, see http://www.livingcapitalmetrics.com, or explore the wealth of resources at http://www.rasch.org. For overviews of the role of measurement quality in education and health care, see Cano and Hobart (2011), Hobart, et al. (2007), Massof (2008), Massof and Rubin (2001), Wilson (2013), and Wright (1984, 1999).
References
Cano, S. J., & Hobart, J. C. (2011). The problem with health measurement. Patient Preference and Adherence, 5, 279-290.
Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.
Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007, December). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.
Mari, L., & Wilson, M. (2014). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.
Massof, R. W. (2008, July-August). Editorial: Moving toward scientific measurements of quality of life. Ophthalmic Epidemiology, 15, 209-211.
Massof, R. W., & Rubin, G. S. (2001, May-Jun). Visual function assessment questionnaires. Survey of Ophthalmology, 45(6), 531-48.
Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.
Wilson, M. R. (2013, April). Seeking a balance between the statistical and scientific elements in psychometrics. Psychometrika, 78(2), 211-236.
Wright, B. D. (1984). Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288 [http://www.rasch.org/memo41.htm].
Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.