Archive for April, 2018

Measurement quality checklist

April 21, 2018

We all make and use dozens of measurements every day, reading everything from clocks to speedometers to rulers to weight scales to thermometers. We also deal with numbers that are often called measures but are not expressed in a meaningful unit of comparison, like test scores, ratings, survey percentages, and counts of how many times something happened. These numbers don’t stand for a constant amount that adds up in the way hours, distances, weights, and temperatures do. These lower-quality ordinal numbers from tests and ratings are in wide use and are commonly called measurements, but they are also generally understood as lacking the rigor and precision associated with physical measurements.
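
To make the point about non-additive scores concrete, here is a minimal Python sketch. It assumes, purely for illustration, a test whose items are all equally difficult, so that a proportion-correct score can be converted directly to log-odds units (logits), the unit used in Rasch measurement. Under that assumption, two gains of ten percentage points correspond to very different amounts of the underlying ability.

```python
import math

def log_odds(proportion_correct: float) -> float:
    """Convert a proportion correct (strictly between 0 and 1) to logits."""
    return math.log(proportion_correct / (1.0 - proportion_correct))

# Two gains of the same size (10 percentage points) on the percent-correct scale
middle_gain = log_odds(0.60) - log_odds(0.50)
upper_gain = log_odds(0.95) - log_odds(0.85)

print(f"50% -> 60%: {middle_gain:.2f} logits")  # prints: 50% -> 60%: 0.41 logits
print(f"85% -> 95%: {upper_gain:.2f} logits")   # prints: 85% -> 95%: 1.21 logits
```

The same ten-point gain near the top of the scale represents roughly three times as much change as it does in the middle, which is why percent-correct differences cannot simply be added and compared the way hours or centimeters can.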

The failure to produce meaningful measurements of abilities, attitudes, knowledge, and behaviors, however, does not mean that the things we measure with tests, surveys, and rating scales cannot be scientifically quantified. On the contrary, meaningful, useful, reliable, and validated high-quality measurement has been available and in use on mass scales for decades. The public may remain unaware of these developments because of the technical mathematics and laborious analytic details involved. There are also a great many unexamined cultural presuppositions holding that human attributes cannot be measured meaningfully, and should not be measured even if they can be.

The problem here, as is so often the case when uninformed opinions hold sway, is that the predominance of meaningless numbers masquerading as measures leaves us much worse off than we would be if we invested the time and resources needed to create the Intangible Assets Metric System I’ve referred to elsewhere in this blog.

Many of the other posts here contrast meaningless and meaningful approaches to measurement, so I won’t repeat any of that here. What I’ll do instead is provide something that’s been suggested by many friends and colleagues over the years: a simple checklist of basic features that ought to be readily available in any measurement system worthy of the name. To find out more about any of these features, search the terms as they are listed.

  • An interval unit
  • Individual measures in that unit
  • Individual item locations in that unit
  • Rating scale transition thresholds in that unit
  • An uncertainty or error term for each individual measure
  • Data quality, internal consistency, and model fit statistics for each individual measure
  • Experimental evidence supporting the claim to an interval unit
  • A mathematical model of the interval unit
  • References to mathematical proofs that (a) the observed data are necessary and sufficient for estimating the model parameters, and (b) the model satisfies the requirements of conjoint additivity
  • Cronbach’s alpha, KR-20, a separation index, or a separation reliability coefficient expressing the ratio of explained variance to uncertainty/error variance, for both persons and items
  • A map of the construct measured, illustrating how the items are supposed to work together
  • Interpretive guidelines showing what measures mean as functions of item scale locations
  • A Wright map illustrating the conjoint relation of the measure and item distributions
  • A kidmap or other map of individual ordered responses useful for informing instruction or treatment
  • A theory explaining variation in item scale locations
  • Evidence of traceability to a unit standard, if available
  • Evidence that items are not biased for or against any identifiable groups
  • Information on the calibration sample and results (responses per item, etc.)
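
Several checklist items (individual measures, item locations, an error term for each measure, a mathematical model of the unit, and a separation reliability) can be seen working together in a small example. The Python sketch below is a minimal illustration only. It assumes the dichotomous Rasch model, items already calibrated at hypothetical locations, and a Newton-Raphson maximum-likelihood estimator for each person; it does not cover rating scale thresholds, fit statistics, or item calibration itself, which would need additional machinery.

```python
import math
from typing import List, Sequence, Tuple

def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability of a correct (or endorsed) response
    for a person at theta and an item at location b, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_person(raw_score: int, item_locations: Sequence[float],
                    tol: float = 1e-8, max_iter: int = 100) -> Tuple[float, float]:
    """Maximum-likelihood person measure and standard error, given item
    locations that are assumed to be already calibrated."""
    n = len(item_locations)
    if not 0 < raw_score < n:
        # Zero and perfect scores have no finite maximum-likelihood estimate.
        raise ValueError("finite ML estimates exist only for non-extreme scores")
    theta = math.log(raw_score / (n - raw_score))  # start from the score's log-odds
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in item_locations]
        expected = sum(probs)                            # model-expected raw score
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        # Newton-Raphson step, clipped to one logit to keep iterations stable.
        step = max(-1.0, min(1.0, (raw_score - expected) / information))
        theta += step
        if abs(step) < tol:
            break
    probs = [rasch_probability(theta, b) for b in item_locations]
    information = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(information)  # SE = 1 / sqrt(information)

def separation_statistics(measures: Sequence[float],
                          errors: Sequence[float]) -> Tuple[float, float]:
    """Separation index G (true SD / root-mean-square error) and separation
    reliability R (true variance / observed variance) for a group of measures."""
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    error_var = sum(se ** 2 for se in errors) / n
    true_var = max(0.0, observed_var - error_var)  # variance not due to error
    if observed_var == 0.0:
        return 0.0, 0.0
    return math.sqrt(true_var / error_var), true_var / observed_var

# Usage with hypothetical item locations for a five-item test
items = [-1.5, -0.5, 0.0, 0.5, 1.5]
results: List[Tuple[float, float]] = [estimate_person(r, items) for r in (1, 2, 3, 4)]
for r, (theta, se) in zip((1, 2, 3, 4), results):
    print(f"raw score {r}: measure {theta:+.2f} logits, SE {se:.2f}")
g, rel = separation_statistics([t for t, _ in results], [se for _, se in results])
print(f"separation G = {g:.2f}, reliability R = {rel:.2f}")
```

With only five items the standard errors are large relative to the spread of the measures, so the separation reliability comes out low; reporting that kind of information alongside each measure is what the checklist asks of a measurement system.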

For a commercially successful scientific reading ability measurement framework, see Fisher and Stenner (2016). For articles co-authored by metrology engineers and psychometricians, see Mari and Wilson (2014) and Pendrill and Fisher (2015). To find thousands of articles and books on related measurement topics dating back 50 years, search for “Rasch measurement” in Google Scholar. For consulting and advice on measurement, leave a comment below. For overviews of the role of measurement quality in education and health care, see Cano and Hobart (2011), Hobart et al. (2007), Massof (2008), Massof and Rubin (2001), Wilson (2013), and Wright (1984, 1999).


Cano, S. J., & Hobart, J. C. (2011). The problem with health measurement. Patient Preference and Adherence, 5, 279-290.

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.

Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007, December). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.

Mari, L., & Wilson, M. (2014). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Massof, R. W. (2008, July-August). Editorial: Moving toward scientific measurements of quality of life. Ophthalmic Epidemiology, 15, 209-211.

Massof, R. W., & Rubin, G. S. (2001, May-June). Visual function assessment questionnaires. Survey of Ophthalmology, 45(6), 531-548.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Wilson, M. R. (2013, April). Seeking a balance between the statistical and scientific elements in psychometrics. Psychometrika, 78(2), 211-236.

Wright, B. D. (1984). Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288.

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104). Hillsdale, New Jersey: Lawrence Erlbaum Associates.


Reductionist vs Nonreductionist Conceptualizations of Psychological and Social Measurement

April 19, 2018


Reductionist

  • Root metaphor: Mechanical clockwork universe
  • Paradigmatic case: Newtonian physics
  • Complete, consistent, deterministic structures
  • Whole is sum of parts
  • Sufficient statistics: Mean & standard deviation
  • Uncertainty is variation across repeated measures
  • Test/survey items in use define totality of universe of possibility; changes in items change that universe
  • Descriptive, reactionary
  • Microlevel facts are supposed to additively combine into general laws
  • General laws are discovered by measuring
  • Top down data analytics influence policy
  • Externally imposed assembly processes
  • Subject/object dualism institutionalized in data analytics process
  • Data are hallmark criterion of objectivity
  • Subjectivity discounted, removed if possible
  • Counts are quantities
  • Ordinal scores treated as interval measures with no justification
  • Score variation relates solely to person characteristics
  • Score meaning tied to particular questions asked
  • Quantitative methods don’t define unit quantities or test for them
  • Qualitative data and methods are separated from quantitative data/methods
  • No model of construct stated or tested
  • Group level multivariate focus
  • P-values are primary model fit criterion
  • Population sampling motivates probabilistic approach
  • Equating based on statistical assumptions concerning score distribution


Nonreductionist

  • Root metaphor: Living organic universe
  • Paradigmatic case: Multilevel ecosystems
  • Incomplete, not perfectly consistent, stochastic structures
  • Whole is greater than sum of parts
  • Sufficient statistics: scores
  • Uncertainty is resonance of stochastic invariance within individual measures
  • Test/survey items in use sample from infinite population; changes in items used do not change that universe
  • Prescriptive, anticipatory
  • Microlevel facts self-organize into meso abstractions & macro formalisms
  • Measuring presumes general laws
  • Bottom up alignments and coordinations of decisions and behaviors move society
  • Internal processes of self-organization
  • Mutually implied subject-object entangled together in playful flow institutionalized via distributed instrumentation
  • Objectivity requires data explained by theory embodied in instruments
  • Subjectivity included as valid source of concerns and insights scrutinized for value
  • Counts might lead to quantity definition
  • Interval measures theoretically and empirically substantiated
  • Empirical & theoretical measure variation maps construct via items and persons
  • Measure meaning is independent of particular questions asked
  • Quantitative methods define unit quantities and test for them
  • Qualitative methods are integrated with quantitative methods
  • Mathematical, observation, and construct models stated and tested
  • Individual level univariate focus
  • Meaningful construct definition primary model fit criterion
  • Individual response process motivates probabilistic approach
  • Equating requires alignment of items along common dimension
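
The claim that scores are sufficient statistics can be checked numerically. The minimal Python sketch below assumes the dichotomous Rasch model and four hypothetical item locations. For two different response patterns with the same raw score, the difference between their log-likelihoods is the same at every value of theta, so both patterns lead to the same person measure.

```python
import math
from typing import Sequence

# Hypothetical calibrated item locations (logits), easiest to hardest
ITEM_LOCATIONS = [-1.0, -0.3, 0.4, 1.2]

def log_likelihood(theta: float, responses: Sequence[int],
                   items: Sequence[float] = ITEM_LOCATIONS) -> float:
    """Log-likelihood of a 0/1 response pattern under the dichotomous Rasch model."""
    total = 0.0
    for x, b in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# Two different response patterns with the same raw score (2 of 4)
pattern_a = [1, 1, 0, 0]  # the two easiest items answered correctly
pattern_b = [0, 0, 1, 1]  # the two hardest items answered correctly

for theta in (-2.0, -0.5, 0.0, 1.0, 2.5):
    gap = log_likelihood(theta, pattern_a) - log_likelihood(theta, pattern_b)
    print(f"theta = {theta:+.1f}: log-likelihood gap = {gap:.4f}")  # 2.9000 every time
```

Because the gap does not depend on theta, the value of theta that maximizes the likelihood is the same for both patterns: the raw score carries all the information about the person measure. The positive gap also shows that pattern A is more probable than pattern B at every theta, which is the kind of difference person fit statistics are designed to detect.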