A colleague in the midst of writing a peer review for an educational research journal just wrote to ask how the automatic association of Rasch’s models for measurement with Item Response Theory (IRT) could still be so prevalent. In this particular case, it was necessary to inform the article’s authors that the Rasch rating scale and partial credit models are not IRT models. Everyone involved in developing those models refers to measurement theory; when IRT comes up at all, it is in a critical context.
The point applies to all of the multifaceted, multilevel, multidimensional, and polytomous models developed in relation to Rasch’s original dichotomous model. Rasch (1960, pp. 110-115) derives his model for reading comprehension via an analogy with Newton’s Second Law of Motion (Fisher, 2010b, 2021). Despite a wealth of explanations of the distinction between statistical models like those advanced in IRT and scientific models like Rasch’s, the quick and easy conflation continues to be accepted by many researchers, reviewers, and editors. So it seems that an update to the basic argument provided repeatedly in the past (Andrich, 1989a/b/c; Fisher, 2010a; Wright, 1977, 1984, 1997, 1999; among many others) is in order.
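The analogy can be stated in one line. In my notation (following common conventions rather than Rasch’s own symbols), just as Newton’s Second Law gives acceleration as the ratio of force to mass, Rasch’s model gives the odds of a correct response as the ratio of person ability \(\xi_v\) to item difficulty \(\delta_i\):

\[
a = \frac{F}{m}
\qquad\longleftrightarrow\qquad
\frac{P(x_{vi} = 1)}{P(x_{vi} = 0)} = \frac{\xi_v}{\delta_i}.
\]

Taking logarithms turns the ratio into the familiar additive logit form, \(\ln \xi_v - \ln \delta_i\); person and item parameters combine lawfully and separably, just as force, mass, and acceleration do in mechanics.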
In the article under review, the works cited (by Andrich, Bond, Masters, Wright, etc.) make no positive mention or constructive use of IRT. This is because the addition of the second and third item parameters renders the models unidentified: they change into incomparable forms across data sets and so cannot support generalized inferences (Fisher, 2021; San Martin & Rolin, 2013; San Martin et al., 2009, 2015). As Embretson (1996, p. 211) put it, “if item discrimination parameters are required to obtain fit, total score is not even monotonically related to the IRT theta parameters.” The illogic of this situation is rarely acknowledged in IRT applications: almost no one explains why people answering the same questions and obtaining the same scores are assigned different measures, why the item difficulty order shifts depending on who is responding, or how the item hierarchy relevant to each person is to be tracked. Even when multiple IRT item parameters are estimated, end-use applications either remain silent about these interpretive issues or fall back on the unidimensional Rasch scale.
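Embretson’s point can be made concrete with a small numerical illustration. The following is a minimal sketch of my own, not drawn from any of the works cited, with arbitrary item parameters: under a two-parameter logistic (2PL) model, two respondents with the same raw score but different response patterns receive different maximum-likelihood ability estimates, while under the Rasch model the raw score alone determines the estimate.

```python
# Minimal sketch: same raw score, different 2PL ability estimates.
# Item parameters are arbitrary, chosen only for illustration.
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(theta, responses, discriminations, difficulties):
    """Log-likelihood of a dichotomous response pattern under a 2PL model.
    Setting all discriminations to 1 recovers the Rasch model."""
    ll = 0.0
    for x, a, b in zip(responses, discriminations, difficulties):
        p = logistic(a * (theta - b))
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def ml_theta(responses, discriminations, difficulties):
    """Crude maximum-likelihood ability estimate by grid search."""
    grid = [g / 100.0 for g in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, discriminations, difficulties))

difficulties = [-1.0, 0.0, 1.0]
a_2pl = [0.5, 1.0, 2.0]      # unequal discriminations (2PL)
a_rasch = [1.0, 1.0, 1.0]    # equal discriminations (Rasch)

easy_right = [1, 0, 0]       # raw score 1: only the easiest item correct
hard_right = [0, 0, 1]       # raw score 1: only the hardest item correct

print("2PL:  ", ml_theta(easy_right, a_2pl, difficulties),
      ml_theta(hard_right, a_2pl, difficulties))      # estimates differ
print("Rasch:", ml_theta(easy_right, a_rasch, difficulties),
      ml_theta(hard_right, a_rasch, difficulties))    # estimates coincide
```

The two score-1 patterns yield different 2PL measures because the sufficient statistic under that model is the discrimination-weighted score, which differs across the patterns; only when the discriminations are equal does the total score order respondents consistently.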
Accordingly, Lumsden (1978, p. 22) recommended that “The two- and three-parameter logistic and normal ogive scaling models should be abandoned since, if the unidimensionality requirement is met, the Rasch (1960) one-parameter model will be realized.” Wood (1978, p. 31) similarly said “two- and three-parameter models are not the answer – test scaling models are self-contradictory if they assert both unidimensionality and different slopes for the item characteristic curves.” Wright (1977, p. 220; also see Wright, 1984, 1997, 1999) explained that:
“When scientists measure they intend their measurements to be objective in the sense of being generalizable beyond the moment of measurement. This means that, whatever parameters are thought to characterize the measuring instruments, they must remain relatively stable through the range of intended application and must not interact substantially with the objects being measured. It also means that the parameters intended to describe the process of measurement can be estimated successfully.”
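To see concretely why Wood calls the combination self-contradictory, consider (in standard 2PL notation, not drawn from the article under review) any two item characteristic curves with unequal discriminations \(a_1 \neq a_2\); they must cross:

\[
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},
\qquad
P_1(\theta^*) = P_2(\theta^*)
\;\Longrightarrow\;
\theta^* = \frac{a_1 b_1 - a_2 b_2}{a_1 - a_2}.
\]

Below \(\theta^*\) one item is the harder of the two; above it, the other is. No single difficulty ordering holds across the continuum, which is precisely the kind of parameter instability Wright describes.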
IRT model parameters do not remain stable; the second and third item parameters exist precisely to describe substantive interactions between items and persons, even though no one deliberately writes items or chooses samples with the intention of fulfilling theoretical expectations that such interactions will occur. That is, no one writes items intending their difficulty order to change depending on who responds to them. For the second and third item parameters to make sense, though, that variation is exactly what would have to be intended. Such situations do exist, and models based in sufficient statistics have been formulated to rescale logits and to account for systematic relationships among the unit, discrimination, and guessing (Andrich et al., 2016; Humphry, 2011). But as Andrich (1989a/b/c) emphasizes, IRT model formulations generally assume that the point is to describe data, however uninterpretable they may be, instead of intentionally designing instruments to produce data satisfying basic principles of inference.
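The formal crux is sufficiency. In the Rasch model the person parameter can be conditioned entirely out of the item comparisons; a standard derivation (notation mine) for a dichotomous response vector \(\mathbf{x}\) with total score \(r = \sum_i x_i\) runs:

\[
P(\mathbf{x} \mid \beta, \boldsymbol{\delta})
= \prod_i \frac{e^{x_i(\beta - \delta_i)}}{1 + e^{\beta - \delta_i}}
= \frac{e^{r\beta}\, e^{-\sum_i x_i \delta_i}}{\prod_i \bigl(1 + e^{\beta - \delta_i}\bigr)},
\qquad
P(\mathbf{x} \mid r, \boldsymbol{\delta})
= \frac{e^{-\sum_i x_i \delta_i}}{\sum_{\mathbf{y} : \sum_i y_i = r} e^{-\sum_i y_i \delta_i}},
\]

where the conditional probability no longer involves \(\beta\). Adding discriminations \(a_i\) replaces \(r\) with the weighted score \(\sum_i a_i x_i\), a statistic that depends on unknown item parameters, so no comparable conditioning is available and the separation of persons and items is lost.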
Fisher (2021) gives the history documenting Rasch’s and Thurstone’s involvement in the development of the concept of model identification. This position has, moreover, been substantiated in recent years in a number of articles and books connecting Rasch models with measurement science and metrology (Mari & Wilson, 2014; Mari et al., 2021; Pendrill, 2014, 2019; Pendrill & Fisher, 2015; Fisher & Cano, 2023; etc.). Luca Mari, an electrical engineer involved in the Bureau International des Poids et Mesures (BIPM) SI unit metrological standards deliberations, states in an article co-authored with Mark Wilson (2014) that “Rasch models belong to the same class that metrologists consider paradigmatic of measurement.” A past chair of the European Association of National Metrology Institutes, Leslie Pendrill (2014, p. 26), similarly says: “The Rasch approach…is not simply a mathematical or statistical approach, but instead [is] a specifically metrological approach to human-based measurement.”
Of course, as long as the systems of incentives and rewards go on supporting illogical reasoning and emotional, political, and economic attachments to counterproductive methods, arguments like those presented here are not likely to have much impact. It is important to go on the record with reasoned positions, however, as it does sometimes happen that small numbers of readers are persuaded to test and perhaps change their perspectives.
Education and persuasion have a limited place, though, in the overall strategy being pursued in these efforts to advance the science of measurement. The common languages supported by metrologically traceable and quality-assured measurement systems historically have proven themselves vastly more powerful and efficacious than the chaotic confusion of incomparable metrics. And far from reducing rich complexity to manageable uniformity, metrologically sound measurement science is quite akin to tuning the instruments of the human and social sciences, with all the implications that follow from that metaphor concerning support and opportunities for capitalizing on unique local creative improvisations.
The primary obstacle to creating such systems in education, health care, human resource management, social services, environmental sustainability, and other fields is how to organically cultivate new relationships of trust. Work in this vein has been underway for decades and continues to gain momentum (Fisher, 2023a/b).
References
Andrich, D. (1989a). Constructing fundamental measurements in social psychology. In J. A. Keats, R. Taft, R. A. Heath & S. H. Lovibond (Eds.), Mathematical and theoretical systems: Proceedings of the 24th International Congress of Psychology of the International Union of Psychological Science, Vol. 4 (pp. 17-26). North-Holland.
Andrich, D. (1989b). Distinctions between assumptions and requirements in measurement in the social sciences. In J. A. Keats, R. Taft, R. A. Heath & S. H. Lovibond (Eds.), Mathematical and theoretical systems: Proceedings of the 24th International Congress of Psychology of the International Union of Psychological Science, Vol. 4 (pp. 7-16). Elsevier Science Publishers.
Andrich, D. (1989c). Statistical reasoning in psychometric models and educational measurement. Journal of Educational Measurement, 26(1), 81-90.
Andrich, D., Marais, I., & Humphry, S. M. (2016). Controlling guessing bias in the dichotomous Rasch model applied to a large scale, vertically scaled testing program. Educational and Psychological Measurement, 76(3), 412-435.
Embretson, S. E. (1996, September). Item Response Theory models and spurious interaction effects in factorial ANOVA designs. Applied Psychological Measurement, 20(3), 201-212.
Fisher, W. P., Jr. (2010a). IRT and confusion about Rasch measurement. Rasch Measurement Transactions, 24(2), 1288 [http://www.rasch.org/rmt/rmt242.pdf].
Fisher, W. P., Jr. (2010b). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.
Fisher, W. P., Jr. (2021). Separation theorems in econometrics and psychometrics: Rasch, Frisch, two Fishers, and implications for measurement. Journal of Interdisciplinary Economics, OnlineFirst, 1-32. https://journals.sagepub.com/doi/10.1177/02601079211033475
Fisher, W. P., Jr. (2023a). Foreword: Koans, semiotics, and metrology in Stenner’s approach to measurement-informed science and commerce. In W. P. Fisher, Jr. & P. J. Massengill (Eds.), Explanatory models, unit standards, and personalized learning in educational measurement: Selected papers by A. Jackson Stenner (pp. ix-lxx). Springer.
Fisher, W. P., Jr. (2023b). Measurement systems, brilliant results, and brilliant processes in healthcare: Untapped potentials of person-centered outcome metrology for cultivating trust. In W. P. Fisher, Jr. & S. Cano (Eds.), Person-centered outcome metrology: Principles and applications for high stakes decision making. Springer.
Fisher, W. P., Jr., & Cano, S. (Eds.). (2023). Person-centered outcome metrology: Principles and applications for high stakes decision making. Springer Series in Measurement Science and Technology. Springer. https://link.springer.com/book/9783031074646
Humphry, S. M. (2011). The role of the unit in physics and psychometrics. Measurement: Interdisciplinary Research and Perspectives, 9(1), 1-24.
Lumsden, J. (1978). Tests are perfectly reliable. British Journal of Mathematical and Statistical Psychology, 31, 19-26.
Mari, L., & Wilson, M. (2014, May). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327. http://www.sciencedirect.com/science/article/pii/S0263224114000645
Mari, L., Wilson, M., & Maul, A. (2021). Measurement across the sciences: Developing a shared concept system for measurement. Springer Series in Measurement Science and Technology. Springer.
Pendrill, L. R. (2014, December). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33. http://www.tandfonline.com/doi/abs/10.1080/19315775.2014.11721702
Pendrill, L. R. (2019). Quality assured measurement: Unification across social and physical sciences. Springer.
Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55. doi: http://dx.doi.org/10.1016/j.measurement.2015.04.010
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Danmarks Paedagogiske Institut.
San Martin, E., Gonzalez, J., & Tuerlinckx, F. (2009). Identified parameters, parameters of interest, and their relationships. Measurement: Interdisciplinary Research and Perspectives, 7(2), 97-105.
San Martin, E., Gonzalez, J., & Tuerlinckx, F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80(2), 450-467.
San Martin, E., & Rolin, J. M. (2013). Identification of parametric Rasch-type models. Journal of Statistical Planning and Inference, 143(1), 116-130.
Wood, R. (1978). Fitting the Rasch model: A heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27-32.
Wright, B. D. (1977). Misunderstanding the Rasch model. Journal of Educational Measurement, 14(3), 219-225.
Wright, B. D. (1984). Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288 [http://www.rasch.org/memo41.htm].
Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm]. https://doi.org/10.1111/j.1745-3992.1997.tb00606.x
Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Lawrence Erlbaum Associates.