Guttman on sufficiency, statistics, and cumulative science

“R. A. Fisher employed maximum likelihood…as a way of finding sufficient statistics if they exist. Now, sufficient statistics rarely exist, and even when they do, their use need not be optimal for estimation problems. As enlarged on in Reference 7 [an unpublished 1984 paper of Guttman’s], for each best unbiased sufficient statistic there generally is a better–and not necessarily sufficient–biased one. To use maximum likelihood requires knowledge of the complete sampling distribution, but biased estimation is proved to be better in a distribution-free fashion.” (Guttman, 1985, pp. 7-8; 1994, pp. 345-6)

In this passage, Guttman may be addressing the kind of bias that can affect extreme scores in Joint Maximum Likelihood Estimation (JMLE, formerly UCON) (Jansen, Van den Wollenberg, & Wierda, 1988; Wright, 1988; Wright & Panchapakesan, 1969). More interesting, though, is the way the remark combines an awareness of sufficiency and estimation issues with the context in which it is made.
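The claim in the quoted remark, that a biased estimator can beat the best unbiased one, has a familiar textbook instance in variance estimation. The sketch below is an illustration only and is not taken from Guttman's paper: it assumes i.i.d. normal observations (so it is not the distribution-free result Guttman refers to), and the function name mse_over_sigma4 is made up for this example. It computes the exact mean squared error of the sum of squared deviations divided by n - 1 (unbiased), n, and n + 1 (both biased).

```python
from fractions import Fraction


def mse_over_sigma4(n: int, c: int) -> Fraction:
    """Exact MSE of SS/c, in units of sigma^4.

    SS is the sum of squared deviations from the sample mean of n
    i.i.d. normal observations (ASSUMPTION: normality), so that
    SS / sigma^2 follows a chi-square distribution with n - 1 df.
    """
    df = n - 1
    variance = Fraction(2 * df, c * c)  # Var(SS/c) / sigma^4
    bias = Fraction(df, c) - 1          # (E[SS/c] - sigma^2) / sigma^2
    return variance + bias ** 2         # MSE = variance + bias^2


n = 10  # illustrative sample size
results = {c: mse_over_sigma4(n, c) for c in (n - 1, n, n + 1)}
for c, mse in results.items():
    print(f"divide by {c}: MSE = {mse} * sigma^4 (about {float(mse):.4f})")
```

Under these assumptions the unbiased divisor n - 1 gives the largest mean squared error of the three and the biased divisor n + 1 the smallest, which is the kind of trade-off the passage describes.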

Guttman targets and rightly skewers a good number of inconsistencies and internal contradictions in statistical inference. But he shares in many of them himself. That is, Guttman’s valuable insights into measurement are limited by his failure to consider at all the importance of the instrument in science, and by his limited appreciation of the value of theory. This is so despite his realization that “There can be no solution [to the problem of sampling items from one or more indefinitely large universes of content] without a structural theory” (1994, p. 329, in his Psychometrika review of Gulliksen’s Theory of Mental Tests), a realization fully in tune with his emphasis on the central role of substantive replication in science (1994, p. 343).

But in the way he articulates his concern with replication, we see that, for Guttman, as for so many others, measurement is a matter of data analysis and not one of calibrating instruments. Yet measurement is not primarily a statistical process performed on computers; it is an individual event performed with an instrument. Calibrated instruments remove the necessity for data analysis at the point of use (though other kinds of analysis may, of course, be continued or commenced).
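One way to picture what a calibrated instrument does is a score-to-measure table. The sketch below assumes a dichotomous Rasch model with item difficulties already fixed by a prior calibration; the difficulty values and the names expected_score, measure_from_score, and ITEM_DIFFICULTIES are hypothetical, chosen only for illustration. Because the raw score is sufficient for the person parameter in this model, each non-extreme score maps to a single measure, so the table can be computed once and then read off without any further data analysis.

```python
import math


def expected_score(theta: float, difficulties: list[float]) -> float:
    """Expected raw score at measure theta under the dichotomous Rasch model."""
    return sum(1.0 / (1.0 + math.exp(d - theta)) for d in difficulties)


def measure_from_score(raw_score: int, difficulties: list[float],
                       tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood measure (logits) for a raw score, items held fixed."""
    n_items = len(difficulties)
    if not 0 < raw_score < n_items:
        # Zero and perfect scores have no finite maximum-likelihood estimate.
        raise ValueError("extreme scores have no finite estimate")
    # Start from the log-odds of the score, centered on the mean difficulty.
    theta = math.log(raw_score / (n_items - raw_score)) + sum(difficulties) / n_items
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(d - theta)) for d in difficulties]
        residual = raw_score - sum(probs)             # observed minus expected
        information = sum(p * (1.0 - p) for p in probs)
        step = max(-1.0, min(1.0, residual / information))  # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    return theta


# HYPOTHETICAL calibrations in logits, not taken from any real instrument.
ITEM_DIFFICULTIES = [-2.0, -1.0, 0.0, 1.0, 2.0]

score_table = {r: measure_from_score(r, ITEM_DIFFICULTIES)
               for r in range(1, len(ITEM_DIFFICULTIES))}
for score, measure in score_table.items():
    print(f"raw score {score}: {measure:+.2f} logits")
```

A table like this could be printed on the form itself, in the spirit of the self-scoring keys cited in this post. The extreme-score check is a reminder of why zero and perfect scores need separate handling in any estimation method.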

In reading Guttman, it is difficult to follow through on his pithy and rich observations on the inconsistencies and illogic of statistical inference because he does not offer a clear alternative path, a measurement path structured by instrumentation. In his review of Lord and Novick (1968), for instance, Guttman remarks on the authors’ failure to provide their promised synthetic theory of tests and measurement, but does not offer or point toward one himself, even after noting the inclusion of Rasch’s Poisson models in the Lord and Novick classification system. Though much has been done to connect Guttman with Rasch (Andrich, 1982, 1985; Douglas & Wright, 1989; Engelhard, 2008; Linacre, 1991, 2000; Linacre & Wright, 1996; Tenenbaum, 1999; Wilson, 1989), and to advance in the direction of point-of-use measurement (Bode, 1999; Bode, Heinemann, & Semik, 2000; Connolly, Nachtman, & Pritchett, 1971; Davis, Perruccio, Canizares, Tennant, Hawker, et al., 2008; Linacre, 1997; many others), much more remains to be done.

Andrich, D. (1982, June). An index of person separation in Latent Trait Theory, the traditional KR-20 index, and the Guttman scale response pattern. Education Research and Perspectives, 9(1), 95-104 [http://www.rasch.org/erp7.htm].

Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 33-80). San Francisco, California: Jossey-Bass.

Bode, R. K. (1999). Self-scoring key for Galveston Orientation and Amnesia Test. Rasch Measurement Transactions, 13(1), 680 [http://www.rasch.org/rmt/rmt131c.htm].

Bode, R. K., Heinemann, A. W., & Semik, P. (2000, Feb). Measurement properties of the Galveston Orientation and Amnesia Test (GOAT) and improvement patterns during inpatient rehabilitation. Journal of Head Trauma Rehabilitation, 15(1), 637-55.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service.

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G., et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis and Cartilage, 16(5), 551-9.

Douglas, G. A., & Wright, B. D. (1989). Response patterns and their probabilities. Rasch Measurement Transactions, 3(4), 75-77 [http://www.rasch.org/rmt/rmt34.htm].

Engelhard, G. (2008, July). Historical perspectives on invariant measurement: Guttman, Rasch, and Mokken. Measurement: Interdisciplinary Research & Perspectives, 6(3), 155-189.

Guttman, L. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3-10. (Reprinted in Guttman 1994, pp. 341-348)

Guttman, L. (1994). Louis Guttman on theory and methodology: Selected writings (S. Levy, Ed.). Dartmouth Benchmark Series. Brookfield, VT: Dartmouth Publishing Company.

Jansen, P., Van den Wollenberg, A., & Wierda, F. (1988). Correcting unconditional parameter estimates in the Rasch model for inconsistency. Applied Psychological Measurement, 12(3), 297-306.

Linacre, J. M. (1991, Spring). Stochastic Guttman order. Rasch Measurement Transactions, 5(4), 189 [http://www.rasch.org/rmt/rmt54p.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (2000, Autumn). Guttman coefficients and Rasch data. Rasch Measurement Transactions, 14(2), 746-7 [http://www.rasch.org/rmt/rmt142e.htm].

Linacre, J. M., & Wright, B. D. (1996, Autumn). Guttman-style item location maps. Rasch Measurement Transactions, 10(2), 492-3 [http://www.rasch.org/rmt/rmt102h.htm].

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, Massachusetts: Addison-Wesley.

Tenenbaum, G. (1999, Jan-Mar). The implementation of Thurstone’s and Guttman’s measurement ideas in Rasch analysis. International Journal of Sport Psychology, 30(1), 3-16.

Wilson, M. (1989). A comparison of deterministic and probabilistic approaches to learning structures. Australian Journal of Education, 33(2), 127-140.

Wright, B. D. (1988, Sep). The efficacy of unconditional maximum likelihood bias correction: Comment on Jansen, Van den Wollenberg, and Wierda. Applied Psychological Measurement, 12(3), 315-318.

Wright, B. D., & Panchapakesan, N. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement, 29(1), 23-48.

LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Tags: instrument calibration, measurement, science, standards, Statistics, theory

This entry was posted on April 29, 2010 at 11:27 and is filed under instruments, measurement, Psychometrics, Rasch, Ronald Fisher, science.
