Posts Tagged ‘Rasch models’

Psychology and the social sciences: An atheoretical, scattered, and disconnected body of research

February 16, 2019

A new article in Nature Human Behaviour (NHB) points toward the need for better theory and more rigorous mathematical models in psychology and the social sciences (Muthukrishna & Henrich, 2019). The authors rightly say that the lack of an overarching cumulative theoretical framework makes it very difficult to see whether new results fit well with previous work, or if something surprising has come to light. Mathematical models are especially emphasized as being of value in specifying clear and precise expectations.

The point that the social sciences and psychology need better theories and models is painfully obvious. But there are in fact thousands of published studies and practical real-world applications that not only provide, but often surpass, the kinds of predictive theories and mathematical models called for in the NHB article. Not only does the article make no mention of any of this work; its argument is also framed entirely in a statistical context instead of the more appropriate context of measurement science.

The concept of reliability provides an excellent point of entry. Most behavioral scientists think of reliability statistically, as a coefficient with a numeric value usually falling between 0.00 and 1.00. The tangible sense of reliability, as an indication of exactly how predictable an outcome is, does not usually figure in researchers’ thinking. But that sense of the specific predictability of results has been the focus of attention in social and psychological measurement science for decades.

For instance, the measurement of time is reliable in the sense that the position of the sun relative to the earth can be precisely predicted from geographic location, the time of day, and the day of the year. The numbers and words assigned to noon are closely associated with the sun being at its high point in the sky (though there are political variations by season and location across time zones).

That kind of a reproducible association is rarely sought in psychology and the social sciences, but it is far from nonexistent. One can discern different degrees to which that kind of association is included in models of measured constructs. Though most behavioral research doesn’t mention the connection between linear amounts of a measured phenomenon and a reproducible numeric representation of it (level 0), quite a significant body of work focuses on that connection (level 1). The disappointing thing about that level 1 work is that the relentless obsession with statistical methods prevents most researchers from connecting a reproducible quantity with a single expression of it in a standard unit, and with an associated uncertainty term (level 2). That is, level 1 researchers conceive measurement in statistical terms, as a product of data analysis. Even when results across data sets are highly correlated and could be equated to a common metric, level 1 researchers do not leverage that source of potential value for simplified communication and accumulated comparability.

And then, for their part, level 2 researchers usually do not articulate theories of the measured constructs by augmenting the mathematical data model with an explanatory model that predicts variation (level 3). Level 2 researchers are empirically grounded in data, and can expand their network of measures only by gathering more data and analyzing it in ways that bring it into their standard unit’s frame of reference.

Level 3 researchers, however, have come to see what makes their measures tick. They understand the mechanisms that cause responses to their questions to vary. They can write new questions to their theoretical specifications, test those questions by asking them of a relevant sample, and produce the predicted calibrations. For instance, reading comprehension is well established to be a function of the difference between a person’s reading ability and the complexity of the text they encounter (see articles by Stenner in the list below). We have built our entire educational system around this idea, as we deliberately introduce children first to the alphabet, then to the most common words, then to short sentences, and then to ever longer and more complicated text. But stating the construct model, testing it against data, calibrating a unit to which all tests and measures can be traced, and connecting together all the books, articles, tests, curricula, and students is a process that began (in English and Spanish) only in the 1980s. The process is still far from finished, and most reading research still does not use the common metric.

In this kind of theory-informed context, new items can be automatically generated on the fly at the point of measurement. Those items and inferences made from them are validated by the consistency of the responses and the associated expression of the expected probability of success, agreement, etc. The expense of constant data gathering and analysis can be cut to a very small fraction of what it is at levels 0-2.
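
To make the idea concrete, here is a minimal sketch, in Python, of the kind of prediction such a construct theory licenses: under a Rasch-type model, the expected probability of comprehension depends only on the difference between a reader's ability and a text's complexity, both expressed in a shared logit unit. The particular numbers are hypothetical, chosen purely for illustration, and this is not the calibration machinery used in any of the programs cited below.

```python
import math

def p_comprehension(reader_ability, text_complexity):
    """Rasch-type success probability from the difference between a
    reader's ability and a text's complexity (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(reader_ability - text_complexity)))

# Hypothetical values in a shared logit unit: one reader measured at 1.5
# logits encountering texts of increasing complexity.
for complexity in (-1.0, 0.5, 1.5, 3.0):
    p = p_comprehension(1.5, complexity)
    print(f"text complexity {complexity:+.1f} -> expected comprehension {p:.2f}")
```

When ability equals complexity the expected rate is 0.50, and a theory that predicts text complexity in advance, from features such as word frequency and sentence length, turns this expectation into a testable forecast for any new text or item.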

Level 3 research methods are not widely known or used, but they are not new. They are gaining traction as their use by national metrology institutes globally grows. As high profile critiques of social and psychological research practices continue to emerge, perhaps more attention will be paid to this important body of work. A few key references are provided below, and virtually every post in this blog pertains to these issues.

References

Baghaei, P. (2008). The Rasch model as a construct validation tool. Rasch Measurement Transactions, 22(1), 1145-1146 [http://www.rasch.org/rmt/rmt221a.htm].

Bergstrom, B. A., & Lunz, M. E. (1994). The equivalence of Rasch item calibrations and ability estimates across modes of administration. In M. Wilson (Ed.), Objective measurement: Theory into practice, Vol. 2 (pp. 122-128). Norwood, New Jersey: Ablex.

Cano, S., Pendrill, L., Barbic, S., & Fisher, W. P., Jr. (2018). Patient-centred outcome metrology for healthcare decision-making. Journal of Physics: Conference Series, 1044, 012057.

Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Measurement & Evaluation in Counseling & Development, 43(2), 121-149.

Embretson, S. E. (2010). Measuring psychological constructs: Advances in model-based approaches. Washington, DC: American Psychological Association.

Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.

Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48(1), 3-26.

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827-833.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-164.

Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.

Irvine, S. H., Dunn, P. L., & Anderson, J. D. (1990). Towards a theory of algorithm-determined cognitive test construction. British Journal of Psychology, 81, 173-195.

Kline, T. L., Schmidt, K. M., & Bowles, R. P. (2006). Using LinLog and FACETS to model item components in the LLTM. Journal of Applied Measurement, 7(1), 74-91.

Lunz, M. E., & Linacre, J. M. (2010). Reliability of performance examinations: Revisited. In M. Garner, G. Engelhard, Jr., W. P. Fisher, Jr. & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 1 (pp. 328-341). Maple Grove, MN: JAM Press.

Mari, L., & Wilson, M. (2014). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Markward, N. J., & Fisher, W. P., Jr. (2004). Calibrating the genome. Journal of Applied Measurement, 5(2), 129-141.

Maul, A., Mari, L., Torres Irribarra, D., & Wilson, M. (2018). The quality of measurement results in terms of the structural features of the measurement process. Measurement, 116, 611-620.

Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour, 1-9.

Obiekwe, J. C. (1999, August 1). Application and validation of the linear logistic test model for item difficulty prediction in the context of mathematics problems. Dissertation Abstracts International: Section B: The Sciences & Engineering, 60(2-B), 0851.

Pendrill, L. (2014). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Pendrill, L., & Petersson, N. (2016). Metrology of human-based and other qualitative measurements. Measurement Science and Technology, 27(9), 094003.

Sijtsma, K. (2009). Correcting fallacies in validity, reliability, and classification. International Journal of Testing, 8(3), 167-194.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107-120.

Stenner, A. J. (2001). The necessity of construct theory. Rasch Measurement Transactions, 15(1), 804-805 [http://www.rasch.org/rmt/rmt151q.htm].

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].

Stenner, A. J., Stone, M. H., & Fisher, W. P., Jr. (2018). The unreasonable effectiveness of theory-based instrument calibration in the natural sciences: What can the social sciences learn? Journal of Physics: Conference Series, 1044, 012070.

Stone, M. H. (2003). Substantive scale construction. Journal of Applied Measurement, 4(3), 282-297.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilson, M. R. (2013). Using the concept of a measurement system to characterize measurement models used in psychometrics. Measurement, 46, 3766-3774.

Wright, B. D., & Stone, M. H. (1979). Chapter 5: Constructing a variable. In Best test design: Rasch measurement (pp. 83-128). Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/measess/me-all.pdf].

Wright, B. D., Stone, M., & Enos, M. (2000). The evolution of meaning in practice. Rasch Measurement Transactions, 14(1), 736 [http://www.rasch.org/rmt/rmt141g.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Six Classes of Results Supporting the Measurability of Human Functioning and Capability

April 12, 2014

Another example of high-level analysis that suffers from a lack of input from state-of-the-art measurement arises in Nussbaum (1997, p. 1205), where the author remarks that it is now a matter of course, in development economics, “to recognize distinct domains of human functioning and capability that are not commensurable along a single metric, and with regard to which choice and liberty of agency play a fundamental structuring role.” Though Nussbaum (2011, pp. 58-62) has lately given a more nuanced account of the challenges of measuring human capabilities, an appreciation of the power and flexibility of contemporary measurement models, methods, and instruments remains lacking. For a detailed example of the complexities and challenges that must be addressed in the context of global human development, which is Nussbaum’s area of interest, see Fisher (2011).

Though there are indeed domains of human functioning and capability that are not commensurable along a single metric, they are not the ones referred to by Nussbaum or the texts she cites. On the contrary, six different approaches to establishing the measurability of human functioning and capability have been explored and have proven to provide, especially in the aggregate, a substantial basis for theory and practice (modified from Fisher, 2009, pp. 1279-1281). These six classes of results speak to the abstract, mathematical side of the paradox noted by Ricoeur (see previous post here) concerning the need to simultaneously accept roles for abstract ideal global universals and concrete local historical contexts in strategic planning and thinking. The six classes of results are:

  1. Mathematical proofs of the necessity and sufficiency of test and survey scores for invariant measurement in the context of Rasch’s probabilistic models (Andersen, 1977, 1999; Fischer, 1981; Newby, Conner, Grant, and Bunderson, 2009; van der Linden, 1992).
  2. Reproduction of physical units of measurement (centimeters, grams, etc.) from ordinal observations (Choi, 1997; Moulton, 1993; Pelton and Bunderson, 2003; Stephanou and Fisher, 2013).
  3. The common mathematical form of the laws of nature and Rasch models (Rasch, 1960, pp. 110-115; Fisher, 2010; Fisher and Stenner, 2013).
  4. Multiple independent studies of the same constructs, conducted on different (and common) samples with different (and the same) instruments intended to measure the same thing, converge on common units, define the same objects, substantiate theory, and support the viability of standardized metrics (Fisher, 1997a, 1997b, 1999, etc.); a minimal simulation sketch of this kind of cross-sample invariance follows the list below.
  5. Thousands of peer-reviewed publications in hundreds of scientific journals provide a wide-ranging and diverse array of supporting evidence and theory.
  6. Analogous causal attributions and theoretical explanatory power can be created in both natural and social science contexts (Stenner, Fisher, Stone, and Burdick, 2013).
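
As a concrete illustration of the fourth class of results, here is a minimal simulation sketch (hypothetical values and a deliberately crude estimation shortcut, with no claim to reproduce any published analysis) showing how item calibrations estimated from two samples of very different ability converge on the same relative values when the data follow a Rasch model:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate(abilities, difficulties):
    """Dichotomous responses generated under the Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def rough_calibration(responses):
    """Centered log-odds of failure: a crude stand-in for a full Rasch
    estimation routine, adequate here for showing sample invariance."""
    p_correct = responses.mean(axis=0).clip(0.01, 0.99)
    logits = np.log((1 - p_correct) / p_correct)   # harder items -> larger values
    return logits - logits.mean()                  # center to remove the sample level

true_difficulties = np.linspace(-2.0, 2.0, 9)      # hypothetical item calibrations
low_group = rng.normal(-1.0, 1.0, 500)             # a less able sample
high_group = rng.normal(+1.0, 1.0, 500)            # a more able sample

cal_low = rough_calibration(simulate(low_group, true_difficulties))
cal_high = rough_calibration(simulate(high_group, true_difficulties))

for i, (a, b) in enumerate(zip(cal_low, cal_high)):
    print(f"item {i}: low-group {a:+.2f}   high-group {b:+.2f}")
print("correlation between the two sets:", np.corrcoef(cal_low, cal_high)[0, 1].round(2))
```

A full analysis would use proper estimation and fit diagnostics, but even this rough shortcut displays the kind of cross-sample invariance that the studies cited above document with real data.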

What we have here, in sum, is a combination of Greek axiomatic and Babylonian empirical algorithms, in accord with Toulmin’s (1961, pp. 28-33) sense of the contrasting principled bases for scientific advancement. Feynman (1965, p. 46) called for less of a focus on the Greek chain of reasoning approach, as it is only as strong as its weakest link, whereas the Babylonian algorithms are akin to a platform with enough supporting legs that one or more might fail without compromising its overall stability. The variations in theory and evidence under these six headings provide ample support for the conceptual and practical viability of metrological systems of measurement in education, health care, human resource management, sociology, natural resource management, social services, and many other fields. The philosophical critique of any type of economics will inevitably be wide of the mark if uninformed about these accomplishments in the theory and practice of measurement.

References

Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42(1), 69-81.

Andersen, E. B. (1999). Sufficient statistics in educational measurement. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 122-125). New York: Pergamon.

Choi, S. E. (1997). Rasch invents “ounces.” Rasch Measurement Transactions, 11(2), 557 [http://www.rasch.org/rmt/rmt112.htm#Ounces].

Feynman, R. (1965). The character of physical law. Cambridge, Massachusetts: MIT Press.

Fischer, G. H. (1981). On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika, 46(1), 59-77.

Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (1997). What scale-free measurement means to health outcomes research. Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 357-373.

Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.

Fisher, W. P., Jr. (2009). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2010). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics: Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.

Fisher, W. P., Jr. (2011). Measuring genuine progress by scaling economic indicators to think global & act local: An example from the UN Millennium Development Goals project. LivingCapitalMetrics.com. Retrieved 18 January 2011, from Social Science Research Network: http://ssrn.com/abstract=1739386.

Fisher, W. P., Jr., & Stenner, A. J. (2013). On the potential for improved measurement in the human and social sciences. In Q. Zhang & H. Yang (Eds.), Pacific Rim Objective Measurement Symposium 2012 Conference Proceedings (pp. 1-11). Berlin, Germany: Springer-Verlag.

Moulton, M. (1993). Probabilistic mapping. Rasch Measurement Transactions, 7(1), 268 [http://www.rasch.org/rmt/rmt71b.htm].

Newby, V. A., Conner, G. R., Grant, C. P., & Bunderson, C. V. (2009). The Rasch model and additive conjoint measurement. Journal of Applied Measurement, 10(4), 348-354.

Nussbaum, M. (1997). Flawed foundations: The philosophical critique of (a particular type of) economics. University of Chicago Law Review, 64, 1197-1214.

Nussbaum, M. (2011). Creating capabilities: The human development approach. Cambridge, MA: The Belknap Press.

Pelton, T., & Bunderson, V. (2003). The recovery of the density scale using a stochastic quasi-realization of additive conjoint measurement. Journal of Applied Measurement, 4(3), 269-281.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14, 58-94.

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14.

Stephanou, A., & Fisher, W. P., Jr. (2013). From concrete to abstract in the measurement of length. Journal of Physics: Conference Series, 459, 012026 [http://iopscience.iop.org/1742-6596/459/1/012026].

Toulmin, S. E. (1961). Foresight and understanding: An enquiry into the aims of science. London, England: Hutchinson.

van der Linden, W. J. (1992). Sufficient and necessary statistics. Rasch Measurement Transactions, 6(3), 231 [http://www.rasch.org/rmt/rmt63d.htm].


Convergence, Divergence, and the Continuum of Field-Organizing Activities

March 29, 2014

So what are the possibilities for growing out green shoots from the seeds and roots of an ethical orientation to keeping the dialogue going? What kinds of fruits might be expected from cultivating a common ground for choosing discourse over violence? What are the consequences for practice of planting this seed in this ground?

The same participant in the conversation earlier this week at Convergence XV who spoke of the peace building processes taking place around the world also described a developmental context for these issues of mutual understanding. The work of Theo Dawson and her colleagues (Dawson, 2002a, 2002b, 2004; Dawson, Fischer, and Stein, 2006) is especially pertinent here. Their comparisons of multiple approaches to cognitive and moral development have provided clear and decisive theory, evidence, and instrumentation concerning the conceptual integrations that take place in the evolution of hierarchical complexity.

Conceptual integrations occur when previously tacit, unexamined, and assumed principles informing a sphere of operations are brought into conscious awareness and are transformed into explicit objects of new operations. Developmentally, this is the process of discovery that takes place from the earliest stages of life, in utero. Organisms of all kinds mature in a process of interaction with their environments. Young children at the “terrible two” stage, for instance, are realizing that anything they can detach from, whether by throwing or by denying (“No!”), is not part of them. Only a few months earlier, the same children will have been fascinated with their fingers and toes, realizing these are parts of their own bodies, often by putting them in their mouths.

There are as many opportunities for conceptual integrations between the ages of 21 and 99 as there are between birth and 21. Developmental differences in perspective can make for riotously comic situations, and can also lead to conflicts, even when the participants agree on more than they disagree on. And so here we arrive at a position from which we can get a grip on how to integrate convergence and divergence in a common framework, one that follows from the prior post’s brief description of the ontological method’s three moments of reduction, application, and deconstruction.

[Figure 1: Continuum of field-organizing activities]

Woolley and colleagues (Woolley, et al., 2010; Woolley and Fuchs, 2011) describe a continuum of five field-organizing activities categorizing the types of information needed for effective collective intelligence (Figure 1). Four of these five activities (defining, bounding, opening, and bridging) vary in the convergent versus divergent processes they bring to bear in collective thinking. Defining and bounding are convergent processes that inform judgment and decision making. These activities are especially important in the emergence of a new field or organization, when the object of interest and the methods of recognizing and producing it are in contention. Opening and bridging activities, in contrast, diverge from accepted definitions and transgress boundaries in the creative process of pushing into new areas. Undergirding the continuum as a whole is the fifth activity, grounding, which serves as a theory- and evidence-informed connection to meaningful and useful results.

There are instances in which defining and bounding activities have progressed to the point that the explanatory power of theory enables the calibration of test items from knowledge of the component parts included in those items. The efficiencies and cost reductions gained from computer-based item generation and administration are significant. Research in this area takes a variety of approaches; for more information, see Daniel and Embretson (2010), De Boeck and Wilson (2004), Stenner, et al. (2013), and others.
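
A minimal sketch of the underlying idea, in the spirit of the linear logistic test model tradition cited above, may help: an item's difficulty is predicted from the components it contains, so new items can be written to specification and assigned provisional calibrations before any data are collected. The component names and weights below are hypothetical.

```python
# Hypothetical component weights (in logits) for a family of arithmetic items:
# in the spirit of the linear logistic test model, an item's difficulty is
# predicted as a weighted sum of the cognitive operations it requires.
component_weights = {
    "carrying": 0.8,
    "borrowing": 1.1,
    "multi_step": 0.6,
    "word_problem": 0.9,
}

def predicted_difficulty(features, intercept=-1.5):
    """Predicted calibration (in logits) from an item's design features."""
    return intercept + sum(component_weights[k] * v for k, v in features.items())

# Two automatically generated items, described only by their design features.
item_a = {"carrying": 1}
item_b = {"carrying": 2, "borrowing": 1, "word_problem": 1}

print("item A predicted difficulty:", predicted_difficulty(item_a))
print("item B predicted difficulty:", predicted_difficulty(item_b))
```

Agreement between such predicted calibrations and the empirical calibrations later obtained from field data is what substantiates, or falsifies, the construct theory.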

The value of clear definitions and boundaries in this context stems in large part from the capacity to identify exceptions that prove (test) the rules, and that then also provide opportunities for opening and bridging. Kuhn (1961, p. 180; 1977, p. 205) noted that

To the extent that measurement and quantitative technique play an especially significant role in scientific discovery, they do so precisely because, by displaying significant anomaly, they tell scientists when and where to look for a new qualitative phenomenon.

Rasch (1960, p. 124) similarly understood that “Once a law has been established within a certain field then the law itself may serve as a tool for deciding whether or not added stimuli and/or objects belong to the original group.” Rasch gives the example of mechanical force applied to various masses with resulting accelerations, introducing the idea that one of the instruments might exert magnetic as well as mechanical force, with noticeable effects on steel masses but not on wooden masses. Rasch suggests that exploring these anomalies may result in the discovery of other similar instruments that vary in the extent to which they also exert the new force, with the possible consequence of discovering a law of magnetic attraction.

There has been intense interest in the assessment of divergent inconsistencies in measurement research and practice following in the wake of Rasch’s early work in psychological and social measurement (examples from a very large literature in this area include Karabatsos and Ullrich, 2002, and Smith and Plackner, 2009). Andrich, for instance, makes explicit reference to Kuhn (1961), saying, “…the function of a model for measurement…is to disclose anomalies, not merely to describe data” (Andrich, 2002, p. 352; also see Andrich, 1996, 2004, 2011). Typical software for applying Rasch models (Andrich, et al., 2013; Linacre, 2011, 2013; Wu, et al., 2007) accordingly provides many more qualitative numbers evaluating potential anomalies than quantitative measuring numbers. These qualitative numbers (digits that do not stand for something substantive that adds up in a constant unit) include uncertainty and confidence indicators that vary with sample size; mean square and standardized model fit statistics; and principal components analysis factor loadings and eigenvalues.
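
For readers unfamiliar with these diagnostics, here is a minimal sketch of how the mean square fit statistics mentioned above can be computed from a matrix of scored responses. It is a generic illustration, not the implementation used in any particular package, and it assumes person measures and item calibrations are already in hand.

```python
import numpy as np

def rasch_prob(abilities, difficulties):
    """Expected success probabilities under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))

def item_fit(responses, abilities, difficulties):
    """Outfit (unweighted) and infit (information-weighted) mean squares per item."""
    p = rasch_prob(abilities, difficulties)
    variance = p * (1 - p)                 # model variance of each response
    z2 = (responses - p) ** 2 / variance   # squared standardized residuals
    outfit = z2.mean(axis=0)
    infit = (z2 * variance).sum(axis=0) / variance.sum(axis=0)
    return outfit, infit

# Toy illustration with simulated data: values near 1.0 indicate responses
# consistent with the model; larger values flag potential anomalies.
rng = np.random.default_rng(3)
abilities = rng.normal(0.0, 1.0, 300)
difficulties = np.array([-1.0, 0.0, 1.0])
responses = (rng.random((300, 3)) < rasch_prob(abilities, difficulties)).astype(int)

outfit, infit = item_fit(responses, abilities, difficulties)
print("outfit mean squares:", outfit.round(2))
print("infit mean squares: ", infit.round(2))
```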

The opportunities for divergent openings onto new qualitative phenomena provided by data consistency evaluations are complemented in Rasch measurement by a variety of bridging activities. Different instruments intended to measure the same or closely related constructs may often be equated or co-calibrated, so they measure in a common unit (among many publications in this area, see Dawson, 2002a, 2004; Fisher, 1997; Fisher, et al., 1995; Massof and Ahmadian, 2007; Smith and Taylor, 2004). Similarly, the same instrument calibrated on different samples from the same population may exhibit consistent properties across those samples, offering further evidence of a potential for defining a common unit (Fisher, 1999).
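
One simple form of such co-calibration can be sketched as follows: when two instruments share a set of common items, the mean difference between those items' calibrations estimates the shift needed to express both instruments in one frame of reference. The calibration values below are hypothetical, and real equating studies add checks of the common items' stability across instruments.

```python
import numpy as np

# Hypothetical calibrations (in logits) of five items common to instruments A
# and B, each estimated in its own local frame of reference.
common_on_a = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
common_on_b = np.array([-0.7, 0.0, 0.6, 1.3, 2.1])

shift = (common_on_b - common_on_a).mean()   # how far B's origin sits above A's

def b_to_a(measure_on_b):
    """Express a measure from instrument B in instrument A's frame of reference."""
    return measure_on_b - shift

print(f"estimated shift: {shift:.2f} logits")
print(f"a measure of 1.00 on B corresponds to {b_to_a(1.0):.2f} on A")
```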

Other opening and bridging activities include capacities (a) to drop items or questions from a test or survey, or to add them; (b) to adaptively administer subsets of custom-selected items from a large bank; and (c) to adjust measures for the leniency or severity of judges assigning ratings, all of which can be done, within the limits of the relevant definitions and boundaries, without compromising the unit of comparison. For methodological overviews, see Bond and Fox (2007), Wilson (2005), and others.
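
As a minimal sketch of the adaptive administration mentioned in (b), the following toy routine selects the unused item closest in difficulty to the current provisional measure and updates that measure with a single Newton-Raphson step on the Rasch likelihood. The item bank and response string are hypothetical, and a production system would add start rules, stop rules, and exposure controls.

```python
import math

def rasch_p(theta, b):
    """Probability of success for a person at theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta, bank, used):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - theta))

def update_theta(theta, administered, responses, bank):
    """One Newton-Raphson step on the Rasch likelihood for the person measure."""
    probs = [rasch_p(theta, bank[i]) for i in administered]
    score = sum(x - p for x, p in zip(responses, probs))
    info = sum(p * (1 - p) for p in probs)
    return theta + score / max(info, 1e-6)

# Hypothetical item bank (difficulties in logits) and a made-up response string.
bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
theta, used, answers = 0.0, [], [1, 1, 0, 1, 0]

for x in answers:
    used.append(next_item(theta, bank, used))
    theta = update_theta(theta, used, answers[:len(used)], bank)
    print(f"after item {used[-1]} (difficulty {bank[used[-1]]:+.1f}): theta = {theta:+.2f}")
```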

The various field-organizing activities spanning the range from convergence to divergence are implicated not only in research on collective thinking, but also in the history and philosophy of science. Galison and colleagues (Galison, 1997, 1999; Galison and Stump, 1996) closely examine positivist and antipositivist perspectives on the unity of science, finding their conclusions inconsistent with the evidence of history. A postpositivist perspective (Galison, 1999, p. 138), in contrast, finds “distinct communities and incommensurable beliefs” between and often within the areas of theory, experiment, and instrument-making. But instead of finding these communities “utterly condemned to passing one another without any possibility of significant interaction,” Galison (1999, p. 138) observes that “two groups can agree on rules of exchange even if they ascribe utterly different significance to the objects being exchanged; they may even disagree on the meaning of the exchange process itself.” In practice, “trading partners can hammer out a local coordination despite vast global differences.”

In accord with Woolley and colleagues’ work on convergent and divergent field-organizing activities, Galison (1999, p. 137) concludes, then, that “science is disunified, and—against our first intuitions—it is precisely the disunification of science that underpins its strength and stability.” Galison (1997, pp. 843-844) concludes with a section entitled “Cables, Bricks, and Metaphysics” in which the postpositivist disunity of science is seen to provide its unexpected coherence from the simultaneously convergent and divergent ways theories, experiments, and instruments interact.

But as Galison recognizes, a metaphor based on the intertwined strands in a cable is too mechanical to support the dynamic processes by which order arises from particular kinds of noise and chaos. Not cited by Galison is a burgeoning literature on the phenomenon of noise-induced order termed stochastic resonance (Andò and Graziani, 2000; Benzi, et al., 1981; Dykman and McClintock, 1998; Fisher, 1992, 2011; Hess and Albano, 1998; Repperger and Farris, 2010). Where the metaphor of a cable’s strands breaks down, stochastic resonance provides multiple ways of illustrating how the disorder of finite and partially independent processes can give rise to an otherwise inaccessible order and structure.

Stochastic resonance involves small noisy signals that can be amplified to have very large effects. The noise has to be of a particular kind, and too much of it will drown out rather than amplify the effect. Examples include the interaction of neuronal ensembles in the brain (Chialvo, Longtin, and Müller-Gerking, 1996), speech recognition (Moskowitz and Dickinson, 2002), and perceptual interpretation (Riani and Simonotto, 1994). Given that Rasch’s models for measurement are stochastic versions of Guttman’s deterministic models (Andrich, 1985), the question has been raised as to how Rasch’s seemingly weaker assumptions could lead to a measurement model that is stronger than Guttman’s (Duncan, 1984, p. 220). Stochastic resonance may provide an essential clue to this puzzle (Fisher, 1992, 2011).
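
A minimal simulation, not drawn from any of the cited papers, suggests one way of seeing the point. Between two items, the people who succeed on one and fail on the other carry all of the information about the distance separating the items. Under the Rasch model the ratio of the two mixed outcomes estimates that distance regardless of the ability distribution; under Guttman's deterministic model one of the two outcomes never occurs, so the distance is undefined and only the order survives.

```python
import numpy as np

rng = np.random.default_rng(11)
abilities = rng.normal(0.0, 1.5, 5000)
d_easy, d_hard = -0.5, 1.0                    # two items, truly 1.5 logits apart

def pairwise_distance(x_easy, x_hard):
    """Conditional (pairwise) estimate of the distance between two items: among
    people answering exactly one of the two correctly, the log of the count
    ratio estimates the difficulty difference, whatever the ability distribution."""
    easy_only = int(np.sum((x_easy == 1) & (x_hard == 0)))
    hard_only = int(np.sum((x_easy == 0) & (x_hard == 1)))
    if hard_only == 0:
        return float("inf")                    # no information about the distance
    return float(np.log(easy_only / hard_only))

# Rasch (probabilistic) responses: outcomes mix near the person-item match.
p_easy = 1.0 / (1.0 + np.exp(-(abilities - d_easy)))
p_hard = 1.0 / (1.0 + np.exp(-(abilities - d_hard)))
rasch_easy = (rng.random(abilities.shape) < p_easy).astype(int)
rasch_hard = (rng.random(abilities.shape) < p_hard).astype(int)

# Guttman (deterministic) responses: success whenever ability exceeds difficulty.
gutt_easy = (abilities > d_easy).astype(int)
gutt_hard = (abilities > d_hard).astype(int)

print("true distance:             ", d_hard - d_easy)
print("estimate from Rasch data:  ", round(pairwise_distance(rasch_easy, rasch_hard), 2))
print("estimate from Guttman data:", pairwise_distance(gutt_easy, gutt_hard))
```

It is precisely the probabilistic variation near the person-item match that makes the interval structure estimable, which is one way of understanding how the stochastic model ends up stronger than the deterministic one.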

Another description of what might be a manifestation of stochastic resonance akin to that brought up by Galison arises in Berg and Timmermans’ (2000, p. 56) study of the constitution of universalities in a medical network. They note that, “Paradoxically, then, the increased stability and reach of this network was not due to more (precise) instructions: the protocol’s logistics could thrive only by parasitically drawing upon its own disorder.” Much the same has been said about the behaviors of markets (Mandelbrot, 2004), bringing us back to the topic of the day at Convergence XV earlier this week. I’ll have more to say on this issue of universalities constituted via noise-induced order in due course.

References

Andò, B., & Graziani, S. (2000). Stochastic resonance theory and applications. New York: Kluwer Academic Publishers.

Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 33-80). San Francisco, California: Jossey-Bass.

Andrich, D. (1996). Measurement criteria for choosing among models with graded responses. In A. von Eye & C. Clogg (Eds.), Categorical variables in developmental research: Methods of analysis (pp. 3-35). New York: Academic Press, Inc.

Andrich, D. (2002). Understanding resistance to the data-model relationship in Rasch’s paradigm: A reflection for the next generation. Journal of Applied Measurement, 3(3), 325-359.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Andrich, D. (2011). Rating scales and Rasch measurement. Expert Reviews in Pharmacoeconomics Outcome Research, 11(5), 571-585.

Andrich, D., Lyne, A., Sheridan, B., & Luo, G. (2013). RUMM 2030: Rasch unidimensional models for measurement. Perth, Australia: RUMM Laboratory Pty Ltd [www.rummlab.com.au].

Benzi, R., Sutera, A., & Vulpiani, A. (1981). The mechanism of stochastic resonance. Journal of Physics. A. Mathematical and General, 14, L453-L457.

Berg, M., & Timmermans, S. (2000). Orders and their others: On the constitution of universalities in medical work. Configurations, 8(1), 31-61.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Chialvo, D., Longtin, A., & Müller-Gerking, J. (1996). Stochastic resonance in models of neuronal ensembles revisited [Electronic version].

Daniel, R. C., & Embretson, S. E. (2010). Designing cognitive complexity in mathematical problem-solving items. Applied Psychological Measurement, 34(5), 348-364.

Dawson, T. L. (2002a, Summer). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3(2), 146-189.

Dawson, T. L. (2002b, March). New tools, new insights: Kohlberg’s moral reasoning stages revisited. International Journal of Behavioral Development, 26(2), 154-166.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Dawson, T. L., Fischer, K. W., & Stein, Z. (2006). Reconsidering qualitative and quantitative research approaches: A cognitive developmental perspective. New Ideas in Psychology, 24, 229-239.

De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach (Statistics for Social and Behavioral Sciences). New York: Springer-Verlag.

Duncan, O. D. (1984). Notes on social measurement: Historical and critical. New York: Russell Sage Foundation.

Dykman, M. I., & McClintock, P. V. E. (1998, January 22). What can stochastic resonance do? Nature, 391(6665), 344.

Fisher, W. P., Jr. (1992, Spring). Stochastic resonance and Rasch measurement. Rasch Measurement Transactions, 5(4), 186-187 [http://www.rasch.org/rmt/rmt54k.htm].

Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.

Fisher, W. P., Jr. (2011). Stochastic and historical resonances of the unit in physics and psychometrics. Measurement: Interdisciplinary Research & Perspectives, 9, 46-50.

Fisher, W. P., Jr., Harvey, R. F., Taylor, P., Kilgore, K. M., & Kelly, C. K. (1995, February). Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76(2), 113-122.

Galison, P. (1997). Image and logic: A material culture of microphysics. Chicago: University of Chicago Press.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York: Routledge.

Galison, P., & Stump, D. J. (1996). The disunity of science: Boundaries, contexts, and power. Palo Alto, California: Stanford University Press.

Hess, S. M., & Albano, A. M. (1998, February). Minimum requirements for stochastic resonance in threshold systems. International Journal of Bifurcation and Chaos, 8(2), 395-400.

Karabatsos, G., & Ullrich, J. R. (2002). Enumerating and testing conjoint measurement models. Mathematical Social Sciences, 43, 487-505.

Kuhn, T. S. (1961). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in T. S. Kuhn, (Ed.). (1977). The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press.)

Linacre, J. M. (2011). A user’s guide to WINSTEPS Rasch-Model computer program, v. 3.72.0. Chicago, Illinois: Winsteps.com.

Linacre, J. M. (2013). A user’s guide to FACETS Rasch-Model computer program, v. 3.71.0. Chicago, Illinois: Winsteps.com.

Mandelbrot, B. (2004). The misbehavior of markets. New York: Basic Books.

Massof, R. W., & Ahmadian, L. (2007, July). What do different visual function questionnaires measure? Ophthalmic Epidemiology, 14(4), 198-204.

Moskowitz, M. T., & Dickinson, B. W. (2002). Stochastic resonance in speech recognition: Differentiating between /b/ and /v/. Proceedings of the IEEE International Symposium on Circuits and Systems, 3, 855-858.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Repperger, D. W., & Farris, K. A. (2010, July). Stochastic resonance: A nonlinear control theory interpretation. International Journal of Systems Science, 41(7), 897-907.

Riani, M., & Simonotto, E. (1994). Stochastic resonance in the perceptual interpretation of ambiguous figures: A neural network model. Physical Review Letters, 72(19), 3120-3123.

Smith, R. M., & Plackner, C. (2009). The family approach to assessing fit in Rasch measurement. Journal of Applied Measurement, 10(4), 424-437.

Smith, R. M., & Taylor, P. (2004). Equating rehabilitation outcome scales: Developing common metrics. Journal of Applied Measurement, 5(3), 229-242.

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013, August). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14 [doi: 10.3389/fpsyg.2013.00536].

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010, 29 October). Evidence for a collective intelligence factor in the performance of human groups. Science, 330, 686-688.

Woolley, A. W., & Fuchs, E. (2011, September-October). Collective intelligence in the organization of science. Organization Science, 22(5), 1359-1367.

Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. A. (2007). ACER ConQuest Version 2: Generalised item response modelling software. Camberwell: Australian Council for Educational Research.

Dispelling Myths about Measurement in Psychology and the Social Sciences

August 27, 2013

Seven common assumptions about measurement and method in psychology and the social sciences stand as anomalies, inconsistent with the experience of those who have taken the trouble to challenge them. As evidence, theory, and instrumentation accumulate, will we see, as a result, a revolutionary break and disruptive change across multiple social and economic levels and areas? Will there be a slower, more gradual transition to a new paradigm? Or will the status quo simply roll on, oblivious to the potential for new questions and new directions? We shall see.

1. Myth: Qualitative data and methods cannot really be integrated with quantitative data and methods because of opposing philosophical assumptions.

Fact: Qualitative methods incorporate a critique of quantitative methods that leads to a more scientific theory and practice of measurement.

2. Myth: Statistics is the logic of measurement.

Fact: Statistics did not emerge as a discipline until the 19th century, while measurement has, of course, been around for millennia. Measurement models operate at the individual level, within a single variable, whereas statistical models operate at the population level, between variables. Data are fit to prescriptive measurement models, which expose anomalous data for what they are (the Garbage-In, Garbage-Out principle), whereas descriptive statistical models are fit to data.

3. Myth: Linear measurement from ordinal test and survey data is impossible.

Fact: Ordinal data have been used as a basis for invariant linear measures for decades.
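
As a minimal numerical illustration of why the distinction matters: percent-correct scores are bounded and ordinal, while the log-odds (logit) transformation at the core of Rasch measurement maps them onto an unbounded scale on which equal differences can represent equal amounts. The scores below are arbitrary examples.

```python
import math

def logit(p):
    """Log-odds: maps a bounded proportion onto an unbounded linear scale."""
    return math.log(p / (1 - p))

# Equal ten-point steps in percent correct are not equal amounts of the
# measured variable: the same raw gain means more near the extremes.
for pct in (50, 60, 70, 80, 90):
    print(f"{pct}% correct -> {logit(pct / 100):+.2f} logits")
```

The same ten-point raw gain represents roughly twice as much measured change between 80% and 90% as between 50% and 60%, which is why raw scores and linear measures cannot be used interchangeably; in practice, of course, the measures come from calibrated items rather than from raw percentages alone.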

4. Myth: Scientific laws like Newton’s laws of motion cannot be successfully formulated, tested, or validated in psychology and the social sciences.

Fact: Mathematical laws of human behavior and cognition, taking the same form as Newton’s laws, have been formulated, tested, and validated in numerous Rasch model applications.

5. Myth: Experimental manipulations of psychological and social phenomena are inherently impossible or unethical.

Fact: Decades of research across multiple fields have successfully shown how theory-informed interventions on items/indicators/questions can result in predictable, consistent, and substantively meaningful quantitative changes.

6. Myth: “Real” measurement is impossible in psychology and the social sciences.

Fact: Successes in predictive theory, in instrument calibration, and in maintaining stable units of comparison over time all provide evidence supporting the viability of meaningful uniform units of measurement in psychology and the social sciences.

7. Myth: Efficient economic markets can incorporate only manufactured capital, liquid capital, and property. Human, social, and natural capital, being intangible, have permanent status as market externalities because they cannot be measured well enough to enable accountability, pricing, or transferable representations (common currency instruments).

Fact: The theory and methods necessary for establishing an Intangible Assets Metric System are in hand. What’s missing is the awareness of the scientific, human, social, and economic value that would be returned from the admittedly very large investments that would be required.

References and examples are available in other posts in this blog, in my publications, or on request.