Archive for June, 2018

Evaluating Questionnaires as Measuring Instruments

June 23, 2018

An email came in today asking whether three different short (4- and 5-item) questionnaires could be expected to provide reasonable quality measurement. Here’s my response.

-----

Thanks for raising this question. The questionnaire plainly was not designed to provide data suitable for measurement. Though much can be learned about making constructs measurable from data produced by this kind of questionnaire, “Rasch analysis” cannot magically create a silk purse from a sow’s ear (as the old expression goes). Use Linacre’s (1993) generalizability theory nomograph to see what reliabilities are expected for each subscale, given the numbers of items and rating categories, and applying a conservative estimate of the adjusted standard deviations (1.0 logit, for instance). Convert the reliability coefficients into strata (Fisher, 1992, 2008; Wright & Masters, 1982, pp. 92, 105-106) to make the practical meaning of the precision obtained obvious.
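The reliability-to-strata conversion mentioned above takes only a few lines. Here is a minimal sketch using the standard formulas (separation G = √(R/(1−R)); strata H = (4G + 1)/3, per Wright & Masters, 1982); the example reliability values are hypothetical, not from your data.

```python
import math

def separation(reliability):
    # Separation index G = true SD / RMSE, recovered from reliability R
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability):
    # Number of statistically distinct measurement strata: H = (4G + 1) / 3
    g = separation(reliability)
    return (4.0 * g + 1.0) / 3.0

# Hypothetical example: a short subscale with reliability 0.80
# distinguishes three statistically distinct strata in its sample.
print(separation(0.80))  # 2.0
print(strata(0.80))      # 3.0
```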

So if you have data, analyze them and compare the expected and observed reliabilities. If the observed uncertainties differ markedly from the expected ones, is that because of targeting issues? But before you do that, ask experts in the area to rank order:

  • the courses by relevance to the job;
  • the evaluation criteria from easy to hard; and
  • the skills/competencies in order of importance to job performance.

Then study the correspondence between the rankings and the calibration results. Where do they converge and diverge? Why? What’s unexpected? What can be learned?
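One simple way to quantify that correspondence is a Spearman rank correlation between each expert ranking and the rank order of the calibrations. A minimal sketch follows; the example rankings are hypothetical, and the closed-form formula assumes no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    # Spearman rank correlation for two untied rankings of the same items:
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Hypothetical: an expert's ranking of five items vs. the items'
# rank order by estimated calibration.
expert = [1, 2, 3, 4, 5]
calibrated = [1, 3, 2, 4, 5]
print(spearman_rho(expert, calibrated))  # 0.9
```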

Analyze all of the items in each area (student, employer, instructor) together in Winsteps and study each of the three Tables 23.x, setting PRCOMP=S. Remember that the total variance explained cannot be interpreted simply in terms of “more is better”; it matters less than the ratio of that variance to the variance in the first contrast (see Linacre, 2006, 2008). If the ratio is greater than 3, the scale is essentially unidimensional (though significant problems may remain to be diagnosed and corrected).
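The ratio check just described is simple arithmetic on the eigenvalue figures reported in Table 23. A minimal sketch; the eigenvalue figures below are hypothetical, not from an actual Winsteps run.

```python
def measures_to_contrast_ratio(explained_eigen, first_contrast_eigen):
    # Ratio of the variance explained by the Rasch measures to the
    # unexplained variance in the first contrast, in eigenvalue units
    return explained_eigen / first_contrast_eigen

# Hypothetical Table 23 figures: the measures explain 12.0 eigenvalue
# units and the first contrast accounts for 2.4.
ratio = measures_to_contrast_ratio(12.0, 2.4)
print(ratio)  # 5.0, greater than 3, so essentially unidimensional
```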

Common practice holds that unexplained-variance eigenvalues should be less than 1.5, but this rule of thumb is overly simplistic (Chou & Wang, 2010; Raîche, 2005) and has been contradicted in practice many times. Even when one or more eigenvalues exceed 1.5, theory may say the items belong to the same construct, and the disattenuated correlations of the measures implied by the separate groups of items (provided in Tables 23.x) may still approach 1.00, indicating that the subscales produce the same measures. See Green (1996) and Smith (1996), among others, for more on this.
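The disattenuated correlation referred to here is Spearman's classical correction for attenuation: the observed correlation between two sets of subscale measures divided by the square root of the product of their reliabilities. A minimal sketch with hypothetical values:

```python
import math

def disattenuated_correlation(r_observed, rel_x, rel_y):
    # Spearman's correction for attenuation: the correlation the two
    # subscales' measures would show if they were measured without error
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical: an observed cross-subscale correlation of 0.576 with
# subscale reliabilities of 0.64 and 0.81 disattenuates to about 0.80.
print(round(disattenuated_correlation(0.576, 0.64, 0.81), 2))  # 0.8
```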

If subscales within each of the three groups of items are markedly different in the measures they produce, then separate them in different analyses. If these further analyses reveal still more multidimensionalities, it’s time to go back to the drawing board, given how short these scales are. If you define a plausible scale, study the item difficulty orders closely with one or more experts in the area. If there is serious interest in precision measurement and its application to improved management, and not just a bureaucratic need for data to satisfy empty demands for a mere appearance of quality assessment, then trace the evolution of the construct as it changes from less to more across the items.

What, for instance, is the common theme addressed across the courses that makes them all relevant to job performance? The courses were each created with an intention and they were brought together into a curriculum for a purpose. These intentions and purposes are the raw material of a construct theory. Spell out the details of how the courses build competency in translation.

Furthermore, I imagine that this curriculum, by definition, was set up to be effective in training students no matter who is in the courses (within the constraints of the admission criteria), and no matter which particular challenges relevant to job performance are sampled from the universe of all possible challenges. You will recognize these unexamined and unarticulated assumptions as exactly what needs to be explicitly stated as hypotheses informing a model of the educational enterprise. This model transforms implicit assumptions into requirements that are never fully satisfied but can be very usefully approximated.

As I’ve been saying for a long time (Fisher, 1989), please do not accept the shorthand language of references to “the Rasch model”, “Rasch scaling”, “Rasch analysis”, etc. Rasch did not invent the form of these models, which are at least as old as Plato. And measurement is not a function of data analysis. Data provide experimental evidence testing model-based hypotheses concerning construct theories. When explanatory theory corroborates and validates data in calibrated instrumentation, the instrument can be applied at the point of use with no need for data analysis, to produce measures, uncertainty (error) estimates, and graphical fit assessments (Connolly, Nachtman, & Pritchett, 1971; Davis, et al., 2008; Fisher, 2006; Fisher, Harvey, & Kilgore, 1995; Linacre, 1997; many others).

So instead of using those common shorthand phrases, please speak directly to the problem of modeling the situation in order to produce a practical tool for managing it.

Further information is available in the references below.

 

Aryadoust, S. V. (2009). Mapping Rasch-based measurement onto the argument-based validity framework. Rasch Measurement Transactions, 23(1), 1192-1193 [http://www.rasch.org/rmt/rmt231.pdf].

Chang, C.-H. (1996). Finding two dimensions in MMPI-2 depression. Structural Equation Modeling, 3(1), 41-49.

Chou, Y. T., & Wang, W. C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70, 717-731.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service. Retrieved 23 June 2018 from https://images.pearsonclinical.com/images/pa/products/keymath3_da/km3-da-pub-summary.pdf

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G. et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis Cartilage, 16(5), 551-559.

Fisher, W. P., Jr. (1989). What we have to offer. Rasch Measurement Transactions, 3(3), 72 [http://www.rasch.org/rmt/rmt33d.htm].

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2006). Survey design recommendations [expanded from Fisher, W. P. Jr. (2000) Popular Measurement, 3(1), pp. 58-59]. Rasch Measurement Transactions, 20(3), 1072-1074 [http://www.rasch.org/rmt/rmt203.pdf].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5(1), 3-25.

Green, K. E. (1996). Dimensional analyses of complex data. Structural Equation Modeling, 3(1), 50-61.

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284 [http://www.rasch.org/rmt/rmt71h.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-283.

Linacre, J. M. (1998). Structure in Rasch residuals: Why principal components analysis? Rasch Measurement Transactions, 12(2), 636 [http://www.rasch.org/rmt/rmt122m.htm].

Linacre, J. M. (2003). PCA: Data variance: Explained, modeled and empirical. Rasch Measurement Transactions, 17(3), 942-943 [http://www.rasch.org/rmt/rmt173g.htm].

Linacre, J. M. (2006). Data variance explained by Rasch measures. Rasch Measurement Transactions, 20(1), 1045 [http://www.rasch.org/rmt/rmt201a.htm].

Linacre, J. M. (2008). PCA: Variance in data explained by Rasch measures. Rasch Measurement Transactions, 22(1), 1164 [http://www.rasch.org/rmt/rmt221j.htm].

Raîche, G. (2005). Critical eigenvalue sizes in standardized residual Principal Components Analysis. Rasch Measurement Transactions, 19(1), 1012 [http://www.rasch.org/rmt/rmt191h.htm].

Schumacker, R. E., & Linacre, J. M. (1996). Factor analysis and Rasch. Rasch Measurement Transactions, 9(4), 470 [http://www.rasch.org/rmt/rmt94k.htm].

Smith, E. V., Jr. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205-231.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Wright, B. D. (1996). Comparing Rasch measurement and factor analysis. Structural Equation Modeling, 3(1), 3-24.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.


What is the point of sustainability impact investing?

June 10, 2018

What if the sustainability impact investing problem is not just a matter of judiciously supporting business policies and practices likely to enhance the long term viability of life on earth? What if the sustainability impact investing problem is better conceived in terms of how to create markets that function as self-sustaining ecosystems of diverse forms of economic life?

The crux of the sustainability problem from this living capital metrics point of view is how to create efficient markets for virtuous cycles of productive value creation in the domains of human, social, and natural capital. Mainstream economics deems this an impossible task because its definition of measurement makes trade in these forms of capital unethical and immoral forms of slavery.

But what if there is another approach to measurement? What if this alternative approach is scientific in ways unimagined in mainstream economics? What if this alternative approach has been developing in research and practice in education, psychology, health care, sociology, and other fields for over 90 years? What if there are thousands of peer-reviewed publications supporting its validity and reliability? What if a wide range of commercial firms have been successfully employing this alternative approach to measurement for decades? What if this alternative approach has been found legally and scientifically defensible in ways other approaches have not? What if this alternative approach enables us to be better stewards of our lives together than is otherwise possible?

Put another way, measuring and managing sustainability is fundamentally a problem of harmonizing relationships. What do we need to harmonize our relationships with each other, between our communities and nations, and with the earth? How can we achieve harmonization without forcing conformity to one particular scale? How can we tune the instruments of a sustainability art and science to support as wide a range of diverse ensembles and harmonies as exists in music?

Positive and hopeful answers to these questions follow from the fact that we have at our disposal a longstanding, proven, and advanced art and science of qualitatively rich measurement and instrument calibration. The crux of the message is that this art and science is poised to be the medium in which sustainability impact investing and management fulfills its potential and transforms humanity’s capacities to care for itself and the earth.

The differences between the quality of information that is available, and the quality of information currently in use in sustainability impact investing, are of such huge magnitudes that they can only be called transformative. Love and care are the power behind these transformative differences. Choosing discourse over violence, considerateness for the vulnerabilities we share with others, and care for the unity and sameness of meaning in dialogue are all essential to learning the lesson Diotima taught Socrates in Plato’s Symposium. These lessons can all be brought to bear in creating the information and communications systems we need for sustainable economies.

The so-called metrics of today’s sustainability impact investing lead to widespread complaints of increased administrative and technical burdens, and to distractions that pull away from the pursuit of the core social mission. The maxim “you manage what you measure” becomes a cynical commentary on red tape and bureaucracy instead of a commendable use of tools fit for purpose.

In contrast with the cumbersome and uninterpretable masses of data that pass for sustainability metrics today, the art and science of measurement establishes the viability and feasibility of efficient markets for human, social, and natural capital. Instead of counting paper clips in mindless accounting exercises, we can instead be learning what comes next in the formative development of a student, a patient, an employee, a firm, a community, or the ecosystem services of watersheds, forests, and fisheries.

And we can moreover support success in those developments by means of information flows that indicate where the biggest per-dollar human, social, and natural capital value returns accrue. Rigorous measurability of those returns will make it possible to price them, to own them, to make property rights legally enforceable, and to thereby align financial profits with the creation of social value. In fact, we could and should set things up so that it will be impossible to financially profit without creating social value. When that kind of system of incentives and rewards is instituted, then the self-sustaining virtuous cycle of a new ecological economy will come to life.

Though the value and originality of the innovations making this new medium possible are huge, in the end there’s really nothing new under the sun. As the French say, “plus ça change, plus c’est la même chose.” Or, as Whitehead put it philosophically, the innovations in measurement taking hold in the world today are nothing more than additional footnotes to Plato. Contrary to both popular and most expert opinion, it turns out that not only is a moral and scientific art of human measurement possible, but Plato’s lessons on how experiences of beauty teach us about meaning provide what may well turn out to be the only way today’s problems of human suffering, social discontent, and environmental degradation will be successfully addressed.

We are faced with a kind of Chinese finger trap: the more we struggle, the more trapped we become. Relaxing into the problem and seeing the historical roots of scientific reasoning in everyday thinking opens our eyes to a new path. Originality is primarily a matter of finding a useful model no one else has considered. A long history of innovations comes together to point in a new direction plainly recognizable as a variation on an old theme.

Instead of a modern focus on data and evidence, then, and instead of the postmodern focus on the theory-dependence of data, we are free to take an unmodern focus on how things come into language. The chaotic complexity of that process becomes manageable as we learn to go with the flow of adaptive evolving processes stable enough to support meaningful communication. Information infrastructures in this linguistic context are conceived as ecosystems alive to changeable local situations at the same time they do not compromise continuity and navigability.

We all learn through what we already know, so it is essential that we begin from where we are. Our first lessons will then be drawn from existing sustainability impact data, using the UN SDG 17 as a guide. These data were not designed according to the principles of scientifically rigorous measurement, but instead assume that separately aggregated counts of events, percentages, and physical measures of volume, mass, or time will suffice as measures of sustainability. Things that are easy to count are not, however, likely to work as satisfactory measures. We need to learn from the available data to think again about what data are necessary and sufficient to the task.

The lessons we will learn from the data available today will lead to more meaningful and rigorous measures of sustainability. Connecting these instruments in widely accessible multilevel information infrastructures, by making them metrologically traceable to standard units while still illuminating locally unique data patterns, is how we will together work the ground, plant the seeds, and cultivate new diverse natural settings for innovating sustainable relationships.

 

Measurement and markets

June 3, 2018

My response to a request for discussion topic suggestions from Alain Leplege and David Andrich to be taken up at the Rasch Expert Group meeting in Paris on 26 June:

The role of measurement in reducing economic transaction costs and in establishing legal property rights is well established. The value and importance of measurement are stressed everywhere, in all fields. But where measuring physical, chemical, and biological variables contributes to lower transaction costs and defensible property rights, measuring psychological, social, and environmental variables increases administrative and technical burdens with no impact at all on property rights. Why is this?

Furthermore, when physical, chemical, and biological variables are objectively measurable, no one develops their own instruments, units, internal measurement systems, or the things measured by those systems. Instead, they purchase those tools and products in open markets.

But when psychological, social, and environmental variables are objectively measured, as they have been for many decades, everyone still assumes they must develop their own instruments, units, internal measurement systems, and the things measured by those systems, instead of purchasing those tools and products in open markets. Why is this?

I propose that the answers to both these questions follow from two widely assumed misconceptions about markets and measurement.

The first misconception concerns how markets are formed. As explained by Miller and O’Leary (2007, p. 721):

“Markets are not spontaneously generated by the exchange activity of buyers and sellers. Rather, skilled actors produce institutional arrangements, the rules, roles and relationships that make market exchange possible. The institutions define the market, rather than the reverse.”

North (1981, pp. 18-19, 36), one of the founders of the new institutional economics, elaborates further:

“…without some form of measurement, property rights cannot be established nor exchange take place.”

“One must be able to measure the quantity of a good in order for it to be exclusive property and to have value in exchange. Where measurement costs are very high, the good will be a common property resource. The technology of measurement and the history of weights and measures is a crucial part of economic history since as measurement costs were reduced the cost of transacting was reduced.”

Benham and Benham (2000, p. 370) concur:

“Economic theory suggests that changes in transaction costs have a first-order impact on the production frontier. Lower transaction costs mean more trade, greater specialization, changes in production costs, and increased output.”

The second misconception, concerning measurement, stems from the assumption that the widely used but incomplete and insufficient methods based in True Score Theory are the state of the art, and that their associated reductionist and immoral commoditization of people is unavoidable. As is well known to the Rasch expert group attendees, the state of the art in measurement offers a wealth of advantages inaccessible to True Score Theory. One of these is the insufficiently elaborated opportunity for a nonreductionist and moral commoditization of the constructs measured, rather than of people themselves.

It seems plain that many of today’s problems of human suffering, social discontent, and environmental degradation could be more effectively addressed by means of systematic and deliberate efforts aimed at using improved measurement methods to lower transaction costs, establish property rights, and create efficient markets supporting advanced innovations for improving the quality and quantity of intangible assets. Efforts in this direction have connected Rasch psychometrics with metrology (Mari & Wilson, 2014; Pendrill & Fisher, 2015; Fisher & Stenner, 2016) and with the historical interweaving of science and the economy (Fisher, 2002, 2007, 2009, 2010, 2012, etc.), and are being applied to the development of a new class of social impact bonds (see https://www.aldcpartnership.com/#/cases/financing-the-future).

What feedback, questions, and comments might the expert group attendees have in response to these efforts?

Additional references available on request.

Benham, A., & Benham, L. (2000). Measuring the costs of exchange. In C. Ménard (Ed.), Institutions, contracts and organizations: Perspectives from new institutional economics (pp. 367-375). Cheltenham, UK: Edward Elgar.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-734.

North, D. C. (1981). Structure and change in economic history. New York: W. W. Norton & Co.

On the practical value of rigorous modeling

June 1, 2018

What is the practical value of modeling real things in the world in terms requiring separable parameters?

If the parameters are separable, the stage is set for validating a model that applies to any individual person or challenge drawn from the infinite populations of all possible people and challenges.
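For concreteness, the paradigmatic case of such a model is the dichotomous Rasch model, in which the response probability depends only on the difference between a person parameter and an item parameter, which is what allows the two to be estimated independently of one another. A minimal sketch, with hypothetical parameter values in logits:

```python
import math

def rasch_probability(ability, difficulty):
    # Dichotomous Rasch model: P(x = 1) = exp(b - d) / (1 + exp(b - d)),
    # a function only of the difference between the separable parameters
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When person ability equals item difficulty, the probability of an
# affirmative response is exactly one half, whatever the common value.
print(rasch_probability(1.0, 1.0))  # 0.5
print(rasch_probability(2.5, 2.5))  # 0.5
```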

Parameter separation does not automatically translate into representations of that conjoint relationship, though. Meaningful and consistent variation in responses requires items written to provoke answers that cohere across respondents and across the questions asked.

In addition, enough questions have to be asked to drive down uncertainty relative to the variation. Response patterns can be reproduced meaningfully only when there is more variation than uncertainty. Precision and reliability are functions of that ratio.
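In the Rasch context this ratio is the separation index, and reliability follows directly from it. A minimal sketch; the SD and RMSE values below are hypothetical, in logits.

```python
def reliability_from_ratio(true_sd, rmse):
    # Separation G = true SD / RMSE; reliability R = G^2 / (1 + G^2),
    # i.e., the proportion of observed variance that is not uncertainty
    g = true_sd / rmse
    return g ** 2 / (1.0 + g ** 2)

# Hypothetical: twice as much true variation as measurement uncertainty
# yields a reliability of 0.80.
print(reliability_from_ratio(2.0, 1.0))  # 0.8
```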

But when reliable and meaningful parameter separation is obtained, the laboratory model represents something real in the world. Moreover, the model stands for the object of interest in a way that persists no matter which particular people or challenges are involved.

This is where the practical value kicks in. This is what makes it possible to export the laboratory model into the real world. The specific microcosm of relationships isolated and found reproducible in research is useful and meaningful to the extent that those relationships take at least roughly the same form in the world. Instead of managing the specific indicators that are counted up in the concrete observations, it becomes possible to manage the generic object of interest, abstractly conceived. Adaptively selecting the relevant indicators according to their practical relevance on the ground in real world applications has the practical consequence of unifying measurement, management, and accountability in a single sphere of action.

Plainly there’s a lot more that needs to be said about this.