Archive for the ‘Rasch’ Category

IMEKO Joint Symposium in St. Petersburg, Russia, 2-5 July 2019

June 26, 2019

The IMEKO Joint Symposium will be next week, 2-5 July, at the Original Sokos Hotel Olympia Garden, located at Batayskiy Pereulok, 3А, in St. Petersburg, Russia. Kudos to Kseniia Sapozhnikova, Giovanni Rossi, Eric Benoit, and the organizing committee for putting together such an impressive program, which is posted at: https://imeko19-spb.org/wp-content/uploads/2019/06/Program-of-the-Symposium.pdf

Presentations on measurement across the sciences will be given by metrology engineers and psychometricians from around the world, including Andrich, Cavanagh, Fitkov-Norris, Huang, Mari, Melin, Nguyen, Oon, Powers, Salzberger, and Wilson, along with many co-authors, among them Adams, Cano, Maul, and Pendrill.

For background on this rapidly developing new conversation on measurement across the sciences, see the references listed below. The late Ludwig Finkelstein, editor of IMEKO’s Measurement journal from 1982 to 2000, was a primary instigator of work in this area. At the 2010 Joint Symposium he co-hosted in London, Finkelstein said: “It is increasingly recognised that the wide range and diverse applications of measurement are based on common logical and philosophical principles and share common problems” (Finkelstein, 2010, p. 2). The IMEKO Joint Symposium continues to advance in the direction he foresaw.

The program includes a round table discussion on “Terminology issues related to expanding boundaries of measurements,” chaired by Mari and Chunovkina.

Paper titles include:

Andrich on “Exemplifying natural science measurement in the social sciences with Rasch measurement theory”

Benoit, et al. on “Musical instruments for the measurement of autism sensory disorders”

Budylina and Danilov on “Methods to ensure the reliability of measurements in the age of Industry 4.0”

Cavanagh, Asano-Cavanagh, and Fisher on “Natural semantic metalanguage as an approach to measuring meaning”

Crenna and Rossi on “Squat biomechanics in weightlifting: Foot attitude effects”

Fisher, Pendrill, Lips da Cruz, and Felin on “Why metrology? Fair dealing and efficient markets for the UN SDGs”

Fisher and Wilson on “The BEAR Assessment System Software as a platform for developing and applying UN SDG metrics”

Fitkov-Norris and Yeghiazarian on “Is context the hidden spanner in the works of educational measurement: Exploring the impact of context on mode of learning preferences”

Gavrilenkov, et al. on “Multicriteria approach to design of strain gauge force transducers”

Grednovskaya, et al. on “Measuring non-physical quantities in the procedures of philosophical practice”

Huang, Oon, and Fisher on “Coherence in measuring student evaluation of teaching: A new paradigm”

Katkov on “The status of and prospects for development of voltage quantum standards”

Kneller and Fayans on “Solving interdisciplinary tasks: The challenge and the ways to surmount it”

Kostromina and Gnedykh on “Problems and prospects of complex psychological phenomena measurement”

Lips da Cruz, Fisher, Pendrill, and Felin on “Accelerating the realization of the UN SDGs through metrological multi-stakeholder interoperability”

Lyubimtsev, et al. on “Measuring systems designed for working with living organisms as biosensors: Features of their metrological maintenance”

Mari, Chunovkina, and Ehrlich on “The complex concept of quantity in the past and (possibly) the future of the International Vocabulary of Metrology”

Mari, Maul, and Wilson on “Can there be one meaning of ‘measurement’ across the sciences?”

Melin, Pendrill, Cano, and the EMPIR NeuroMET 15HLT04 Consortium on “Towards patient-centred cognition metrics”

Morrison and Fisher on “Measuring for management in Science, Technology, Engineering, and Mathematics learning ecosystems”

Nguyen on “The feasibility of using an international common reading progression to measure reading across languages: A case study of the Vietnamese language”

Nguyen, Nguyen, and Adams on “Assessment of the generic problem-solving construct across different contexts”

Oon, Hoi-Ka, and Fisher on “Metrologically coherent assessment for learning: What, why, and how”

Pandurevic, et al. on “Methods for quantitative evaluation of force and technique in competitive sport climbing”

Pavese on “Musing on extreme quantity values in physics and the problem of removing infinity”

Powers and Fisher on “Advances in modelling visual symptoms and visual skills”

Salzberger, Cano, et al. on “Addressing traceability in social measurement: Establishing a common metric for dependence”

Sapozhnikova, et al. on “Music and growl of a lion: Anything in common? Measurement model optimized with the help of AI will answer”

Soratto, Nunes, and Cassol on “Legal metrological verification in health area in Brazil”

Wilson and Dulhunty on “Interpreting the relationship between item difficulty and DIF: Examples from educational testing”

Wilson, Mari, and Maul on “The status of the concept of reference object in measurement in the human sciences compared to the physical sciences”

Background References

Finkelstein, L. (1975). Representation by symbol systems as an extension of the concept of measurement. Kybernetes, 4(4), 215-223.

Finkelstein, L. (2003, July). Widely, strongly and weakly defined measurement. Measurement, 34(1), 39-48.

Finkelstein, L. (2005). Problems of measurement in soft systems. Measurement, 38(4), 267-274.

Finkelstein, L. (2009). Widely-defined measurement–An analysis of challenges. Measurement: Concerning Foundational Concepts of Measurement Special Issue Section (L. Finkelstein, Ed.), 42(9), 1270-1277.

Finkelstein, L. (2010). Measurement and instrumentation science and technology: The educational challenges. Journal of Physics Conference Series, 238, doi:10.1088/1742-6596/238/1/012001.

Fisher, W. P., Jr. (2009). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement: Concerning Foundational Concepts of Measurement Special Issue (L. Finkelstein, Ed.), 42(9), 1278-1287.

Mari, L. (2000). Beyond the representational viewpoint: A new formalization of measurement. Measurement, 27, 71-84.

Mari, L., Maul, A., Torres Irribarra, D., & Wilson, M. (2016, March). Quantities, quantification, and the necessary and sufficient conditions for measurement. Measurement, 100, 115-121. Retrieved from http://www.sciencedirect.com/science/article/pii/S0263224116307497

Mari, L., & Wilson, M. (2014, May). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327. Retrieved from http://www.sciencedirect.com/science/article/pii/S0263224114000645

Pendrill, L. (2014, December). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33. Retrieved from http://www.tandfonline.com/doi/abs/10.1080/19315775.2014.11721702

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55. doi: http://dx.doi.org/10.1016/j.measurement.2015.04.010

Pendrill, L., & Petersson, N. (2016). Metrology of human-based and other qualitative measurements. Measurement Science and Technology, 27(9), 094003. Retrieved from https://doi.org/10.1088/0957-0233/27/9/094003

Wilson, M. R. (2013). Using the concept of a measurement system to characterize measurement models used in psychometrics. Measurement, 46, 3766-3774. Retrieved from http://www.sciencedirect.com/science/article/pii/S0263224113001061

Wilson, M., & Fisher, W. (2016). Preface: 2016 IMEKO TC1-TC7-TC13 Joint Symposium: Metrology across the Sciences: Wishful Thinking? Journal of Physics Conference Series, 772(1), 011001. Retrieved from http://iopscience.iop.org/article/10.1088/1742-6596/772/1/011001/pdf

Wilson, M., & Fisher, W. (2018). Preface of special issue, Metrology across the Sciences: Wishful Thinking? Measurement, 127, 577.

Wilson, M., & Fisher, W. (2019). Preface of special issue, Psychometric Metrology. Measurement, 145, 190.


Psychology and the social sciences: An atheoretical, scattered, and disconnected body of research

February 16, 2019

A new article in Nature Human Behaviour (NHB) points toward the need for better theory and more rigorous mathematical models in psychology and the social sciences (Muthukrishna & Henrich, 2019). The authors rightly say that the lack of an overarching cumulative theoretical framework makes it very difficult to see whether new results fit well with previous work, or if something surprising has come to light. Mathematical models are especially emphasized as being of value in specifying clear and precise expectations.

The point that the social sciences and psychology need better theories and models is painfully obvious. But there are in fact thousands of published studies and practical real-world applications that not only provide, but often surpass, the kinds of predictive theories and mathematical models called for in the NHB article. The article makes no mention of any of this work, and its argument is framed entirely in a statistical context instead of the more appropriate context of measurement science.

The concept of reliability provides an excellent point of entry. Most behavioral scientists think of reliability statistically, as a coefficient with a positive numeric value usually between 0.00 and 1.00. The tangible sense of reliability as indicating exactly how predictable an outcome is does not usually figure in most researchers’ thinking. But that sense of the specific predictability of results has been the focus of attention in social and psychological measurement science for decades.
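To make that sense of predictability concrete, here is a minimal sketch using nothing more than the classical relation between reliability and measurement error (the numbers are made up for illustration):

```python
import math

def retest_band(observed_sd: float, reliability: float) -> float:
    """Half-width of an approximate 95% band for reproducing a score,
    via the classical relation SEM = SD * sqrt(1 - R)."""
    sem = observed_sd * math.sqrt(1.0 - reliability)
    return 1.96 * sem

# With an observed SD of 10 points, a reliability of 0.90 means scores
# are reproducible only to within about +/- 6.2 points.
print(round(retest_band(10.0, 0.90), 1))
```

That plus-or-minus band, not the coefficient by itself, is what reliability says about exactly how predictable an outcome is.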

For instance, the measurement of time is reliable in the sense that the position of the sun relative to the earth can be precisely predicted from geographic location, the time of day, and the day of the year. The numbers and words assigned to noon time are closely associated with the Sun being at the high point in the sky (though there are political variations by season and location across time zones).

That kind of a reproducible association is rarely sought in psychology and the social sciences, but it is far from nonexistent. One can discern different degrees to which that kind of association is included in models of measured constructs. Though most behavioral research doesn’t mention the connection between linear amounts of a measured phenomenon and a reproducible numeric representation of it (level 0), quite a significant body of work focuses on that connection (level 1). The disappointing thing about that level 1 work is that the relentless obsession with statistical methods prevents most researchers from connecting a reproducible quantity with a single expression of it in a standard unit, and with an associated uncertainty term (level 2). That is, level 1 researchers conceive measurement in statistical terms, as a product of data analysis. Even when results across data sets are highly correlated and could be equated to a common metric, level 1 researchers do not leverage that source of potential value for simplified communication and accumulated comparability.
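As a hedged sketch of what equating to a common metric can involve at this level, assume two instruments share a few anchor items with known Rasch (logit) calibrations; the item names and values below are hypothetical:

```python
from statistics import mean

# Logit difficulties of the same anchor items as calibrated on two
# instruments (hypothetical values, for illustration only).
form_a = {"item1": -0.50, "item2": 0.10, "item3": 0.80}
form_b = {"item1": -0.20, "item2": 0.45, "item3": 1.05}

# Under the Rasch model, a common-item equating constant is just the mean shift.
shift = mean(form_a[k] - form_b[k] for k in form_a)

def b_to_a(measure_on_b: float) -> float:
    """Express a Form B measure in the Form A frame of reference."""
    return measure_on_b + shift

print(round(shift, 2))         # -0.30: the anchors calibrate 0.30 logits higher on B
print(round(b_to_a(1.00), 2))  # 0.70 on the Form A metric
```

Once the shift is estimated, measures from either instrument can be reported in one unit, which is exactly the opportunity for simplified communication that level 1 work leaves on the table.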

And then, for their part, level 2 researchers usually do not articulate theories about the measured constructs, by augmenting the mathematical data model with an explanatory model predicting variation (level 3). Level 2 researchers are empirically grounded in data, and can expand their network of measures only by gathering more data and analyzing it in ways that bring it into their standard unit’s frame of reference.

Level 3 researchers, however, have come to see what makes their measures tick. They understand the mechanisms that make their questions vary. They can write new questions to their theoretical specifications, test those questions by asking them of a relevant sample, and produce the predicted calibrations. For instance, reading comprehension is well established to be a function of the difference between a person’s reading ability and the complexity of the text they encounter (see articles by Stenner in the list below). We have built our entire educational system around this idea, as we deliberately introduce children first to the alphabet, then to the most common words, then to short sentences, and then to ever longer and more complicated text. But stating the construct model, testing it against data, calibrating a unit to which all tests and measures can be traced, and connecting together all the books, articles, tests, curricula, and students is a process that began (in English and Spanish) only in the 1980s. The process still is far from finished, and most reading research still does not use the common metric.
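The construct model can be stated compactly. As a sketch in plain logits (operational reading metrics rescale this, but the form is the same dichotomous Rasch model), predicted comprehension is a logistic function of the reader-text difference:

```python
import math

def comprehension_rate(reader_ability: float, text_complexity: float) -> float:
    """Predicted success rate as a logistic function of the difference
    between reader ability and text complexity, both in logits."""
    return 1.0 / (1.0 + math.exp(-(reader_ability - text_complexity)))

# A reader matched to the text succeeds about half the time; a text about
# 1.1 logits easier than the reader yields roughly 75% comprehension.
print(round(comprehension_rate(2.0, 2.0), 2))  # 0.5
print(round(comprehension_rate(2.0, 0.9), 2))  # 0.75
```

Stating the model this explicitly is what makes level 3 predictions testable: calibrations for new texts and items can be predicted before any data are collected, and then checked.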

In this kind of theory-informed context, new items can be automatically generated on the fly at the point of measurement. Those items and inferences made from them are validated by the consistency of the responses and the associated expression of the expected probability of success, agreement, etc. The expense of constant data gathering and analysis can be cut to a very small fraction of what it is at levels 0-2.

Level 3 research methods are not widely known or used, but they are not new. They are gaining traction as their use by national metrology institutes globally grows. As high profile critiques of social and psychological research practices continue to emerge, perhaps more attention will be paid to this important body of work. A few key references are provided below, and virtually every post in this blog pertains to these issues.

References

Baghaei, P. (2008). The Rasch model as a construct validation tool. Rasch Measurement Transactions, 22(1), 1145-1146 [http://www.rasch.org/rmt/rmt221a.htm].

Bergstrom, B. A., & Lunz, M. E. (1994). The equivalence of Rasch item calibrations and ability estimates across modes of administration. In M. Wilson (Ed.), Objective measurement: Theory into practice, Vol. 2 (pp. 122-128). Norwood, New Jersey: Ablex.

Cano, S., Pendrill, L., Barbic, S., & Fisher, W. P., Jr. (2018). Patient-centred outcome metrology for healthcare decision-making. Journal of Physics: Conference Series, 1044, 012057.

Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Measurement & Evaluation in Counseling & Development, 43(2), 121-149.

Embretson, S. E. (2010). Measuring psychological constructs: Advances in model-based approaches. Washington, DC: American Psychological Association.

Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.

Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48(1), 3-26.

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827-833.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-164.

Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.

Irvine, S. H., Dunn, P. L., & Anderson, J. D. (1990). Towards a theory of algorithm-determined cognitive test construction. British Journal of Psychology, 81, 173-195.

Kline, T. L., Schmidt, K. M., & Bowles, R. P. (2006). Using LinLog and FACETS to model item components in the LLTM. Journal of Applied Measurement, 7(1), 74-91.

Lunz, M. E., & Linacre, J. M. (2010). Reliability of performance examinations: Revisited. In M. Garner, G. Engelhard, Jr., W. P. Fisher, Jr. & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 1 (pp. 328-341). Maple Grove, MN: JAM Press.

Mari, L., & Wilson, M. (2014). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Markward, N. J., & Fisher, W. P., Jr. (2004). Calibrating the genome. Journal of Applied Measurement, 5(2), 129-141.

Maul, A., Mari, L., Torres Irribarra, D., & Wilson, M. (2018). The quality of measurement results in terms of the structural features of the measurement process. Measurement, 116, 611-620.

Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour, 1-9.

Obiekwe, J. C. (1999, August 1). Application and validation of the linear logistic test model for item difficulty prediction in the context of mathematics problems. Dissertation Abstracts International: Section B: The Sciences & Engineering, 60(2-B), 0851.

Pendrill, L. (2014). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Pendrill, L., & Petersson, N. (2016). Metrology of human-based and other qualitative measurements. Measurement Science and Technology, 27(9), 094003.

Sijtsma, K. (2009). Correcting fallacies in validity, reliability, and classification. International Journal of Testing, 8(3), 167-194.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107-120.

Stenner, A. J. (2001). The necessity of construct theory. Rasch Measurement Transactions, 15(1), 804-805 [http://www.rasch.org/rmt/rmt151q.htm].

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].

Stenner, A. J., Stone, M. H., & Fisher, W. P., Jr. (2018). The unreasonable effectiveness of theory based instrument calibration in the natural sciences: What can the social sciences learn? Journal of Physics: Conference Series, 1044, 012070.

Stone, M. H. (2003). Substantive scale construction. Journal of Applied Measurement, 4(3), 282-297.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilson, M. R. (2013). Using the concept of a measurement system to characterize measurement models used in psychometrics. Measurement, 46, 3766-3774.

Wright, B. D., & Stone, M. H. (1979). Chapter 5: Constructing a variable. In Best test design: Rasch measurement (pp. 83-128). Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/measess/me-all.pdf].

Wright, B. D., Stone, M., & Enos, M. (2000). The evolution of meaning in practice. Rasch Measurement Transactions, 14(1), 736 [http://www.rasch.org/rmt/rmt141g.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

New Ideas on How to Realize the Purpose of Capital

September 20, 2018

I’d like to offer the following in reply to James Militzer, at https://nextbillion.net/deciphering-emersons-tears-time-impact-investing-lower-expectations/.

Rapid advances toward impact investing’s highest goals of social transformation are underway in quiet technical work being done in places no one is looking. That work shares the sentiment Jed Emerson expressed at the 2017 Social Capital Markets conference, quoted in Militzer’s NextBillion.net posting, that “The purpose of capital is to advance a more progressively free and just experience of life for all.” Emerson was also right the year before when, as Militzer reported, he said that we need a “real, profound critique of current practices within financial capitalism,” one that would “require real change in our own behavior aside from adding a few funds to our portfolios here or augmenting a reporting process there.”

But the efforts he and others are making toward fulfilling that purpose and articulating that critique are incomplete, insufficient, and inadequate. Why? How? Language is the crux of the matter, and the issues involved are complex and technical. The challenge, which may initially seem simplistic or naive, is how to bring human, social, and environmental values into words. Not just any words, but meaningful words in a common language. What is most challenging is that this language, like any everyday language, has to span the range from abstract theoretical ideals to concrete local improvisations.

That means it cannot be like our current languages for expressing human, social, and environmental value. If we are going to succeed in aligning those forms of value with financial value, we have a lot of work to do.

Though there is endless talk of metrics for managing sustainable impacts, and though the importance of these metrics for making sustainability manageable is also a topic of infinite discussion, almost no one takes the trouble to seek out and implement the state of the art in measurement science. This is a crucial way, perhaps the most essential way, in which we need to criticize current practices within financial capitalism and change our behaviors. Oddly, almost no one seems to have thought of that.

That is, one of the most universally unexamined assumptions of our culture is that numbers automatically stand for quantities. People who analyze numeric data are called quants, and all numeric data analysis is referred to as quantitative. But almost none of these quants and quantitative methods involve actually defining, modeling, identifying, evaluating, or applying a substantive unit of something real in the world that can be meaningfully represented by numbers.

There is, of course, an extensive and longstanding literature on exactly this science of measurement. It has been a topic of research, philosophy, and practical application for at least 90 years, going back to the work of Thurstone at the University of Chicago in the 1920s. That work continued there with Rasch’s visit in 1960, with Wright’s adoption and expansion of Rasch’s theory and methods, and with the further work done by Wright’s students and colleagues in the years since.

Most importantly, over the last ten years, metrologists, the physicists and engineers who maintain and improve the SI units, the metric system, have taken note of what’s been going on in research and practice involving the approaches to measurement developed by Rasch, Wright, and their students and colleagues (for just two of many articles in this area, see here and here). The most recent developments in this new metrology include

(a) initiatives at national metrology institutes globally (Sweden and the UK, Portugal, Ukraine, among others) to investigate potentials for a new class of unit standards;

(b) a special session on this topic at the International Measurement Confederation (IMEKO) World Congress in Belfast on 5 September 2018;

(c) the Journal of Physics Conference Series proceedings of the 2016 IMEKO Joint Symposium hosted by Mark Wilson and myself at UC Berkeley;

(d) the publication of a 2017 book on Ben Wright edited by Mark Wilson and myself in Springer’s Series on Measurement Science and Technology; and

(e) the forthcoming October 2018 special issue of Elsevier’s Measurement journal edited by Wilson and myself, and a second one currently in development.

There are profound differences between today’s assumptions about measurement and how a meaningful art and science of precision measurement proceeds. What passes for measurement in today’s sustainability economics and accounting are counts, percentages, and ratings. These merely numeric metrics do not stand for anything that adds up the way they do. In fact, it has been repeatedly demonstrated over many years that these kinds of metrics measure in a unit that changes size depending on who or what is measured, who is measuring, and what tool is used. Worse, the numbers are usually taken to be perfectly precise: uncertainty ranges, error terms, and confidence intervals are only sporadically provided.
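The changing-unit problem can be demonstrated with nothing more than the log-odds transformation; the proportions below are made up for illustration:

```python
import math

def logit(p: float) -> float:
    """Log-odds of a success proportion: the interval-scale location."""
    return math.log(p / (1.0 - p))

# The same 10-point raw gain covers very different interval-scale
# distances depending on where it occurs:
print(round(logit(0.60) - logit(0.50), 2))  # mid-range gain: about 0.41 logits
print(round(logit(0.95) - logit(0.85), 2))  # same raw gain near the top: about 1.21 logits
```

The identical ten-point raw gain is worth roughly three times more near the top of the scale than in the middle; that is what it means to measure with a unit that changes size.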

Measurement is not primarily a matter of data analysis. Measurement requires calibrated instruments that can be read as standing for a given amount of something that stays the same, within the uncertainty range, no matter who is measuring, no matter what or who is measured, and no matter what tool is used. This is, of course, quite an accomplishment when it can be achieved, but it is not impossible and has been put to use in large scale practical ways for several decades (for instance, see here, here, and here). Universally accessible instruments calibrated to common unit standards are what make society in general, and markets in particular, efficient in the way of projecting distributed network effects, turning communities into massively parallel stochastic computers (as W. Brian Arthur put it on p. 6 of his 2014 book, Complexity and the Economy).

These are not unexamined assumptions or overly ideal theoretical demands. They are pragmatic ways of adapting to emergent patterns in various kinds of data that have repeatedly been showing themselves around the world for decades. Our task is to literally capitalize on these nonhuman forms of life by creating multilevel, complex ecosystems of relationships with them, letting them be what they are in ways that also let us represent ourselves to each other. (Emerson quotes Bruno Latour to this effect on page 136 in his new book, The Purpose of Capital; those familiar with my work will know I’ve been reading and citing Latour since the early 1980s).

So it seems to me that, however well-intentioned those promoting impact investing may be, there is little awareness of just how profound and sweeping the critique of current practices needs to be, or of just how much our own behaviors are going to have to change. There are, however, truly significant reasons to be optimistic and hopeful. The technical work being done in measurement and metrology points toward possibilities for extending everyday language into a pragmatic idealism that does not require caving in to either varying local circumstances or to authoritarian dictates.

The upside of the situation is that, as so often happens in the course of human history, this critique and the associated changes are likely to have that peculiar quality captured in the French expression, “plus ça change, plus c’est la même chose” (the more things change, the more they stay the same). The changes in process are transformative, but will also be recognizable repetitions of human scale patterns.

In sum, what we are doing is tuning the instruments of the human, social, and environmental sciences to better harmonize relationships. Just as jazz, folk, and world music show that creative improvisation is not constrained by tuning standards and high tech solutions but is in fact facilitated by them, so, too, can we make that the case in other areas.

For instance, in my presentation at the IMEKO World Congress in Belfast on 5 September, I showed that the integration of beauty and meaning we have within our grasp reiterates principles that date back to Plato. The aesthetics complement the mathematics, with variations on the same equations being traceable from the Pythagorean theorem to Newton’s laws to Rasch’s models for measurement (see, for instance, Fisher & Stenner, 2013). In many ways, the history of science and philosophy continues to be a footnote to Plato.


Current events in metrology for fun, profitable, and self-sustaining sustainability impacts

September 18, 2018

At the main event I attended last week at the Global Climate Action Summit in San Francisco, the #giveyouthachance philanthropic gathering at the Aquarium of the Bay, multiple people independently spoke to aligning social and environmental values with financial values, and explicitly stated that economic growth does not automatically entail environmental degradation.

As my new buddy David Traub (introduced as a consequence of the New Algorithm event in Stockholm in June with Angelica Lips da Cruz) was the MC, he put me on the program at the last minute and gave me five minutes to speak my piece in a room of 30 people or so. A great point of departure was opened up when Carin Winter of MissionBe.org spoke to her work in mindfulness education and led a guided meditation. So I conveyed the fact that the effects of mindfulness practice are rigorously measurable, and followed that up with the analogy from music (tuning instruments to harmonize relationships), with the argument against merely shouldering the burden of costs because it is the right thing to do, with the counter-argument for creating efficient competitive markets for sustainable impacts, and with news of the previous week’s special session on social and psychological metrology at IMEKO in Belfast. It appeared that the message of metrology as a means for making sustainability self-sustaining, fun, and profitable got through!

Next up: Unify.Earth has developed their own new iteration on blockchain, which will be announced Monday, 24 September, at the UN SDG Media Center (also see here) during the World Economic Forum’s Sustainable Development Impact Summit. The UEX (Unify Earth Exchange) fills the gap for human capital stocks left by the Universal Commons’ exclusive focus on social and natural capital.

So I’ve decided to go to NY and have booked my travel.

Back in February, Angelica Lips da Cruz recounted saying six months before that it would take two years to get to where we were at that time. Now another seven months have passed and I am starting to feel that the acceleration is approaching Mach 1! At this rate, it’ll be the speed of light in the next six months….


Evaluating Questionnaires as Measuring Instruments

June 23, 2018

An email came in today asking whether three different short (4- and 5-item) questionnaires could be expected to provide reasonable quality measurement. Here’s my response.

—–

Thanks for raising this question. The questionnaire plainly was not designed to provide data suitable for measurement. Though much can be learned about making constructs measurable from data produced by this kind of questionnaire, “Rasch analysis” cannot magically create a silk purse from a sow’s ear (as the old expression goes). Use Linacre’s (1993) generalizability theory nomograph to see what reliabilities are expected for each subscale, given the numbers of items and rating categories and a conservative estimate of the adjusted standard deviation (1.0 logit, for instance). Convert the reliability coefficients into strata (Fisher, 1992, 2008; Wright & Masters, 1982, pp. 92, 105-106) to make the practical meaning of the precision obtained obvious.
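As a minimal sketch of that conversion, using the standard separation formulas from Wright & Masters (1982) rather than the nomograph itself:

```python
import math

def separation(reliability: float) -> float:
    """Separation index G: ratio of the 'true' SD to the RMS error."""
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability: float) -> float:
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4.0 * separation(reliability) + 1.0) / 3.0

# Reliability 0.70 gives about 2.4 strata, 0.80 gives 3.0, 0.90 about 4.3;
# a 4- or 5-item subscale will often struggle to distinguish even two levels.
for r in (0.70, 0.80, 0.90):
    print(r, round(strata(r), 1))
```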

So if you have data, analyze it and compare the expected and observed reliabilities. If the uncertainties are quite different, is that because of targeting issues? But before you do that, ask experts in the area to rank order:

  • the courses by relevance to the job;
  • the evaluation criteria from easy to hard; and
  • the skills/competencies in order of importance to job performance.

Then study the correspondence between the rankings and the calibration results. Where do they converge and diverge? Why? What’s unexpected? What can be learned?

Analyze all of the items in each area (student, employer, instructor) together in Winsteps and study each of the three tables 23.x, setting PRCOMP=S. Remember that the total variance explained is not interpreted simply in terms of “more is better”; it matters less than the ratio of that variance to the variance in the first contrast (see Linacre, 2006, 2008). If the ratio is greater than 3, the scale is essentially unidimensional (though significant problems may remain to be diagnosed and corrected).

Common practice holds that unexplained variance eigenvalues should be less than 1.5, but this overly simplistic rule of thumb (Chou & Wang, 2010; Raîche, 2005) has been contradicted in practice many times, since, even if one or more eigenvalues are over 1.5, theory may say the items belong to the same construct, and the disattenuated correlations of the measures implied by the separate groups of items (provided in tables 23.x) may still approach 1.00, indicating that the same measures are produced across subscales. See Green (1996) and Smith (1996), among others, for more on this.
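For those working outside Winsteps, here is a rough sketch of the first-contrast check under stated assumptions: dichotomous data, complete response matrices, and Rasch-expected probabilities already in hand. It approximates, rather than reproduces, what tables 23.x report:

```python
import numpy as np

def first_contrast_eigenvalue(observed: np.ndarray, expected: np.ndarray) -> float:
    """Largest eigenvalue of the correlation matrix of standardized Rasch
    residuals (persons x items), i.e. the size of the 'first contrast'."""
    variance = expected * (1.0 - expected)       # binomial variance per response
    std_resid = (observed - expected) / np.sqrt(variance)
    corr = np.corrcoef(std_resid, rowvar=False)  # item-by-item correlations
    eigenvalues = np.linalg.eigvalsh(corr)       # ascending order
    return float(eigenvalues[-1])
```

As noted above, an eigenvalue near 1.5 is not decisive on its own; it has to be weighed against the ratio of Rasch-explained variance to contrast variance and against the disattenuated correlations between the implied subscale measures.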

If subscales within each of the three groups of items are markedly different in the measures they produce, then separate them in different analyses. If these further analyses reveal still more multidimensionalities, it’s time to go back to the drawing board, given how short these scales are. If you define a plausible scale, study the item difficulty orders closely with one or more experts in the area. If there is serious interest in precision measurement and its application to improved management, and not just a bureaucratic need for data to satisfy empty demands for a mere appearance of quality assessment, then trace the evolution of the construct as it changes from less to more across the items.

What, for instance, is the common theme addressed across the courses that makes them all relevant to job performance? The courses were each created with an intention and they were brought together into a curriculum for a purpose. These intentions and purposes are the raw material of a construct theory. Spell out the details of how the courses build competency in translation.

Furthermore, I imagine that this curriculum, by definition, was set up to be effective in training students no matter who is in the courses (within the constraints of the admission criteria), and no matter which particular challenges relevant to job performance are sampled from the universe of all possible challenges. You will recognize these unexamined and unarticulated assumptions as what need to be explicitly stated as hypotheses informing a model of the educational enterprise. This model transforms implicit assumptions into requirements that are never fully satisfied but can be very usefully approximated.

As I’ve been saying for a long time (Fisher, 1989), please do not accept the shorthand language of references to “the Rasch model”, “Rasch scaling”, “Rasch analysis”, etc. Rasch did not invent the form of these models, which are at least as old as Plato. And measurement is not a function of data analysis. Data provide experimental evidence testing model-based hypotheses concerning construct theories. When explanatory theory corroborates and validates data in calibrated instrumentation, the instrument can be applied at the point of use with no need for data analysis, to produce measures, uncertainty (error) estimates, and graphical fit assessments (Connolly, Nachtman, & Pritchett, 1971; Davis et al., 2008; Fisher, 2006; Fisher, Harvey, & Kilgore, 1995; Linacre, 1997; many others).
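Here is a minimal sketch of what application at the point of use can look like, under stated assumptions: a dichotomous Rasch model, item difficulties already calibrated and held fixed, and a maximum-likelihood person measure with the usual 1/sqrt(information) uncertainty. The item values are hypothetical:

```python
import math

def measure(responses, difficulties, tol=1e-6):
    """ML person measure and standard error for a 0/1 response string,
    given fixed, previously calibrated logit item difficulties.
    (Perfect and zero scores have no finite ML estimate; handle separately.)"""
    theta = 0.0
    for _ in range(100):
        p = [1.0 / (1.0 + math.exp(-(theta - d))) for d in difficulties]
        info = sum(pi * (1.0 - pi) for pi in p)  # test information at theta
        step = (sum(responses) - sum(p)) / info  # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    return theta, 1.0 / math.sqrt(info)

# Four of five calibrated items answered correctly:
m, se = measure([1, 1, 1, 0, 1], [-1.0, -0.5, 0.0, 0.5, 1.0])
print(round(m, 2), round(se, 2))
```

Nothing here re-estimates the items from new data; the calibrations, and the theory behind them, do that work in advance.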

So instead of using those common shorthand phrases, please speak directly to the problem of modeling the situation in order to produce a practical tool for managing it.

Further information is available in the references below.

References

Aryadoust, S. V. (2009). Mapping Rasch-based measurement onto the argument-based validity framework. Rasch Measurement Transactions, 23(1), 1192-1193 [http://www.rasch.org/rmt/rmt231.pdf].

Chang, C.-H. (1996). Finding two dimensions in MMPI-2 depression. Structural Equation Modeling, 3(1), 41-49.

Chou, Y. T., & Wang, W. C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70, 717-731.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service. Retrieved 23 June 2018 from https://images.pearsonclinical.com/images/pa/products/keymath3_da/km3-da-pub-summary.pdf

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G. et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis Cartilage, 16(5), 551-559.

Fisher, W. P., Jr. (1989). What we have to offer. Rasch Measurement Transactions, 3(3), 72 [http://www.rasch.org/rmt/rmt33d.htm].

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2006). Survey design recommendations [expanded from Fisher, W. P. Jr. (2000) Popular Measurement, 3(1), pp. 58-59]. Rasch Measurement Transactions, 20(3), 1072-1074 [http://www.rasch.org/rmt/rmt203.pdf].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5(1), 3-25.

Green, K. E. (1996). Dimensional analyses of complex data. Structural Equation Modeling, 3(1), 50-61.

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284 [http://www.rasch.org/rmt/rmt71h.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-283.

Linacre, J. M. (1998). Structure in Rasch residuals: Why principal components analysis? Rasch Measurement Transactions, 12(2), 636 [http://www.rasch.org/rmt/rmt122m.htm].

Linacre, J. M. (2003). PCA: Data variance: Explained, modeled and empirical. Rasch Measurement Transactions, 17(3), 942-943 [http://www.rasch.org/rmt/rmt173g.htm].

Linacre, J. M. (2006). Data variance explained by Rasch measures. Rasch Measurement Transactions, 20(1), 1045 [http://www.rasch.org/rmt/rmt201a.htm].

Linacre, J. M. (2008). PCA: Variance in data explained by Rasch measures. Rasch Measurement Transactions, 22(1), 1164 [http://www.rasch.org/rmt/rmt221j.htm].

Raîche, G. (2005). Critical eigenvalue sizes in standardized residual Principal Components Analysis. Rasch Measurement Transactions, 19(1), 1012 [http://www.rasch.org/rmt/rmt191h.htm].

Schumacker, R. E., & Linacre, J. M. (1996). Factor analysis and Rasch. Rasch Measurement Transactions, 9(4), 470 [http://www.rasch.org/rmt/rmt94k.htm].

Smith, E. V., Jr. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205-231.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Wright, B. D. (1996). Comparing Rasch measurement and factor analysis. Structural Equation Modeling, 3(1), 3-24.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

Excellent articulation of the rationale for living capital metrics 

November 2, 2017

I just found the best analysis of today’s situation I’ve seen yet. And it explicitly articulates and substantiates all my reasons for doing the work I’m doing. Wonderful to have this independent source of validation.

The crux of the problem is spelled out at the end of the article, where the degree of polarizing opposition is so extreme that standards of truth and evidence are completely compromised. My point is that, even so, everyone still uses language, and language still requires certain connections between concepts, words, and things in order to function. Continuing to use language in everyday life in ways that assume a common consensus on meaningful reference may eventually become unbearably inconsistent with the way language is used politically. That inconsistency will create a social vacuum, one that will be filled by a new language capable of restoring the balance of meaning in the word-concept-thing triangles.

As is repeatedly argued in this blog, my take is that what we are witnessing is language restructuring itself to incorporate new degrees of complexity at a general institutional, world-historic level. The falsehoods of our contemporary institutional definitions of truth and fact are rooted in the insufficiencies of the decision making methods and tools widely used in education, health care, government, business, etc. The numbers called measures are identified using methods that almost universally ignore the gifts of self-organized meaning that offer themselves in the structure of test, assessment, survey, poll, and evaluation response data. Those shortcomings in our information infrastructure and communication systems are feeding self-reinforcing loops of increasingly chaotic noise.

This is why it is so important that precision science is rooted in everyday language and thinking, per Nersessian’s (2002) treatment of Maxwell and Rasch’s (1960, pp. 110-115) adoption of Maxwell’s method of analogy (Fisher, 2010; Fisher & Stenner, 2013). The metric system (Système International d’Unités, or SI) is a natural language extension of intuitive and historical methods of bringing together words, concepts, and things, renamed instruments, theories, and data. A new SI for human, social, and natural capital built out into science and commerce will be one component of a multilevel and complex adaptive system that resolves today’s epistemic crisis by tapping deeper resources for the creation of meaning than are available in today’s institutions.

Everything is interrelated. The epistemic crisis will be resolved when our institutions base decisions not just on a potentially arbitrary collection of facts but on facts internally consistent enough to support instrument calibration and predictive theory. The facts have to be commonsensical to everyday people, to employees, customers, teachers, students, patients, doctors, nurses, managers. People have to be able to see themselves and where they stand relative to their goals, their origins, and everyone else in the pictures drawn by the results of tests, surveys, and evaluations. That’s not possible in today’s systems. And in those systems, some people have systematically unfair advantages. That has to change, not through some kind of Brave New World hobbling of those with advantages but by leveling the playing field to allow everyone the same opportunities for self-improvement and the rewards that follow from it.

That’s it in a nutshell. Really good article:

America is facing an epistemic crisis – Vox

https://apple.news/A0alOElOQT5itYGPAJ3eYPQ

References

Fisher, W. P., Jr. (2010, June 13-16). Rasch, Maxwell’s method of analogy, and the Chicago tradition. In G. Cooper (Chair), Probabilistic models for measurement in education, psychology, social science and health: Celebrating 50 years since the publication of Rasch’s Probabilistic Models. University of Copenhagen School of Business, FUHU Conference Centre, Copenhagen, Denmark. Retrieved from https://conference.cbs.dk/index.php/rasch/Rasch2010/paper/view/824

Fisher, W. P., Jr. (2010). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.

Fisher, W. P., Jr., & Stenner, A. J. (2013). On the potential for improved measurement in the human and social sciences. In Q. Zhang & H. Yang (Eds.), Pacific Rim Objective Measurement Symposium 2012 Conference Proceedings (pp. 1-11). Berlin, Germany: Springer-Verlag.

Nersessian, N. J. (2002). Maxwell and “the method of physical analogy”: Model-based reasoning, generic abstraction, and conceptual change. In D. Malament (Ed.), Reading natural philosophy: Essays in the history and philosophy of science and mathematics (pp. 129-166). Lasalle, Illinois: Open Court.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Excerpts and Notes from Goldberg’s “Billions of Drops…”

December 23, 2015

Goldberg, S. H. (2009). Billions of drops in millions of buckets: Why philanthropy doesn’t advance social progress. New York: Wiley.

p. 8:
Transaction costs: “…nonprofit financial markets are highly disorganized, with considerable duplication of effort, resource diversion, and processes that ‘take a fair amount of time to review grant applications and to make funding decisions’ [citing Harvard Business School Case No. 9-391-096, p. 7, Note on Starting a Nonprofit Venture, 11 Sept 1992]. It would be a major understatement to describe the resulting capital market as inefficient.”

A McKinsey study found that nonprofits spend 2.5 to 12 times more raising capital than for-profits do. When administrative costs are factored in, nonprofits spend 5.5 to 21.5 times more.

For-profit and nonprofit funding efforts contrasted on pages 8 and 9.

p. 10:
Balanced scorecard rating criteria

p. 11:
“Even at double-digit annual growth rates, it will take many years for social entrepreneurs and their funders to address even 10% of the populations in need.”

p. 12:
Exhibit 1.5 shows that the percentages of various needs served by leading social enterprises are barely drops in the respective buckets; they range from 0.07% to 3.30%.

pp. 14-16:
Nonprofit funding is not tied to performance. Even when a nonprofit makes the effort to show measured improvement in impact, it does little or nothing to change its funding picture. It appears that funders implicitly impose some kind of funding ceiling, since nonprofit growth and success seem to persuade capital sources that their work there is done. Mediocre and low-performing nonprofits seem able to continue drawing funds indefinitely from sympathetic donors who don’t require evidence of effective use of their money.

p. 34:
“…meaningful reductions in poverty, illiteracy, violence, and hopelessness will require a fundamental restructuring of nonprofit capital markets. Such a restructuring would need to make it much easier for philanthropists of all stripes–large and small, public and private, institutional and individual–to fund nonprofit organizations that maximize social impact.”

p. 54:
Exhibit 2.3 is a chart showing that fewer people rose from poverty, and more remained in it or fell deeper into it, in 1988-1998 compared with 1969-1979.

pp. 70-71:
Kotter’s (1996) change cycle.

p. 75:
McKinsey’s seven elements of nonprofit capacity and capacity assessment grid.

pp. 94-95:
Exhibits 3.1 and 3.2 contrast the way financial markets reward for-profit performance with the way nonprofit markets reward fund raising efforts.

Financial markets
1. Market aggregates and disseminates standardized data
2. Analysts publish rigorous research reports
3. Investors proactively search for strong performers
4. Investors penalize weak performers
5. Market promotes performance
6. Strong performers grow

Nonprofit markets
1. Social performance is difficult to measure
2. NPOs don’t have resources or expertise to report results
3. Investors can’t get reliable or standardized results data
4. Strong and weak NPOs spend 40 to 60% of time fundraising
5. Market promotes fundraising
6. Investors can’t fund performance; NPOs can’t scale

p. 95:
“…nonprofits can’t possibly raise enough money to achieve transformative social impact within the constraints of the existing fundraising system. I submit that significant social progress cannot be achieved without what I’m going to call ‘third-stage funding,’ that is, funding that doesn’t suffer from disabling fragmentation. The existing nonprofit capital market is not capable of [p. 97] providing third-stage funding. Such funding can arise only when investors are sufficiently well informed to make big bets at understandable and manageable levels of risk. Existing nonprofit capital markets neither provide investors with the kinds of information needed–actionable information about nonprofit performance–nor provide the kinds of intermediation–active oversight by knowledgeable professionals–needed to mitigate risk. Absent third-stage funding, nonprofit capital will remain irreducibly fragmented, preventing the marshaling of resources that nonprofit organizations need to make meaningful and enduring progress against $100 million problems.”

pp. 99-114:
Text and diagrams on innovation, market adoption, transformative impact.

p. 140:
Exhibit 4.2: Capital distribution of nonprofits, highlighting mid-caps

pages 192-3 make the case for the difference between a regular market and the current state of philanthropic, social capital markets.

p. 192:
“So financial markets provide information investors can use to compare alternative investment opportunities based on their performance, and they provide a dynamic mechanism for moving money away from weak performers and toward strong performers. Just as water seeks its own level, markets continuously recalibrate prices until they achieve a roughly optimal equilibrium at which most companies receive the ‘right’ amount of investment. In this way, good companies thrive and bad ones improve or die.
“The social sector should work the same way. … But philanthropic capital doesn’t flow toward effective nonprofits and away from ineffective nonprofits for a simple reason: contributors can’t tell the difference between the two. That is, philanthropists just don’t [p. 193] know what various nonprofits actually accomplish. Instead, they only know what nonprofits are trying to accomplish, and they only know that based on what the nonprofits themselves tell them.”

p. 193:
“The signs that the lack of social progress is linked to capital market dysfunctions are unmistakable: fundraising remains the number-one [p. 194] challenge of the sector despite the fact that nonprofit leaders divert some 40 to 60% of their time from productive work to chasing after money; donations raised are almost always too small, too short, and too restricted to enhance productive capacity; most mid-caps are ensnared in the ‘social entrepreneur’s trap’ of focusing on today and neglecting tomorrow; and so on. So any meaningful progress we could make in the direction of helping the nonprofit capital market allocate funds as effectively as the private capital market does could translate into tremendous advances in extending social and economic opportunity.
“Indeed, enhancing nonprofit capital allocation is likely to improve people’s lives much more than, say, further increasing the total amount of donations. Why? Because capital allocation has a multiplier effect.”

“If we want to materially improve the performance and increase the impact of the nonprofit sector, we need to understand what’s preventing [p. 195] it from doing a better job of allocating philanthropic capital. And figuring out why nonprofit capital markets don’t work very well requires us to understand why the financial markets do such a better job.”

p. 197:
“When all is said and done, securities prices are nothing more than convenient approximations that market participants accept as a way of simplifying their economic interactions, with a full understanding that market prices are useful even when they are way off the mark, as they so often are. In fact, that’s the whole point of markets: to aggregate the imperfect and incomplete knowledge held by vast numbers of traders about how much various securities are worth and still make allocation choices that are better than we could without markets.
“Philanthropists face precisely the same problem: how to make better use of limited information to maximize output, in this case, social impact. Considering the dearth of useful tools available to donors today, the solution doesn’t have to be perfect or even all that good, at least at first. It just needs to improve the status quo and get better over time.
“Much of the solution, I believe, lies in finding useful adaptations of market mechanisms that will mitigate the effects of the same lack of reliable and comprehensive information about social sector performance. I would even go so far as to say that social enterprises can’t hope to realize their ‘one day, all children’ visions without a funding allocation system that acts more like a market.
“We can, and indeed do, make incremental improvements in nonprofit funding without market mechanisms. But without markets, I don’t see how we can fix the fragmentation problem or produce transformative social impact, such as ensuring that every child in America has a good education. The problems we face are too big and have too many moving parts to ignore the self-organizing dynamics of market economics. As Thomas Friedman said about the need to impose a carbon tax at a time of falling oil prices, ‘I’ve wracked my brain trying to think of ways to retool America around clean-power technologies without a price signal–i.e., a tax–and there are no effective ones.’”

p. 199:
“Prices enable financial markets to work the way nonprofit capital markets should–by sending informative signals about the most effective organizations so that money will flow to them naturally.”

p. 200:
[Quotes Kurtzman citing De Soto on the mystery of capital. Also see p. 209, below.]
“‘Solve the mystery of capital and you solve many seemingly intractable problems along with it.'”
[That’s from page 69 in Kurtzman, 2002.]

p. 201:
[Goldberg says he’s quoting Daniel Yankelovich here, but the footnote does not appear to have anything to do with this quote:]
“‘The first step is to measure what can easily be measured. The second is to disregard what can’t be measured, or give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can’t be measured easily isn’t very important. This is blindness. The fourth step is to say that what can’t be easily measured really doesn’t exist. This is suicide.'”

Goldberg gives an example here of $10,000 invested with a 10% increase in value, compared with $10,000 put into a nonprofit. “But if the nonprofit makes good use of the money and, let’s say, brings the reading scores of 10 elementary school students up from below grade level to grade level, we can’t say how much my initial investment is ‘worth’ now. I could make the argument that the value has increased because the students have received a demonstrated educational benefit that is valuable to them. Since that’s the reason I made the donation, the achievement of higher scores must have value to me, as well.”

p. 202:
Goldberg wonders whether donations to nonprofits would be better conceived as purchases than investments.

p. 207:
Goldberg quotes Jon Gertner from the March 9, 2008, issue of the New York Times Magazine devoted to philanthropy:

“‘Why shouldn’t the world’s smartest capitalists be able to figure out more effective ways to give out money now? And why shouldn’t they want to make sure their philanthropy has significant social impact? If they can measure impact, couldn’t they get past the resistance that [Warren] Buffett highlighted and finally separate what works from what doesn’t?’”

p. 208:
“Once we abandon the false notions that financial markets are precision instruments for measuring unambiguous phenomena, and that the business and nonprofit sectors are based in mutually exclusive principles of value, we can deconstruct the true nature of the problems we need to address and adapt market-like mechanisms that are suited to the particulars of the social sector.
“All of this is a long way (okay, a very long way) of saying that even ordinal rankings of nonprofit investments can have tremendous value in choosing among competing donation opportunities, especially when the choices are so numerous and varied. If I’m a social investor, I’d really like to know which nonprofits are likely to produce ‘more’ impact and which ones are likely to produce ‘less.'”

“It isn’t necessary to replicate the complex workings of the modern stock markets to fashion an intelligent and useful nonprofit capital allocation mechanism. All we’re looking for is some kind of functional indication that would (1) isolate promising nonprofit investments from among the confusing swarm of too many seemingly worthy social-purpose organizations and (2) roughly differentiate among them based on the likelihood of ‘more’ or ‘less’ impact. This is what I meant earlier by increasing [p. 209] signals and decreasing noise.”

p. 209:
Goldberg apparently didn’t read De Soto directly, as he attributes the mystery of capital to Kurtzman and says it is solved via collective intelligence and the wisdom of crowds. This completely misses the crucial value that transparent representations of structural invariance hold for market functionality. Goldberg is apparently offering a loose kind of market with an aggregate stock index for nonprofits built up from their various ordinal performance measures. I think I have found a better way in my work, building more closely from De Soto (Fisher, 2002, 2003, 2005, 2007, 2009a, 2009b).

p. 231:
Goldberg quotes Harvard’s Allen Grossman (1999) on the cost-benefit boundaries of more effective nonprofit capital allocation:

“‘Is there a significant downside risk in restructuring some portion of the philanthropic capital markets to test the effectiveness of performance driven philanthropy? The short answer is, ‘No.’ The current reality is that most broad-based solutions to social problems have eluded the conventional and fragmented approaches to philanthropy. It is hard to imagine that experiments to change the system to a more performance driven and rational market would negatively impact the effectiveness of the current funding flows–and could have dramatic upside potential.'”

p. 232:
Quotes Douglas Hubbard’s How to Measure Anything, the book that Stenner endorsed, and that Linacre and I did not.

p. 233:
Cites Stevens on the four levels of measurement and uses that typology to justify his position concerning ordinal rankings, recognizing that “we can’t add or subtract ordinals.”
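
A small illustration of the point, mine rather than Goldberg’s: when percent-correct scores are converted to interval logits, equal ordinal gains correspond to unequal amounts of the underlying variable, which is why ordinals cannot simply be added or subtracted.

```python
import math

def logit(p):
    """Log-odds (interval, logit-scaled) transform of a proportion."""
    return math.log(p / (1.0 - p))

# Two equal 10-point gains on a percent-correct scale...
for low, high in [(0.50, 0.60), (0.85, 0.95)]:
    print(f"{low:.2f} -> {high:.2f}: gain = {logit(high) - logit(low):.2f} logits")
# ...are worth about 0.41 logits mid-scale but about 1.21 logits near the
# top: the same ordinal difference is not the same amount of the variable.
```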

pp. 233-5:
Justifies ordinal measures via example of Google’s PageRank algorithm. [I could connect from here using Mary Garner’s (2009) comparison of PageRank with Rasch.]
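
For context, here is a minimal power-iteration sketch of textbook PageRank (a generic version, not Google’s production system), run over a hypothetical four-page web:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the
    list of pages it links to; returns a rank score for each page."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(sorted(pagerank(web).items(), key=lambda kv: -kv[1]))
```

PageRank’s raw scores form a probability distribution over pages that is usually reported as an ordinal ranking; Garner (2009) explores the connection to Rasch models.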

p. 236:
Goldberg tries to justify the use of ordinal measures by citing their widespread use in social science and health care. He conveniently ignores the fact that virtually all of the same problems and criticisms that apply to philanthropic capital markets also apply in these areas. Not grasping the fundamental value of De Soto’s concept of transferable and transparent representations, and knowing nothing of Rasch measurement, he was unable to properly evaluate the potential of ordinal data’s role in the formation of philanthropic capital markets. Ordinal measures are not just inadequate; they represent a dangerous diversion of resources into systems that take on lives of their own, creating a new layer of dysfunctional relationships that will be hard to overcome.

p. 261 [Goldberg shows here his complete ignorance about measurement. He is apparently totally unaware of the work that is in fact most relevant to his cause, going back to Thurstone in the 1920s, Rasch in the 1950s-1970s, and Wright from the 1960s to 2000. Both of the problems he identifies have long since been solved in theory and in practice across a wide range of domains in education, psychology, health care, etc.]:
“Having first studied performance evaluation some 30 years ago, I feel confident in saying that all the foundational work has been done. There won’t be a ‘eureka!’ breakthrough where someone finally figures out the one true way to gauge nonprofit effectiveness.
“Indeed, I would venture to say that we know virtually everything there is to know about measuring the performance of nonprofit organizations with only two exceptions: (1) How can we compare nonprofits with different missions or approaches, and (2) how can we make actionable performance assessments common practice for growth-ready mid-caps and readily available to all prospective donors?”

p. 263:
“Why would a social entrepreneur divert limited resources to impact assessment if there were no prospects it would increase funding? How could an investor who wanted to maximize the impact of her giving possibly put more golden eggs in fewer impact-producing baskets if she had no way to distinguish one basket from another? The result: there’s no performance data to attract growth capital, and there’s no growth capital to induce performance measurement. Until we fix that Catch-22, performance evaluation will not become an integral part of social enterprise.”

pp. 264-5:
Long quotation from Ken Berger at Charity Navigator on their ongoing efforts at developing an outcome measurement system. [wpf, 8 Nov 2009: I read the passage quoted by Goldberg in Berger’s blog when it came out and have been watching and waiting ever since for the new system. wpf, 8 Feb 2012: The new system has been online for some time but still does not include anything on impacts or outcomes. It has expanded from a sole focus on financials to also include accountability and transparency. But it does not yet address Goldberg’s concerns as there still is no way to tell what works from what doesn’t.]

p. 265:
“The failure of the social sector to coordinate independent assets and create a whole that exceeds the sum of its parts results from an absence of ‘platform leadership’: ‘the ability of a company to drive innovation around a particular platform technology at the broad industry level.’ The object is to multiply value by working together: ‘the more people who use the platform products, the more incentives there are for complement producers to introduce more complementary products, causing a virtuous cycle.’” [Quotes here from Cusumano & Gawer (2002). The concept of platform leadership speaks directly to the system of issues raised by Miller & O’Leary (2007) that must be addressed to form effective HSN capital markets.]

p. 266:
“…the nonprofit sector has a great deal of both money and innovation, but too little available information about too many organizations. The result is capital fragmentation that squelches growth. None of the stakeholders has enough horsepower on its own to impose order on this chaos, but some kind of realignment could release all of that pent-up potential energy. While command-and-control authority is neither feasible nor desirable, the conditions are ripe for platform leadership.”

“It is doubtful that the IMPEX could internally amass all of the resources needed to build and grow a virtual nonprofit stock market that could connect large numbers of growth-capital investors with large numbers of [p. 267] growth-ready mid-caps. But it might be able to convene a powerful coalition of complementary actors that could achieve a critical mass of support for performance-based philanthropy. The challenge would be to develop an organization focused on filling the gaps rather than encroaching on the turf of established firms whose participation and innovation would be required to build a platform for nurturing growth of social enterprise.”

pp. 268-9:
Intermediated nonprofit capital market shifts fundraising burden from grantees to intermediaries.

p. 271:
“The surging growth of national donor-advised funds, which simplify and reduce the transaction costs of methodical giving, exemplifies the kind of financial innovation that is poised to leverage market-based investment guidance.” [President of Schwab Charitable quoted as wanting to make charitable giving information- and results-driven.]

p. 272:
Rating agencies and organizations: Charity Navigator, GuideStar, Wise Giving Alliance.
Online donor rankings: GlobalGiving, GreatNonprofits, SocialMarkets
Evaluation consultants: Mathematica

Google’s mission statement: “to organize the world’s information and make it universally accessible and useful.”

p. 273:
Exhibit 9.4 Impact Index Whole Product
Image of stakeholders circling IMPEX:
Trading engine
Listed nonprofits
Data producers and aggregators
Trading community
Researchers and analysts
Investors and advisors
Government and business supporters

p. 275:
“That’s the starting point for replication [of social innovations that work]: finding and funding; matching money with performance.”

[WPF bottom line: Because Goldberg misses De Soto’s point about transparent representations resolving the mystery of capital, he is unable to see his way toward making the nonprofit capital markets function more like financial capital markets, with the difference being the focus on the growth of human, social, and natural capital. Though Goldberg intuits good points about the wisdom of crowds, he doesn’t know enough about the flaws of ordinal measurement relative to interval measurement, or about the relatively easy access to interval measures that can be had, to do the job.]

References

Cusumano, M. A., & Gawer, A. (2002, Spring). The elements of platform leadership. MIT Sloan Management Review, 43(3), 58.

De Soto, H. (2000). The mystery of capital: Why capitalism triumphs in the West and fails everywhere else. New York: Basic Books.

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2003). Measurement and communities of inquiry. Rasch Measurement Transactions, 17(3), 936-8 [http://www.rasch.org/rmt/rmt173.pdf].

Fisher, W. P., Jr. (2005). Daredevil barnstorming to the tipping point: New aspirations for the human sciences. Journal of Applied Measurement, 6(3), 173-9 [http://www.livingcapitalmetrics.com/images/FisherJAM05.pdf].

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2009a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In M. Wilson, K. Draney, N. Brown & B. Duckor (Eds.), Advances in Rasch Measurement, Vol. Two (in press) [http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf]. Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2009b, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Garner, M. (2009, Autumn). Google’s PageRank algorithm and the Rasch measurement model. Rasch Measurement Transactions, 23(2), 1201-2 [http://www.rasch.org/rmt/rmt232.pdf].

Grossman, A. (1999). Philanthropic social capital markets: Performance driven philanthropy (Social Enterprise Series 12 No. 00-002). Harvard Business School Working Paper.

Kotter, J. (1996). Leading change. Cambridge, Massachusetts: Harvard Business School Press.

Kurtzman, J. (2002). How the markets really work. New York: Crown Business.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-34.

Professional capital as product of human, social, and decisional capitals

April 18, 2014

Leslie Pendrill gave me a tip on a very interesting book, Professional Capital, by Andy Hargreaves and Michael Fullan. The authors’ distinction between business capital and professional capital is somewhat akin to my distinction (Fisher, 2011a) between dead and living capital. The primary point of contact between their sense of capital and mine stems from their inclusion of social and decisional capital as crucial enhancements of human capital.

Of course, defining human capital as talent, as Fullan does, is not going to go very far toward supporting generalized management of it. Efficient markets require that capital be represented in transparent and universally available instruments (common currencies or metrics). Transparent, systematic representation makes it possible to act on capital abstractly, in laboratories, courts, and banks, without having to do anything at all with the physical resource itself. (Contrast this with socialism’s focus on controlling the actual concrete resources, and the resulting empty store shelves, unfulfilled five-year plans, pogroms and purges, and overall failure.) Universally accessible transparent representations make capital additive (amounts can be accrued), divisible (it can be divided into shares), and mobile (it can be moved around in networks accepting the currency/metric). (See references below for more information.)

Fullan cites research by Carrie Leana at the University of Pittsburgh showing that teachers with high social capital increased their students’ math scores by 5.7% more than teachers with low social capital. The teachers with the highest skill levels (the most human capital) and high social capital did best overall. Low-ability teachers in schools with high social capital did as well as average teachers.

This is great, but the real cream of Fullan’s argument concerns the importance of what he calls decisional capital. I doubt this will turn out to be entirely separate from human capital, but his point is well taken: the capacity to consistently engage with students with competence, good judgment, insight, inspiration, creative improvisation, and openness to feedback in a context of shared responsibility is vital. All of this is quite consistent with recent work on collective intelligence (Fischer, Giaccardi, Eden, et al., 2005; Hutchins, 2010; Magnus, 2007; Nersessian, 2006; Woolley, Chabris, Pentland, et al., 2010; Woolley and Fuchs, 2011).

And, of course, you can see this coming: decisional capital is precisely what better measurement provides. Integrated formative and summative assessment informs decision making at the individual level in ways that are otherwise impossible. When those assessments are expressed in uniformly interpretable and applicable units of measurement, collective intelligence and social capital are boosted in the ways documented by Leana as enhancing teacher performance and student outcomes.

Anyway, just wanted to share that. It fits right in with the trading zone concept I presented at IOMW (the slides are available on my LinkedIn page).

Fischer, G., Giaccardi, E., Eden, H., Sugimoto, M., & Ye, Y. (2005). Beyond binary choices: Integrating individual and social creativity. International Journal of Human-Computer Studies, 63, 482-512.

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2003). Measurement and communities of inquiry. Rasch Measurement Transactions, 17(3), 936-938 [http://www.rasch.org/rmt/rmt173.pdf].

Fisher, W. P., Jr. (2004a, Thursday, January 22). Bringing capital to life via measurement: A contribution to the new economics. In R. Smith (Chair), Session 3.3B. Rasch Models in Economics and Marketing. Second International Conference on Measurement. Perth, Western Australia:  Murdoch University.

Fisher, W. P., Jr. (2004b, Friday, July 2). Relational networks and trust in the measurement of social capital. Twelfth International Objective Measurement Workshops. Cairns, Queensland, Australia: James Cook University.

Fisher, W. P., Jr. (2005a). Daredevil barnstorming to the tipping point: New aspirations for the human sciences. Journal of Applied Measurement, 6(3), 173-179.

Fisher, W. P., Jr. (2005b, August 1-3). Data standards for living human, social, and natural capital. In Session G: Concluding Discussion, Future Plans, Policy, etc. Conference on Entrepreneurship and Human Rights. Pope Auditorium, Lowenstein Bldg, Fordham University.

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-1093 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2008a, 3-5 September). New metrological horizons: Invariant reference standards for instruments measuring human, social, and natural capital. 12th IMEKO TC1-TC7 Joint Symposium on Man, Science, and Measurement. Annecy, France: University of Savoie.

Fisher, W. P., Jr. (2008b, March 28). Rasch, Frisch, two Fishers and the prehistory of the Separability Theorem. In W. P. Fisher, Jr. (Ed.), Session 67.056, Reading Rasch Closely: The History and Future of Measurement. American Educational Research Association, New York City: Rasch Measurement SIG [Paper available at SSRN: http://ssrn.com/abstract=1698919].

Fisher, W. P., Jr. (2009a, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009b). NIST Critical national need idea White Paper: Metrological infrastructure for human, social, and natural capital (http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute of Standards and Technology (11 pages).

Fisher, W. P., Jr. (2010a, 22 November). Meaningfulness, measurement, value seeking, and the corporate objective function: An introduction to new possibilities. Sausalito, California: LivingCapitalMetrics.com (http://ssrn.com/abstract=1713467).

Fisher, W. P., Jr. (2010b). Measurement, reduced transaction costs, and the ethics of efficient markets for human, social, and natural capital (http://ssrn.com/abstract=2340674). Bridge to Business Postdoctoral Certification, Freeman School of Business: Tulane University.

Fisher, W. P., Jr. (2010c). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics: Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.

Fisher, W. P., Jr. (2011a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In N. Brown, B. Duckor, K. Draney & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 2 (pp. 1-27). Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2011b). Measuring genuine progress by scaling economic indicators to think global & act local: An example from the UN Millennium Development Goals project. LivingCapitalMetrics.com [Online]. Available: http://ssrn.com/abstract=1739386 (Accessed 18 January 2011).

Fisher, W. P., Jr. (2012). Measure and manage: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (pp. 43-63). New York: Palgrave Macmillan.

Fisher, W. P., Jr., & Stenner, A. J. (2005, Tuesday, April 12). Creating a common market for the liberation of literacy capital. In R. E. Schumacker (Ed.), Rasch Measurement: Philosophical, Biological and Attitudinal Impacts. American Educational Research Association. Montreal, Canada: Rasch Measurement SIG.

Fisher, W. P., Jr., & Stenner, A. J. (2011a, January). Metrology for the social, behavioral, and economic sciences. Available: http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36 (Accessed 12 January 2014).

Fisher, W. P., Jr., & Stenner, A. J. (2011b, August 31 to September 2). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO) TC1-TC7-TC13 Joint Symposium. Jena, Germany: http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf.

Hutchins, E. (2010). Cognitive ecology. Topics in Cognitive Science, 2, 705-715.

Magnus, P. D. (2007). Distributed cognition and the task of science. Social Studies of Science, 37(2), 297-310.

Nersessian, N. J. (2006, December). Model-based reasoning in distributed cognitive systems. Philosophy of Science, pp. 699-709.

Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010, 29 October). Evidence for a collective intelligence factor in the performance of human groups. Science, pp. 686-688.

Woolley, A. W., & Fuchs, E. (2011, September-October). Collective intelligence in the organization of science. Organization Science, pp. 1359-1367.

Six Classes of Results Supporting the Measurability of Human Functioning and Capability

April 12, 2014

Another example of high-level analysis that suffers from a lack of input from state-of-the-art measurement arises in Nussbaum (1997, p. 1205), where the author remarks that it is now a matter of course, in development economics, “to recognize distinct domains of human functioning and capability that are not commensurable along a single metric, and with regard to which choice and liberty of agency play a fundamental structuring role.” Though Nussbaum (2011, pp. 58-62) has lately given a more nuanced account of the challenges of measurement relative to human capabilities, appreciation of the power and flexibility of contemporary measurement models, methods, and instruments remains lacking. For a detailed example of the complexities and challenges that must be addressed in the context of global human development, which is Nussbaum’s area of interest, see Fisher (2011).

Though there are indeed domains of human functioning and capability that are not commensurable along a single metric, they are not the ones referred to by Nussbaum or the texts she cites. On the contrary, six different approaches to establishing the measurability of human functioning and capability have been explored and proven to provide, especially in the aggregate, a substantial basis for theory and practice (modified from Fisher, 2009, pp. 1279-1281). These six classes of results speak to the abstract, mathematical side of the paradox noted by Ricoeur (see previous post here) concerning the need to simultaneously accept roles for abstract ideal global universals and concrete local historical contexts in strategic planning and thinking. The six classes of results are:

  1. Mathematical proofs of the necessity and sufficiency of test and survey scores for invariant measurement in the context of Rasch’s probabilistic models (Andersen, 1977, 1999; Fischer, 1981; Newby, Conner, Grant, and Bunderson, 2009; van der Linden, 1992); a bare-bones statement of the model and this sufficiency property is sketched just after this list.
  2. Reproduction of physical units of measurement (centimeters, grams, etc.) from ordinal observations (Choi, 1997; Moulton, 1993; Pelton and Bunderson, 2003; Stephanou and Fisher, 2013).
  3. The common mathematical form of the laws of nature and Rasch models (Rasch, 1960, pp. 110-115; Fisher, 2010; Fisher and Stenner, 2013).
  4. Multiple independent studies of the same constructs on different (and common) samples using different (and the same) instruments intended to measure the same thing converge on common units, defining the same objects, substantiating theory, and supporting the viability of standardized metrics (Fisher, 1997a, 1997b, 1999, etc.).
  5. Thousands of peer-reviewed publications in hundreds of scientific journals provide a wide-ranging and diverse array of supporting evidence and theory.
  6. Analogous causal attributions and theoretical explanatory power can be created in both natural and social science contexts (Stenner, Fisher, Stone, and Burdick, 2013).
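
To make the first of these classes concrete, the dichotomous Rasch model and the sufficiency property proven by Andersen (1977) can be stated compactly (a bare-bones summary of the cited sources, not a full derivation):

$$ P(x_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)} $$

Here \theta_n is the ability of person n and b_i the difficulty of item i, both expressed in logits. Because the parameters enter the model only through their difference, the raw score r_n = \sum_i x_{ni} is a sufficient statistic for \theta_n: conditioning on the raw scores removes the person parameters from the likelihood, so item parameters can be estimated independently of the particular sample measured, and vice versa.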

What we have here, in sum, is a combination of Greek axiomatic and Babylonian empirical approaches, in accord with Toulmin’s (1961, pp. 28-33) sense of the contrasting principled bases for scientific advancement. Feynman (1965, p. 46) called for less focus on the Greek chain-of-reasoning approach, as a chain is only as strong as its weakest link, whereas Babylonian algorithms are akin to a platform with enough supporting legs that one or more might fail without compromising its overall stability. The variations in theory and evidence under these six headings provide ample support for the conceptual and practical viability of metrological systems of measurement in education, health care, human resource management, sociology, natural resource management, social services, and many other fields. The philosophical critique of any type of economics will inevitably be wide of the mark if it is uninformed about these accomplishments in the theory and practice of measurement.

References

Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42(1), 69-81.

Andersen, E. B. (1999). Sufficient statistics in educational measurement. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 122-125). New York: Pergamon.

Choi, S. E. (1997). Rasch invents “ounces.” Rasch Measurement Transactions, 11(2), 557 [http://www.rasch.org/rmt/rmt112.htm#Ounces].

Feynman, R. (1965). The character of physical law. Cambridge, Massachusetts: MIT Press.

Fischer, G. H. (1981). On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika, 46(1), 59-77.

Fisher, W. P., Jr. (1997a). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (1997b). What scale-free measurement means to health outcomes research. Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 357-373.

Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.

Fisher, W. P., Jr. (2009). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2010). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics: Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.

Fisher, W. P., Jr. (2011). Measuring genuine progress by scaling economic indicators to think global & act local: An example from the UN Millennium Development Goals project. LivingCapitalMetrics.com. Retrieved 18 January 2011, from Social Science Research Network: http://ssrn.com/abstract=1739386.

Fisher, W. P., Jr., & Stenner, A. J. (2013). On the potential for improved measurement in the human and social sciences. In Q. Zhang & H. Yang (Eds.), Pacific Rim Objective Measurement Symposium 2012 Conference Proceedings (pp. 1-11). Berlin, Germany: Springer-Verlag.

Moulton, M. (1993). Probabilistic mapping. Rasch Measurement Transactions, 7(1), 268 [http://www.rasch.org/rmt/rmt71b.htm].

Newby, V. A., Conner, G. R., Grant, C. P., & Bunderson, C. V. (2009). The Rasch model and additive conjoint measurement. Journal of Applied Measurement, 10(4), 348-354.

Nussbaum, M. (1997). Flawed foundations: The philosophical critique of (a particular type of) economics. University of Chicago Law Review, 64, 1197-1214.

Nussbaum, M. (2011). Creating capabilities: The human development approach. Cambridge, MA: The Belknap Press.

Pelton, T., & Bunderson, V. (2003). The recovery of the density scale using a stochastic quasi-realization of additive conjoint measurement. Journal of Applied Measurement, 4(3), 269-281.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14, 58-94.

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14.

Stephanou, A., & Fisher, W. P., Jr. (2013). From concrete to abstract in the measurement of length. Journal of Physics Conference Series, 459, http://iopscience.iop.org/1742-6596/459/1/012026.

Toulmin, S. E. (1961). Foresight and understanding: An enquiry into the aims of science. London, England: Hutchinson.

van der Linden, W. J. (1992). Sufficient and necessary statistics. Rasch Measurement Transactions, 6(3), 231 [http://www.rasch.org/rmt/rmt63d.htm].

 

Convergence, Divergence, and the Continuum of Field-Organizing Activities

March 29, 2014

So what are the possibilities for growing out green shoots from the seeds and roots of an ethical orientation to keeping the dialogue going? What kinds of fruits might be expected from cultivating a common ground for choosing discourse over violence? What are the consequences for practice of planting this seed in this ground?

The same participant in the conversation earlier this week at Convergence XV who spoke of the peace building processes taking place around the world also described a developmental context for these issues of mutual understanding. The work of Theo Dawson and her colleagues (Dawson, 2002a, 2002b, 2004; Dawson, Fischer, and Stein, 2006) is especially pertinent here. Their comparisons of multiple approaches to cognitive and moral development have provided clear and decisive theory, evidence, and instrumentation concerning the conceptual integrations that take place in the evolution of hierarchical complexity.

Conceptual integrations occur when previously tacit, unexamined, and assumed principles informing a sphere of operations are brought into conscious awareness and are transformed into explicit objects of new operations. Developmentally, this is the process of discovery that takes place from the earliest stages of life, in utero. Organisms of all kinds mature in a process of interaction with their environments. Young children at the “terrible two” stage, for instance, are realizing that anything they can detach from, whether by throwing or by denying (“No!”), is not part of them. Only a few months earlier, the same children will have been fascinated with their fingers and toes, realizing these are parts of their own bodies, often by putting them in their mouths.

There are as many opportunities for conceptual integrations between the ages of 21 and 99 as there are between birth and 21. Developmental differences in perspectives can make for riotously comic situations, and can also lead to conflicts, even when the participants agree on more than they disagree on. And so here we arrive at a position from which we can get a grip on how to integrate convergence and divergence in a common framework that follows from the prior post’s brief description of the ontological method’s three moments of reduction, application, and deconstruction.

[Figure 1: Woolley and colleagues’ continuum of five field-organizing activities: defining, bounding, opening, bridging, and grounding]

Woolley and colleagues (Woolley, et al., 2010; Woolley and Fuchs, 2011) describe a continuum of five field-organizing activities categorizing the types of information needed for effective collective intelligence (Figure 1). Four of these five activities (defining, bounding, opening, and bridging) vary in the convergent versus divergent processes they bring to bear in collective thinking. Defining and bounding are convergent processes that inform judgment and decision making. These activities are especially important in the emergence of a new field or organization, when the object of interest and the methods of recognizing and producing it are in contention. Opening and bridging activities, in contrast, diverge from accepted definitions and transgress boundaries in the creative process of pushing into new areas. Undergirding the continuum as a whole is the fifth activity, grounding, which serves as a theory- and evidence-informed connection to meaningful and useful results.

There are instances in which defining and bounding activities have progressed to the point that the explanatory power of theory enables the calibration of test items from knowledge of the component parts included in those items. The efficiencies and cost reductions gained from computer-based item generation and administration are significant. Research in this area takes a variety of approaches; for more information, see Daniel and Embretson (2010), De Boeck and Wilson (2004), Stenner, et al. (2013), and others.

The value of clear definitions and boundaries in this context stems in large part from the capacity to identify exceptions that prove (test) the rules, and that then also provide opportunities for opening and bridging. Kuhn (1961, p. 180; 1977, p. 205) noted that

To the extent that measurement and quantitative technique play an especially significant role in scientific discovery, they do so precisely because, by displaying significant anomaly, they tell scientists when and where to look for a new qualitative phenomenon.

Rasch (1960, p. 124) similarly understood that “Once a law has been established within a certain field then the law itself may serve as a tool for deciding whether or not added stimuli and/or objects belong to the original group.” Rasch gives the example of mechanical force applied to various masses with resulting accelerations, introducing the idea that one of the instruments might exert magnetic as well as mechanical force, with noticeable effects on steel masses, but not on wooden masses. Rasch suggests that exploration of these anomalies may result in the discovery of other similar instruments that vary in the extent to which they also exert the new force, with the possible consequence of discovering a law of magnetic attraction.
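
The structural analogy Rasch (1960, pp. 110-115) drew between this law and his measurement model can be stated compactly (the notation here is generic rather than Rasch’s own):

$$ A_{vi} = \frac{F_i}{M_v} \qquad \text{and} \qquad \frac{P_{vi}}{1 - P_{vi}} = \frac{\xi_v}{\delta_i} $$

On the left, the acceleration of object v under instrument i is the ratio of force to mass; on the right, the odds of success of person v on item i are the ratio of person ability to item difficulty. Taking logarithms of the second expression gives the additive form \ln[P_{vi}/(1-P_{vi})] = \theta_v - b_i, so anomalies appear as departures from a simple difference, just as magnetic effects appear as departures from the mechanical law.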

There has been intense interest in the assessment of divergent inconsistencies in measurement research and practice following in the wake of Rasch’s early work in psychological and social measurement (examples from a very large literature in this area include Karabatsos and Ullrich, 2002, and Smith and Plackner, 2009). Andrich, for instance, makes explicit reference to Kuhn (1961), saying, “…the function of a model for measurement…is to disclose anomalies, not merely to describe data” (Andrich, 2002, p. 352; also see Andrich, 1996, 2004, 2011). Typical software for applying Rasch models (Andrich, et al., 2013; Linacre, 2011, 2013; Wu, et al., 2007) accordingly provides many more qualitative numbers evaluating potential anomalies than quantitative measuring numbers. These qualitative numbers (digits that do not stand for something substantive that adds up in a constant unit) include uncertainty and confidence indicators that vary with sample size; mean square and standardized model fit statistics; and principal components analysis factor loadings and eigenvalues.
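
For readers unfamiliar with these diagnostics, the mean square fit statistics mentioned above have a simple general form (a standard textbook summary; the cited software packages implement refinements and standardizations):

$$ z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{P_{ni}(1 - P_{ni})}}, \qquad \text{outfit}_i = \frac{1}{N} \sum_{n=1}^{N} z_{ni}^2, \qquad \text{infit}_i = \frac{\sum_n P_{ni}(1 - P_{ni})\, z_{ni}^2}{\sum_n P_{ni}(1 - P_{ni})} $$

Values near 1.0 indicate responses about as noisy as the model predicts; values well above 1.0 flag the anomalies that, in Kuhn’s terms, tell us when and where to look for a new qualitative phenomenon.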

The opportunities for divergent openings onto new qualitative phenomena provided by data consistency evaluations are complemented in Rasch measurement by a variety of bridging activities. Different instruments intended to measure the same or closely related constructs may often be equated or co-calibrated, so they measure in a common unit (among many publications in this area, see Dawson, 2002a, 2004; Fisher, 1997; Fisher, et al., 1995; Massof and Ahmadian, 2007; Smith and Taylor, 2004). Similarly, the same instrument calibrated on different samples from the same population may exhibit consistent properties across those samples, offering further evidence of a potential for defining a common unit (Fisher, 1999).
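
One common-item linking recipe makes the co-calibration idea concrete (a generic sketch; the studies cited above differ in their details). Items shared by instruments A and B are calibrated separately on each, and B’s calibrations are shifted by the mean difference over the K common items:

$$ c = \frac{1}{K} \sum_{k=1}^{K} \left( b_k^{A} - b_k^{B} \right), \qquad \tilde{b}_i^{B} = b_i^{B} + c $$

After the shift, measures from instrument B are expressed in instrument A’s logit unit, and the residual scatter of the common items’ calibrations about the identity line indicates how well the shared unit holds.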

Other opening and bridging activities include capacities (a) to drop items or questions from a test or survey, or to add them; (b) to adaptively administer subsets of custom-selected items from a large bank; and (c) to adjust measures for the leniency or severity of judges assigning ratings, all of which can be done, within the limits of the relevant definitions and boundaries, without compromising the unit of comparison. For methodological overviews, see Bond and Fox (2007), Wilson (2005), and others.
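
A brief sketch of why capacities (a) and (b) leave the unit of comparison intact: under the Rasch model, a maximum-likelihood ability estimate can be computed from any subset of calibrated items, so dropping, adding, or adaptively selecting items affects precision but not the logit scale. The code below is a minimal illustration with hypothetical item difficulties, not a rendition of any of the cited packages:

```python
import math

def rasch_prob(theta, b):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_ability(responses, difficulties, tol=1e-6, max_iter=50):
    """Newton-Raphson maximum-likelihood ability estimate for one person.
    Works for any subset of calibrated items; note that perfect and zero
    scores have no finite estimate and are excluded here."""
    theta = 0.0
    for _ in range(max_iter):
        ps = [rasch_prob(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, ps))  # score function
        info = sum(p * (1.0 - p) for p in ps)             # test information
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical bank of calibrated item difficulties (logits)
bank = [-1.5, -0.5, 0.0, 0.5, 1.5]
full = ml_ability([1, 1, 1, 0, 0], bank)
subset = ml_ability([1, 1, 0], [bank[0], bank[2], bank[4]])
print(round(full, 2), round(subset, 2))  # both estimates are in logits
```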

The various field-organizing activities spanning the range from convergence to divergence are implicated not only in research on collective thinking, but also in the history and philosophy of science. Galison and colleagues (Galison, 1997, 1999; Galison and Stump, 1996) closely examine positivist and antipositivist perspectives on the unity of science, finding their conclusions inconsistent with the evidence of history. A postpositivist perspective (Galison, 1999, p. 138), in contrast, finds “distinct communities and incommensurable beliefs” between and often within the areas of theory, experiment, and instrument-making. But instead of finding these communities “utterly condemned to passing one another without any possibility of significant interaction,” Galison (1999, p. 138) observes that “two groups can agree on rules of exchange even if they ascribe utterly different significance to the objects being exchanged; they may even disagree on the meaning of the exchange process itself.” In practice, “trading partners can hammer out a local coordination despite vast global differences.”

In accord with Woolley and colleagues’ work on convergent and divergent field-organizing activities, Galison (1999, p. 137) concludes, then, that “science is disunified, and—against our first intuitions—it is precisely the disunification of science that underpins its strength and stability.” Galison (1997, pp. 843-844) concludes with a section entitled “Cables, Bricks, and Metaphysics” in which the postpositivist disunity of science is seen to provide its unexpected coherence from the simultaneously convergent and divergent ways theories, experiments, and instruments interact.

But as Galison recognizes, a metaphor based on the intertwined strands in a cable is too mechanical to support the dynamic processes by which order arises from particular kinds of noise and chaos. Not cited by Galison is a burgeoning literature on the phenomenon of noise-induced order termed stochastic resonance (Andò and Graziani, 2000; Benzi, et al., 1981; Dykman and McClintock, 1998; Fisher, 1992, 2011; Hess and Albano, 1998; Repperger and Farris, 2010). Where the metaphor of a cable’s strands breaks down, stochastic resonance provides multiple ways of illustrating how the disorder of finite and partially independent processes can give rise to an otherwise inaccessible order and structure.

Stochastic resonance involves weak signals that noise of the right kind and amount can amplify into very large effects; too much noise drowns the signal out rather than amplifying it. Examples include the interaction of neuronal ensembles in the brain (Chialvo, Longtin, and Müller-Gerking, 1996), speech recognition (Moskowitz and Dickinson, 2002), and perceptual interpretation (Riani and Simonotto, 1994). Given that Rasch’s models for measurement are stochastic versions of Guttman’s deterministic models (Andrich, 1985), the question has been raised as to how Rasch’s seemingly weaker assumptions could lead to a measurement model that is stronger than Guttman’s (Duncan, 1984, p. 220). Stochastic resonance may provide an essential clue to this puzzle (Fisher, 1992, 2011).
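
A minimal, self-contained simulation of the classic threshold version of the effect (the parameters are illustrative choices of mine, not drawn from the cited studies): a sine wave too weak to cross a detection threshold on its own is revealed by moderate noise and drowned out again as the noise grows.

```python
import math, random

def detection_correlation(noise_sd, threshold=1.0, n=20000, seed=1):
    """Correlation between a subthreshold sine wave and the output of a
    hard threshold detector after Gaussian noise is added to the input."""
    rng = random.Random(seed)
    sig, out = [], []
    for t in range(n):
        s = 0.8 * math.sin(2 * math.pi * t / 100)  # peak 0.8 < threshold 1.0
        sig.append(s)
        out.append(1.0 if s + rng.gauss(0, noise_sd) > threshold else 0.0)
    ms, mo = sum(sig) / n, sum(out) / n
    cov = sum((a - ms) * (b - mo) for a, b in zip(sig, out))
    var_s = sum((a - ms) ** 2 for a in sig)
    var_o = sum((b - mo) ** 2 for b in out)
    return cov / math.sqrt(var_s * var_o) if var_o > 0 else 0.0

# Almost no noise: the signal is never detected. Moderate noise: its
# periodic structure shows through. Heavy noise: the effect fades again.
for sd in (0.05, 0.3, 1.0, 5.0):
    print(sd, round(detection_correlation(sd), 3))
```

The non-monotonic rise and fall of the correlation as the noise increases is the resonance signature.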

Another description of what might be a manifestation of stochastic resonance akin to that brought up by Galison arises in Berg and Timmermans’ (2000, p. 56) study of the constitution of universalities in a medical network. They note that, “Paradoxically, then, the increased stability and reach of this network was not due to more (precise) instructions: the protocol’s logistics could thrive only by parasitically drawing upon its own disorder.” Much the same has been said about the behaviors of markets (Mandelbrot, 2004), bringing us back to the topic of the day at Convergence XV earlier this week. I’ll have more to say on this issue of universalities constituted via noise-induced order in due course.

References

Andò, B., & Graziani, S. (2000). Stochastic resonance theory and applications. New York: Kluwer Academic Publishers.

Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 33-80). San Francisco, California: Jossey-Bass.

Andrich, D. (1996). Measurement criteria for choosing among models with graded responses. In A. von Eye & C. Clogg (Eds.), Categorical variables in developmental research: Methods of analysis (pp. 3-35). New York: Academic Press, Inc.

Andrich, D. (2002). Understanding resistance to the data-model relationship in Rasch’s paradigm: A reflection for the next generation. Journal of Applied Measurement, 3(3), 325-359.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Andrich, D. (2011). Rating scales and Rasch measurement. Expert Reviews in Pharmacoeconomics Outcome Research, 11(5), 571-585.

Andrich, D., Lyne, A., Sheridan, B., & Luo, G. (2013). RUMM 2030: Rasch unidimensional models for measurement. Perth, Australia: RUMM Laboratory Pty Ltd [www.rummlab.com.au].

Benzi, R., Sutera, A., & Vulpiani, A. (1981). The mechanism of stochastic resonance. Journal of Physics A: Mathematical and General, 14, L453-L457.

Berg, M., & Timmermans, S. (2000). Orders and their others: On the constitution of universalities in medical work. Configurations, 8(1), 31-61.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Chialvo, D., Longtin, A., & Müller-Gerking, J. (1996). Stochastic resonance in models of neuronal ensembles revisited [Electronic version].

Daniel, R. C., & Embretson, S. E. (2010). Designing cognitive complexity in mathematical problem-solving items. Applied Psychological Measurement, 34(5), 348-364.

Dawson, T. L. (2002a, Summer). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3(2), 146-89.

Dawson, T. L. (2002b, March). New tools, new insights: Kohlberg’s moral reasoning stages revisited. International Journal of Behavioral Development, 26(2), 154-66.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Dawson, T. L., Fischer, K. W., & Stein, Z. (2006). Reconsidering qualitative and quantitative research approaches: A cognitive developmental perspective. New Ideas in Psychology, 24, 229-239.

De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach (Statistics for Social and Behavioral Sciences). New York: Springer-Verlag.

Duncan, O. D. (1984). Notes on social measurement: Historical and critical. New York: Russell Sage Foundation.

Dykman, M. I., & McClintock, P. V. E. (1998, January 22). What can stochastic resonance do? Nature, 391(6665), 344.

Fisher, W. P., Jr. (1992, Spring). Stochastic resonance and Rasch measurement. Rasch Measurement Transactions, 5(4), 186-187 [http://www.rasch.org/rmt/rmt54k.htm].

Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.

Fisher, W. P., Jr. (2011). Stochastic and historical resonances of the unit in physics and psychometrics. Measurement: Interdisciplinary Research & Perspectives, 9, 46-50.

Fisher, W. P., Jr., Harvey, R. F., Taylor, P., Kilgore, K. M., & Kelly, C. K. (1995, February). Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76(2), 113-122.

Galison, P. (1997). Image and logic: A material culture of microphysics. Chicago: University of Chicago Press.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York: Routledge.

Galison, P., & Stump, D. J. (1996). The disunity of science: Boundaries, contexts, and power. Palo Alto, California: Stanford University Press.

Hess, S. M., & Albano, A. M. (1998, February). Minimum requirements for stochastic resonance in threshold systems. International Journal of Bifurcation and Chaos, 8(2), 395-400.

Karabatsos, G., & Ullrich, J. R. (2002). Enumerating and testing conjoint measurement models. Mathematical Social Sciences, 43, 487-505.

Kuhn, T. S. (1961). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in T. S. Kuhn, (Ed.). (1977). The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press.)

Linacre, J. M. (2011). A user’s guide to WINSTEPS Rasch-Model computer program, v. 3.72.0. Chicago, Illinois: Winsteps.com.

Linacre, J. M. (2013). A user’s guide to FACETS Rasch-Model computer program, v. 3.71.0. Chicago, Illinois: Winsteps.com.

Mandelbrot, B. (2004). The misbehavior of markets. New York: Basic Books.

Massof, R. W., & Ahmadian, L. (2007, July). What do different visual function questionnaires measure? Ophthalmic Epidemiology, 14(4), 198-204.

Moskowitz, M. T., & Dickinson, B. W. (2002). Stochastic resonance in speech recognition: Differentiating between /b/ and /v/. Proceedings of the IEEE International Symposium on Circuits and Systems, 3, 855-858.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Repperger, D. W., & Farris, K. A. (2010, July). Stochastic resonance: A nonlinear control theory interpretation. International Journal of Systems Science, 41(7), 897-907.

Riani, M., & Simonotto, E. (1994). Stochastic resonance in the perceptual interpretation of ambiguous figures: A neural network model. Physical Review Letters, 72(19), 3120-3123.

Smith, R. M., & Plackner, C. (2009). The family approach to assessing fit in Rasch measurement. Journal of Applied Measurement, 10(4), 424-437.

Smith, R. M., & Taylor, P. (2004). Equating rehabilitation outcome scales: Developing common metrics. Journal of Applied Measurement, 5(3), 229-42.

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013, August). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14 [doi: 10.3389/fpsyg.2013.00536].

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010, 29 October). Evidence for a collective intelligence factor in the performance of human groups. Science, 330, 686-688.

Woolley, A. W., & Fuchs, E. (2011, September-October). Collective intelligence in the organization of science. Organization Science, 22(5), 1359-1367.

Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. A. (2007). ACER ConQuest Version 2: Generalised item response modelling software. Camberwell: Australian Council for Educational Research.