Archive for May, 2018

Revisiting Hayek’s Relevance to Measurement

May 31, 2018

As so often happens, I’m finding new opportunities to restate what seems obvious to me but does not strike others with the force it ought to. The work of the Austrian economist Friedrich Hayek has always seemed to me to express, self-evidently, ideas of fundamental value and interest. Reviewing that work again lately has opened it up at a new level of detail worth sharing here.

Hayek (1948, p. 54) is onto a key point about measurement and its role in economics when he says:

…the spontaneous actions of individuals will, under conditions which we can define, bring about a distribution of resources which can be understood as if it were made according to a single plan, although nobody has planned it…

Decades of measurement research show that individuals’ spontaneous responses to assessment and survey questions conform to one another in ways that might appear to have been centrally organized according to a single plan. Yet the same patterns are produced over and over again with no effort made to guide or coerce the responses into that conformity.

The results of testing and assessment produced in educational measurement can be expressed in economic terms that fit quite well with Hayek’s observation. Student abilities, economically speaking, are human capital resources. Each student has some amount of ability that can be considered a supply of resources available for application to the demands posed by the assessment questions. When assessment data fit a Rasch model, the supply of student abilities has spontaneously organized itself in relation to the demands for that supply posed by the test questions. The invariant consistency of the data, and the resulting model fit, is not produced by coercing or guiding the students to respond in a particular way. Although questions can be written to vary in difficulty according to a construct theory, and though educational curricula traditionally vary in difficulty across grade levels, the patterns of growth and change that are observed plainly do not take place as a result of anyone’s intentions or plans.
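
A minimal simulation can make this spontaneous organization concrete. The Python sketch below is illustrative only: the abilities, difficulties, and sample size are hypothetical values I have chosen, not data from any study. It generates dichotomous responses from the Rasch model and then checks that low scorers and high scorers, with no coordination between them, order the items from easy to hard in the same way.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical person abilities and item difficulties (in logits),
# chosen only for illustration.
abilities = rng.normal(0.0, 1.5, size=500)      # one ability per student
difficulties = np.linspace(-2.0, 2.0, num=10)   # ten items, easy to hard

# Rasch model: P(correct) = exp(ability - difficulty) / (1 + exp(...))
logits = abilities[:, None] - difficulties[None, :]
p_correct = 1.0 / (1.0 + np.exp(-logits))
responses = (rng.random(p_correct.shape) < p_correct).astype(int)

# Split the sample by total score: if the data fit the model, both
# halves order the items from easy to hard in the same way, though
# no one guided or coerced anyone's answers.
totals = responses.sum(axis=1)
low = totals < np.median(totals)
print("item p-values, low scorers :", responses[low].mean(axis=0).round(2))
print("item p-values, high scorers:", responses[~low].mean(axis=0).round(2))
```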

This kind of complex adaptive, self-organizing process (Fisher, 2017) describes not just the relations of student abilities to task difficulties, but also the relations of customer preferences to product features, patient health and functionality relative to disease and disability, etc. It also, of course, applies to supply and demand relative to a price (Fisher, 2015). For students, the price to be paid follows from the probability of a supply of ability meeting the demand for it posed by the challenges encountered in assessment items.
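
This reading can be written directly into the model’s notation. As a sketch only: taking $\beta_n$ as the supply of ability student $n$ brings, and $\delta_i$ as the demand posed by item $i$, follows the economic framing above (Fisher, 2015) rather than standard psychometric usage:

\[ P_{ni} = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}} \]

The “price” is then the probability $P_{ni}$: it rises toward one when supply exceeds demand ($\beta_n > \delta_i$), sits at one half when the two balance, and falls toward zero when demand outstrips supply.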

Getting back to Hayek (1948, p. 54), here we meet the relevance of the

…central question of all social sciences: How can the combination of fragments of knowledge existing in different minds bring about results which, if they were to be brought about deliberately, would require a knowledge on the part of the directing mind which no single person can possess?

Per Hayek’s point, no one student will know the answers to all of the questions posed in a test, and yet all of the students’ fragments of knowledge combine in a way that brings about results seemingly defined by a single intelligence. It is this bottom-up, self-organized emergence of knowledge structures that we capture in measurement and bring into our culture, our sciences, and our economies by bringing things into words and the common languages of standardized metrics.

This spontaneous emergence of structure does not lead directly, of its own accord, to the creation of markets. Rather, it is vitally important to recognize, along with Miller and O’Leary (2007, p. 710), that:

Markets are not spontaneously generated by the exchange activity of buyers and sellers. Rather, skilled actors produce institutional arrangements, the rules, roles and relationships that make market exchange possible. The institutions define the market, rather than the reverse.

The institutional arrangements we need to make to create efficient markets for human, social, and natural capital will be staggeringly difficult to realize. But a time will come when the costs of remaining in our current cultural, political, and economic ruts outweigh the costs of investing in a new future, and the benefits of staying put fall below the benefits of moving on. That time may come sooner than anyone thinks.

References

Fisher, W. P., Jr. (2015). A probabilistic model of the law of supply and demand. Rasch Measurement Transactions, 29(1), 1508-1511 [http://www.rasch.org/rmt/rmt291.pdf].

Fisher, W. P., Jr. (2017). A practical approach to modeling complex adaptive flows in psychology and social science. Procedia Computer Science, 114, 165-174. Retrieved from https://doi.org/10.1016/j.procs.2017.09.027

Hayek, F. A. (1948). Individualism and economic order. Chicago: University of Chicago Press.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations and Society, 32(7-8), 701-734.


My responses to post-IOMW survey questions

May 7, 2018

My definition of objective measurement:

Reproducible invariant intervals embodied in instruments calibrated to shared unit standards and explained by substantively meaningful theory. The word ‘objective’ is both redundant, like saying ‘wet rain,’ and unnecessarily exclusive of the shared subjectivity that is embodied in measuring instruments alongside objectivity.
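
One way to make “reproducible invariant intervals” concrete is the separability property of the dichotomous Rasch model, sketched here in standard notation. In log-odds form,

\[ \ln\frac{P_{ni}}{1 - P_{ni}} = \beta_n - \delta_i , \]

so the comparison of persons $n$ and $m$ on any item $i$ reduces to

\[ \ln\frac{P_{ni}}{1 - P_{ni}} - \ln\frac{P_{mi}}{1 - P_{mi}} = \beta_n - \beta_m . \]

The item parameter cancels: the interval between the persons is the same whichever calibrated instrument mediates the comparison, which is exactly what calibration to shared unit standards is meant to preserve.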

Distinguishing features of IOMW:

A clear focus on the technical issues of measurement, specifically defined in terms of models in the form of natural laws, interval units with known uncertainties, data quality assessments, explanatory theory, substantive interpretation, and the metrological traceability of instruments distributed to end users throughout a sociocognitive ecosystem.

Future keynote suggestions:

Luca Mari on measurement philosophy

Leslie Pendrill on metrology

Robert Massof on LOVRNet

Stefan Cano on health metrology consensus standards

Jan Morrison on STEM Learning Ecosystems

Angelica Lips Da Cruz on impact investing

Alan Schwartz on how measurement is revolutionizing philanthropy

Future training session topic suggestions:

Traceability control systems

Electronic distributed ledger systems for tracking learning, health, etc., over time and across ecosystem niches

How to create information infrastructures capable of coherently integrating discontinuous levels of complexity (computer-supported cooperative work, CSCW)

How to access the wealth of available strictly longitudinal repeated measures of student learning growth and put it to work (see Williamson’s 2016 Berkeley IMEKO paper)

How to integrate universally uniform measures of learning, health, etc., in economic models, accounting spreadsheets, TQM/CQI quality improvement methods, outcome product pricing models, and investment finance

How to approach measurement in terms of complex adaptive, self-organizing stochastic systems

Other comments:

I want to see a clear justification for any references to IRT. The vast majority of references to IRT at the NY meeting were actually references to measurement theory. If IRT is what is said, IRT ought to be what is meant. None of the major measurement theorists include IRT, and they specifically disavow it as offering unidentifiable models, model choice based on p-values instead of principles and meaning, difficult if not impossible estimation problems, no proofs of conjoint additivity or of scores as sufficient statistics, and inconsistent assertions of both crossing ICCs and unidimensionality. IRT is not measurement theory. Why is it so widely featured at a measurement conference?
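
The sufficiency point can be stated compactly; what follows is a sketch in standard notation, not a full proof. In the Rasch model, person $n$’s likelihood over $I$ dichotomous items is

\[ \Pr(x_{n1},\ldots,x_{nI} \mid \beta_n) = \frac{e^{\beta_n r_n - \sum_i x_{ni}\delta_i}}{\prod_i \left(1 + e^{\beta_n - \delta_i}\right)}, \qquad r_n = \sum_i x_{ni} , \]

so $\beta_n$ enters only through the raw score $r_n$, which is therefore a sufficient statistic. In the two-parameter IRT model the corresponding exponent is $\beta_n \sum_i a_i x_{ni}$, a score weighted by the estimated discriminations $a_i$; sufficiency then hinges on sample-dependent item parameters, and the separation of person comparisons from item comparisons is lost.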

On social impact bonds and critical reflections

May 5, 2018

A new article (Roy, McHugh, & Sinclair, 2018) out this week in the Stanford Social Innovation Review echoes Gleeson-White (2015) in pointing out a disconnect between financial bottom lines and the social missions of companies whose primary objectives concern broader social and environmental impacts. The article also notes the expense of measurement, increased administrative burdens, high transaction costs, technical difficulties in achieving fair measures, the tendency to manage what is measured instead of advancing the mission, and the potential impacts of external policy environments and political climates.

The authors contend that social impact bonds are popular and proliferating for ideological reasons, not because of any evidence concerning their effectiveness in making the realization of social objectives profitable. Some of the several comments posted online in response to the article take issue with that claim, and point toward evidence of effectiveness. But the general point still stands: more must be done to systematically align investors’ financial interests with the citizens’ interest in advancing their financial, social, and environmental quality of life, and not just with the social service providers’ interest in funding and advancing their mission.

Roy et al. are correct to say that to do otherwise is to turn the people served into commodities. This happens because governance of, accountability for, and reporting of social impacts are shifted away from elected officials to the needs of private funders, with far less in the way of satisfactory recourse for citizens when programs go awry. The problem lies in the failure to create any capacity for individuals themselves to represent, invest in, manage, and profit from their skills, health, trust, and environmental service outcomes. Putting all the relevant information into the hands of service providers and investors, and leaving that information at its current low quality, can only ever produce one-sided effects on the people themselves. With no idea of the technologies, models, decades of results, and ready examples available in the published research, the authors conclude with a recommendation to leave well enough alone and pursue more traditional avenues of policy formation, instead of allowing the “cultural supremacy of market principles” to continue advancing into every area of life.

But as is so commonly the case when it comes to technical issues of quantification, the authors’ conclusions and criticisms skip over the essential role that high-quality measurement plays in reducing transaction costs and supporting property rights. In general, measurement standards make information about the quantity and quality of products in markets easy to communicate and transfer, thereby lowering transaction costs and enabling rights to the ownership of specific amounts of things. The question that goes unasked in this article, and in virtually every other article on ESG, social impact investing, and the like, is this: What kinds of measurement technologies and systems would we need to replicate existing market efficiencies in new markets for human, social, and natural capital?

That question and other related ones are, of course, the theme of this blog and of many of my publications. Further exploration here and in the references to other posts (such as Fisher, 2011, 2012a, 2012b) may prove fruitful to others seriously interested in finding a way out of the unexamined assumptions stifling creativity in this area.

In short, instead of turning people into commodities, why should we not turn skills, health, trust, and environmental services into commodities? Why should not every person have legal title to scientifically and uniformly measured numbers of shares of each essential form of human, social, and natural capital? Why should individuals not be able to profit in both monetary and personal terms from their investments in education, health care, community, and the environment? Why should we allow corporations to continue externalizing the costs of social and environmental investments, at the expense of individual citizens and communities? Why is there so much disparity and inequality in the opportunities for skill development and healthy lives available across social sectors?

Might not our inability to obtain good information about processes and outcomes in the domains of educational, health care, social service, and environmental management have a lot to do with it? Why don’t we have the information infrastructure we need, when the technology for creating it has been in development for over 90 years? Why are there so many academics, researchers, philanthropic organizations, and government agencies that are content with the status quo when these longstanding technologies are available, and people, communities, and the environment are suffering from the lack of the information they ought to have?

During the French Revolution, one of the primary motivations for devising the metric system was to extend the concept of universal rights to individual commercial exchanges. The confusing proliferation of measurement units in Europe at the time made it possible for merchants and the nobility to sell in one unit and buy in another. Universal rights plainly implied universal measures. Alder (2002, p. 2) explains:

“To do their job, standards must operate as a set of shared assumptions, the unexamined background against which we strike agreements and make distinctions. So it is not surprising that we take measurement for granted and consider it banal. Yet the use a society makes of its measures expresses its sense of fair dealing. That is why the balance scale is a widespread symbol of justice. ... Our methods of measurement define who we are and what we value.”

Getting back to the article by Roy, McHugh, and Sinclair, yes, it is true that the measures in use in today’s social impact bonds are woefully inadequate. Far from living up to the kind of justice symbolized by the balance scale, today’s social impact measures define who we are in terms of units of measurement that differ and change in unknown ways across individuals, over time, and across instruments. This is the reason for many, if not all, of the problems Roy et al. find with social impact bonds: their measures are not up to the task.

But instead of taking that as an unchangeable given, should we not do more to ask what kinds of measures could do the job that needs to be done? Should we not look around and see whether there might in fact be available technologies able to advance the cause?

Theory and evidence have, in fact, been brought to bear in formulating approaches to instrument calibration that reproduce the balance scale’s fair and just comparisons of weight from data like those produced by tests and surveys (Choi, 1998; Massof, 2011; Rasch, 1960, pp. 110-115). The same has been done in reproducing measures of length (Stephanou & Fisher, 2013), distance (Moulton, 1993), and density (Pelton & Bunderson, 2003).
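
For readers who want to see the shape of such a demonstration, here is a toy sketch in Python: it simulates dichotomous data from known calibrations and recovers them with a simple alternating maximum-likelihood loop. The values and the estimator are illustrative assumptions only, not the procedures used in the studies cited above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Known "true" item calibrations to be recovered (logits); the values
# here are hypothetical, chosen only for illustration.
true_d = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
true_b = rng.normal(0.0, 1.0, size=1000)

# Simulate dichotomous responses from the Rasch model.
p = 1 / (1 + np.exp(-(true_b[:, None] - true_d[None, :])))
x = (rng.random(p.shape) < p).astype(int)

# Perfect and zero scores have no finite estimates; drop them.
keep = (x.sum(axis=1) > 0) & (x.sum(axis=1) < x.shape[1])
x = x[keep]

# Toy joint maximum-likelihood estimation by alternating Newton steps.
# No bias correction, so a short test overspreads the estimates a bit.
b = np.zeros(x.shape[0])
d = np.zeros(x.shape[1])
for _ in range(100):
    e = 1 / (1 + np.exp(-(b[:, None] - d[None, :])))
    b = np.clip(b + (x - e).sum(1) / (e * (1 - e)).sum(1), -6, 6)
    e = 1 / (1 + np.exp(-(b[:, None] - d[None, :])))
    d -= (x - e).sum(0) / (e * (1 - e)).sum(0)
    d -= d.mean()  # anchor the origin of the unit at the items' mean

print("true calibrations     :", true_d)
print("recovered calibrations:", d.round(2))
```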

These are not isolated and special results. The methods involved have been in use for decades and in dozens of fields (Wright, 1968, 1977, 1999; Wright & Masters, 1982; Wright & Stone, 1979, 1999; Andrich, 1978, 1988, 1989, 2010; Bond & Fox, 2015; Engelhard, 2012; Wilson, 2005; Wilson & Fisher, 2017). Metric system engineers and physicists are in accord with psychometricians as to the validity of these claims (Pendrill & Fisher, 2015) and are on the record with positive statements of support:

“Rasch models belong to the same class that metrologists consider paradigmatic of measurement” (Mari & Wilson, 2014, p. 326).

“The Rasch approach…is not simply a mathematical or statistical approach, but instead [is] a specifically metrological approach to human-based measurement” (Pendrill, 2014, p. 26).

This attitude toward the possibilities of measurement is already informing at least one effort in the area of social impact investing (https://www.aldcpartnership.com/#/cases/financing-the-future). Hopefully, many more projects of this kind will emerge in the near future.

The challenges are huge, of course. This is especially the case when considering the discontinuous levels of complexity that have to be negotiated in making information flow across locally situated individual niches, group-level organizations and communities, and global accountability applications (Fisher, 2017; Fisher, Oon, & Benson, 2018; Fisher & Stenner, 2018). But taking on these challenges makes far more sense than remaining complicitly settled in a comfortable rut, throwing up our hands at how unfair life is.

There’s a basic question that needs to be asked. If what is presented as measurement raises transaction costs and does not support ownership rights to what is measured, is it really measurement? How can the measurement of kilowatts, liters, and grams lower transaction costs and support property rights at the same time that other so-called measurements raise transaction costs and fail to support property rights? Does this inconsistency not suggest that something might be amiss in the way measurement is conceived in some areas?

For more information, check out these other posts:

https://livingcapitalmetrics.wordpress.com/2015/05/01/living-capital-metrics-for-financial-and-sustainability-accounting-standards/

https://livingcapitalmetrics.wordpress.com/2014/11/08/another-take-on-the-emerging-paradigm-shift/

https://wordpress.com/post/livingcapitalmetrics.wordpress.com/1812

https://wordpress.com/post/livingcapitalmetrics.wordpress.com/497

References

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.

Andrich, D. (1988). Rasch models for measurement (Sage University Paper Series on Quantitative Applications in the Social Sciences, no. 07-068). Beverly Hills, California: Sage Publications.

Andrich, D. (1989). Constructing fundamental measurements in social psychology. In J. A. Keats, R. Taft, R. A. Heath & S. H. Lovibond (Eds.), Mathematical and theoretical systems: Proceedings of the 24th International Congress of Psychology of the International Union of Psychological Science, Vol. 4 (pp. 17-26). Amsterdam, Netherlands: North-Holland.

Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika, 75(2), 292-308.

Bond, T., & Fox, C. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). New York: Routledge.

Choi, E. (1998). Rasch invents “ounces.” Popular Measurement, 1(1), 29. Retrieved from https://www.rasch.org/pm/pm1-29.pdf

Engelhard, G., Jr. (2012). Invariant measurement: Using Rasch models in the social, behavioral, and health sciences. New York: Routledge Academic.

Fisher, W. P., Jr. (2011). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 12(1), 49-66.

Fisher, W. P., Jr. (2012a). Measure and manage: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (pp. 43-63). New York: Palgrave Macmillan.

Fisher, W. P., Jr. (2012b, May/June). What the world needs now: A bold plan for new standards [Third place, 2011 NIST/SES World Standards Day paper competition]. Standards Engineering, 64(3), 1 & 3-5 [http://ssrn.com/abstract=2083975].

Fisher, W. P., Jr. (2017). A practical approach to modeling complex adaptive flows in psychology and social science. Procedia Computer Science, 114, 165-174. Retrieved from https://doi.org/10.1016/j.procs.2017.09.027

Fisher, W. P., Jr., Oon, E. P.-T., & Benson, S. (2018). Applying Design Thinking to systemic problems in educational assessment information management. Journal of Physics Conference Series, in press [http://media.imeko-tc7-rio.org.br/media/uploads/s/wfisher@berkeley.edu_1497049869_781396.pdf].

Fisher, W. P., Jr., & Stenner, A. J. (2018). Ecologizing vs modernizing in measurement and metrology. Journal of Physics Conference Series, in press [http://media.imeko-tc7-rio.org.br/media/uploads/s/wfisher@berkeley.edu_1496875919_204672.pdf].

Gleeson-White, J. (2015). Six capitals, or can accountants save the planet? Rethinking capitalism for the 21st century. New York: Norton.

Mari, L., & Wilson, M. (2014, May). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Massof, R. W. (2011). Understanding Rasch and Item Response Theory models: Applications to the estimation and validation of interval latent trait measures from responses to rating scale questionnaires. Ophthalmic Epidemiology, 18(1), 1-19.

Moulton, M. (1993). Probabilistic mapping. Rasch Measurement Transactions, 7(1), 268 [http://www.rasch.org/rmt/rmt71b.htm].

Pelton, T., & Bunderson, V. (2003). The recovery of the density scale using a stochastic quasi-realization of additive conjoint measurement. Journal of Applied Measurement, 4(3), 269-281.

Pendrill, L. (2014, December). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Roy, M. J., McHugh, N., & Sinclair, S. (2018, 1 May). A critical reflection on social impact bonds. Stanford Social Innovation Review. Retrieved 5 May 2018, from https://ssir.org/articles/entry/a_critical_reflection_on_social_impact_bonds

Stephanou, A., & Fisher, W. P., Jr. (2013). From concrete to abstract in the measurement of length. Journal of Physics Conference Series, 459, 012026 [http://iopscience.iop.org/1742-6596/459/1/012026].

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilson, M., & Fisher, W. (2017). Psychological and social measurement: The career and contributions of Benjamin D. Wright. New York: Springer.

Wright, B. D. (1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 invitational conference on testing problems (pp. 85-101 [http://www.rasch.org/memo1.htm]). Princeton, New Jersey: Educational Testing Service.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/measess/me-all.pdf].