Archive for the ‘measurement’ Category

Self-Sustaining Sustainability, Once Again, Already

August 12, 2018

Oddly and counterproductively, the urgent need for massive global implementation of sustainability policies and practices has not yet led to systematic investment in state-of-the-art standards for sustainability metrics. My personal mission is to contribute to meeting this need. Longstanding, proven resources in the art and science of precision instrument calibration and explanatory theory are available to address these problems. In the same way that technical standards for measuring length, mass, volume, time, energy, light, etc. enable the coordination of science and commerce for manufactured capital and property, so, too, will a new class of standards for measuring human, social, and natural capital enable that coordination.

This new art and science contradicts common assumptions in three ways. First, contrary to popular opinion that measuring these things is impossible, over 90 years of research and practice support a growing consensus among weights and measures standards engineers (metrologists) and social and psychological measurement experts that relevant unit standards are viable, feasible, and desirable.

Common perceptions are contradicted in a second way in that measurement of this kind does not require reducing human individuality to homogenized uniform sameness. Instead of a mechanical metaphor of cogs in a machine, the relevant perspective is an organic or musical one. The goal is to ensure that local uniqueness and creative improvisations are freely expressed in a context informed by shared standards (like DNA, or a musical instrument tuning system).

The third way in which much of what we think we know is mistaken concerns how to motivate adoption of sustainability policies and practices. Many among us are fearful that neither the general population nor its leaders in government and business care enough about sustainability to focus on implementing solutions. But finding the will to act is not the issue. The problem is how to create environments in which new sustainable forms of life multiply and proliferate of their own accord. To do this, people need means for satisfying their own interests in life, liberty, and the pursuit of happiness. The goal, therefore, is to organize knowledge infrastructures capable of informing and channeling the power of individual self-interest. The only way mass scale self-sustaining sustainable economies will ever happen is by tapping the entrepreneurial energy of the profit motive, where profit is defined not just in financial terms but in the quality of life and health terms of authentic wealth and genuine productivity.

We manage what we measure. If we are to collectively, fluidly, efficiently, and innovatively manage the living value of our human, social, and natural capital, we need, first, high quality information expressed in shared languages communicating that value. Second, we need new scientific, legal, economic, financial, and governmental institutions establishing individual rights to ownership of that value, metric units expressing amounts of that value, conformity audits for ascertaining the accuracy and precision of those units, financial alignments of the real value measured with bankable dollar amounts, and investment markets to support entrepreneurial innovations in creating that value.

The end result of these efforts will be a capacity for all of humanity to pull together in common cause to create a sustainable future. We will each be able to maximize our own personal potential at the same time we contribute to the greater good. We will not only be able to fulfill the potential of our species as stewards of the earth, we will have fun doing it! For technical information resources, see below. PDFs are available on request, and can often be found freely available online.

Self-Sustaining Sustainability

Relevant Information Resources

William P. Fisher, Jr., Ph.D.

Barney, M., & Fisher, W. P., Jr. (2016). Adaptive measurement and assessment. Annual Review of Organizational Psychology and Organizational Behavior, 3, 469-490.

Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.

Fisher, W. P., Jr. (2000). Objectivity in psychosocial measurement: What, why, how. Journal of Outcome Measurement, 4(2), 527-563.

Fisher, W. P., Jr. (2002). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2003). The mathematical metaphysics of measurement and metrology: Towards meaningful quantification in the human sciences. In A. Morales (Ed.), Renascent pragmatism: Studies in law and social science (pp. 118-153). Brookfield, VT: Ashgate Publishing Co.

Fisher, W. P., Jr. (2004). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy & Social Sciences, 27(4), 429-454.

Fisher, W. P., Jr. (2007). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-1093 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2009, November 19). Draft legislation on development and adoption of an intangible assets metric system. Living Capital Metrics blog: https://livingcapitalmetrics.wordpress.com/2009/11/19/draft-legislation/.

Fisher, W. P., Jr. (2009). Invariance and traceability for measures of human, social, and natural capital. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009). NIST Critical national need idea White Paper: Metrological infrastructure for human, social, and natural capital (http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute of Standards and Technology.

Fisher, W. P., Jr. (2010, 22 November). Meaningfulness, measurement, value seeking, and the corporate objective function: An introduction to new possibilities. LivingCapitalMetrics.com, Sausalito, California.

Fisher, W. P., Jr. (2010). Measurement, reduced transaction costs, and the ethics of efficient markets for human, social, and natural capital. Bridge to Business Postdoctoral Certification, Freeman School of Business, Tulane University (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2340674).

Fisher, W. P., Jr. (2010). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics Conference Series, 238(1), 012016.

Fisher, W. P., Jr. (2011). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In N. Brown, B. Duckor, K. Draney & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 2 (pp. 1-27). Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2012). Measure and manage: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (pp. 43-63). New York: Palgrave Macmillan.

Fisher, W. P., Jr. (2012). What the world needs now: A bold plan for new standards [Third place, 2011 NIST/SES World Standards Day paper competition]. Standards Engineering, 64(3), 1 & 3-5 [http://ssrn.com/abstract=2083975].

Fisher, W. P., Jr. (2015). A probabilistic model of the law of supply and demand. Rasch Measurement Transactions, 29(1), 1508-1511 [http://www.rasch.org/rmt/rmt291.pdf].

Fisher, W. P., Jr. (2015). Rasch measurement as a basis for metrologically traceable standards. Rasch Measurement Transactions, 28(4), 1492-1493 [http://www.rasch.org/rmt/rmt284.pdf].

Fisher, W. P., Jr. (2015). Rasch metrology: How to expand measurement locally everywhere. Rasch Measurement Transactions, 29(2), 1521-1523.

Fisher, W. P., Jr. (2017, September). Metrology, psychometrics, and new horizons for innovation. 18th International Congress of Metrology, Paris, 10.1051/metrology/201709007.

Fisher, W. P., Jr. (2017). A practical approach to modeling complex adaptive flows in psychology and social science. Procedia Computer Science, 114, 165-174.

Fisher, W. P., Jr. (2018). How beauty teaches us to understand meaning. Educational Philosophy and Theory, in review.

Fisher, W. P., Jr. (2018). Separation theorems in econometrics and psychometrics: Rasch, Frisch, two Fishers, and implications for measurement. Scandinavian Economic History Review, in review.

Fisher, W. P., Jr., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5(1), 3-25.

Fisher, W. P., Jr., Harvey, R. F., Taylor, P., Kilgore, K. M., & Kelly, C. K. (1995). Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76(2), 113-122.

Fisher, W. P., Jr., & Stenner, A. J. (2011, January). Metrology for the social, behavioral, and economic sciences (Social, Behavioral, and Economic Sciences White Paper Series). National Science Foundation: http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36.

Fisher, W. P., Jr., & Stenner, A. J. (2011, August 31 to September 2). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO) TC1-TC7-TC13 Joint Symposium, http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf, Jena, Germany.

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.

Fisher, W. P., Jr., & Wilson, M. (2015). Building a productive trading zone in educational assessment research and practice. Pensamiento Educativo: Revista de Investigacion Educacional Latinoamericana, 52(2), 55-78.

Pendrill, L., & Fisher, W. P., Jr. (2013). Quantifying human response: Linking metrological and psychometric characterisations of man as a measurement instrument. Journal of Physics Conference Series, 459, 012057.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

 

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Self-Sustaining Sustainability

August 6, 2018

After decades of effort and massive resources expended in trying to create a self-sustaining sustainable economy, perhaps it is time to wonder if we are going about it the wrong way. The desire for change seems genuinely significant and widespread, but the often inspiring volumes of investment and ingenuity applied to the problem persistently prove insufficient to the task. Why?

I’ve previously and repeatedly explained how finding the will to change is not the issue. This time I’ll approach my proposed solution in a different way.

Q: How do we create a self-sustaining sustainable economy?

A: By making sustainability profitable in monetary terms as well as in the substantive real terms of the relationships we live out with each other and the earth. Current efforts in this regard focus solely on reducing energy costs enough to compensate for investments in advancing the organizational mission. We need far more comprehensively designed solutions than that.

Q: How do we do that?

A: By financially rewarding improved sustainability at every level of innovation, from the individual to the community to the firm.

Q: How do we do that?

A: By instituting rights to the ownership of human, social, and natural capital properties, and by matching the demand for sustainability with the supply of it in a way that will inform arbitrage and pricing.

Q: How do we do that?

A: By lowering the cost of the information needed to know how many shares of human, social, and natural capital stocks are owned, and to match demand with supply.

Q: How could that be done?

A: By investing as a society in improving the quality and distribution of the available information.

Q: What does that take?

A: Creating dependable and meaningful tools for ascertaining the quantity, quality, and type of sustainability impacts on human, social, and natural capital being offered.

Q: Can that be done?

A: The technical art and science of measurement needed for creating these tools is well established, having been in development for almost 100 years.

Q: How do we start?

A: An important lesson of history is that building the infrastructure and its array of applications follows in the wake of, and cannot precede, the institution of the constitutional ideals. We must know what the infrastructure and applications will look like in their general features, but nothing will ever be done if we think we have to have them in place before instantiating the general frame of reference. The most general right to own legal title to human, social, and natural capital can be instituted, and the legal status of new metric system units can be established, before efforts are put into unit standards, traceability processes, protocols for intralaboratory ruggedness tests and interlaboratory round robin trials, conformity assessments, etc.

Q: It sounds like an iterative process.

A: Yes, one that must attend from the start to the fundamental issues of information coherence and complexity, as is laid out in my recent work with Emily Oon, Spencer Benson, Jack Stenner, and others.

Q: This sounds highly technical, utilitarian, and efficient. But all the talk of infrastructure, standards, science, and laboratories sounds excessively technological. Is there any place in this scheme for ecological values, ethics, and aesthetics? And how are risk and uncertainty dealt with?

A: We can take up each of these in turn.

Ecological values: To use an organic metaphor, we know the DNA of the various human, social, and natural capital forms of life, or species, and we know their reproductive and life cycles, and their ecosystem requirements. What we have not done is to partner with each of these species in relationships that focus on maximizing the quality of their habitats, their maturation, and the growth of their populations. Social, psychological, and environmental relationships are best conceived as ecosystems of mutual interdependencies. Being able to separate and balance within-individual, between-individual, and collective levels of complexity in these interdependencies will be essential to the kinds of steward leadership needed for creating and maintaining new sociocognitive ecosystems. Our goal here is to become the change we want to institute, since caterpillar to butterfly metamorphoses come about only via transformations from within.

Ethics: The motivating intention is to care simultaneously and equally effectively for both individual uniqueness and global humanity. In accord with the most fundamental ethical decision, we choose discourse over violence, and we do so by taking language as the model for how things come into words. Language is itself alive in the sense of the collective processes by which new meanings come into it. Language moreover has the remarkable capacity of supporting local concrete improvisations and creativity at the same time that it provides navigable continuity and formal ideals. Care for the unity and sameness of meaning demands a combination of rigorous conceptual determinations embodied in well-defined words with practical applications of those words in local improvisations. That is how we support the need to make decisions with inevitably incomplete and inconsistent information while not committing the violence of the premature conclusion. The challenge is one of finding a balance between openness and boundaries that allows language and our organizational cultures to be stable while also evolving. Our technical grasp of complex adaptive systems, autopoiesis, and stochastic measurement information models is advanced enough to meet these ethical requirements of caring for ourselves, each other, and the earth.

Aesthetics: An aesthetic desire for and love of beauty roots the various forms of life inhabiting diverse niches in the proposed knowledge ecosystem and information infrastructure, and does so in the ground of the ethical choice of discourse and meaning over violence. The experience of beauty teaches us how to understand meaning. The attraction to beauty is a unique human phenomenon because it combines apparent opposites into a single complex feeling. Even when the object of desire is possessed as fully as possible, desire is not eliminated, and even when one feels the object of desire to be lost or completely out of touch, its presence and reality is still felt. So, too, with meaning: no actual instance of anything in the world ever embodies the fullness of an abstract conceptual ideal. This lesson of beauty is perhaps most plainly conveyed in music, where artists deliberately violate the standards of instrument tuning to create fascinating and absorbing combinations of harmony and dissonance from endlessly diverse ensembles. Some tunings persist beyond specific compositions to become immediately identifiable trademark sounds. In taking language as a model, the aesthetic combination of desire and possession informs the ethics of care for the unity and sameness of meaning, and vice versa. And ecological values, ethics, and aesthetics stand on par with the technical concerns of calibration and measurement.

Risk and uncertainty: Calibrating a tool relative to a unit standard is by itself already a big step toward reducing uncertainty and risk. Instead of the chaos of dozens of disconnected sustainability indicators, or the cacophony of hundreds or thousands of different tests, assessments, or surveys measuring the same things, we will have data and theory supporting interpretation of reproducible patterns. These patterns will be, and in many cases already are, embodied in instruments that further reduce risk by defining an invariant unit of comparison, by simplifying interpretation, by reducing opportunities for mistakes, by quantifying uncertainty, and by qualifying that uncertainty in terms of the anomalous exceptions that depart from expectations. Each of these is a special feature of rigorously defined measurement that will eventually become the expected norm for information on sustainability.

For more on these themes, see my other blog posts here, my various publications, and my SSRN page.

 


Evaluating Questionnaires as Measuring Instruments

June 23, 2018

An email came in today asking whether three different short (4- and 5-item) questionnaires could be expected to provide reasonable quality measurement. Here’s my response.

—–

Thanks for raising this question. The questionnaire plainly was not designed to provide data suitable for measurement. Though much can be learned about making constructs measurable from data produced by this kind of questionnaire, "Rasch analysis" cannot magically create a silk purse from a sow's ear (as the old expression goes). Use Linacre's (1993) generalizability theory nomograph to see what reliabilities are expected for each subscale, given the number of items and rating categories and a conservative estimate of the adjusted standard deviation (1.0 logit, for instance). Convert the reliability coefficients into strata (Fisher, 1992, 2008; Wright & Masters, 1982, pp. 92, 105-106) to make the practical meaning of the precision obtained obvious.
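
To make the arithmetic concrete, here is a minimal Python sketch of the reliability-to-strata conversion (my illustration, not part of the original email). The 2/√L floor on the standard error of measurement holds only for perfectly targeted dichotomous items; rating scales do somewhat better, roughly in proportion to their number of thresholds, but real 4- and 5-item scales will generally fall short of this best case.

```python
import math

def expected_reliability(true_sd: float, sem: float) -> float:
    """Rasch-style reliability: true variance over observed variance."""
    return true_sd**2 / (true_sd**2 + sem**2)

def separation(reliability: float) -> float:
    """Separation index G = true SD / root mean square measurement error."""
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability: float) -> float:
    """Statistically distinct strata, (4G + 1) / 3 (Wright & Masters, 1982)."""
    return (4.0 * separation(reliability) + 1.0) / 3.0

# A 4-item dichotomous subscale with an adjusted person SD of 1.0 logit:
# 2 / sqrt(L) is the best-case (perfectly targeted) dichotomous SEM.
L = 4
sem = 2.0 / math.sqrt(L)                      # = 1.0 logit at best
rel = expected_reliability(1.0, sem)
print(f"reliability ~ {rel:.2f}, strata ~ {strata(rel):.1f}")
# -> reliability ~ 0.50, strata ~ 1.7: not even two distinct levels
```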

So if you have data, analyze it and compare the expected and observed reliabilities. If the uncertainties are quite different, is that because of targeting issues? But before you do that, ask experts in the area to rank order:

  • the courses by relevance to the job;
  • the evaluation criteria from easy to hard; and
  • the skills/competencies in order of importance to job performance.

Then study the correspondence between the rankings and the calibration results. Where do they converge and diverge? Why? What’s unexpected? What can be learned?
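Quantifying that correspondence can be as simple as a rank correlation. A hypothetical sketch, assuming scipy is available; the numbers are invented purely for illustration:

```python
from scipy.stats import spearmanr

# Hypothetical example: expert ranks for five evaluation criteria
# (1 = expected easiest) vs. their estimated calibrations in logits.
expert_ranks = [1, 2, 3, 4, 5]
calibrations = [-1.3, -0.4, -0.6, 0.8, 1.5]   # invented values

rho, p = spearmanr(expert_ranks, calibrations)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
# Criteria 2 and 3 invert the expected order: exactly the kind of
# divergence worth taking back to the content experts.
```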

Analyze all of the items in each area (student, employer, instructor) together in Winsteps and study each of the three tables 23.x, setting PRCOMP=S. Remember that the total variance explained is not interpreted simply in terms of “more is better” and that the total variance explained is not as important as the ratio of that variance to the variance in the first contrast (see Linacre, 2006, 2008). If the ratio is greater than 3, the scale is essentially unidimensional (though significant problems may remain to be diagnosed and corrected).

Common practice holds that unexplained variance eigenvalues should be less than 1.5, but this overly simplistic rule of thumb (Chou & Wang, 2010; Raîche, 2005) has been contradicted in practice many times. Even if one or more eigenvalues exceed 1.5, theory may say the items belong to the same construct, and the disattenuated correlations of the measures implied by the separate groups of items (provided in Tables 23.x) may still approach 1.00, indicating that the same measures are produced across subscales. See Green (1996) and Smith (1996), among others, for more on this.
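
The two checks described in the last two paragraphs reduce to simple arithmetic. A minimal sketch (my rendering of the classical Spearman disattenuation formula and of the ratio rule stated above; the input values are invented, not taken from any actual Winsteps output):

```python
import math

def disattenuated_r(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correlation corrected for attenuation due to measurement error:
    the classical Spearman formula r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical subscale-to-subscale values of the kind reported in
# Winsteps Tables 23.x:
print(round(disattenuated_r(0.71, 0.82, 0.78), 2))    # -> 0.89

def essentially_unidimensional(explained: float, first_contrast: float) -> bool:
    """Ratio rule from the text: variance explained by the measures
    should exceed the variance in the first contrast by a factor of 3."""
    return explained / first_contrast > 3.0

print(essentially_unidimensional(45.0, 7.5))          # -> True (45/7.5 = 6)
```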

If subscales within each of the three groups of items are markedly different in the measures they produce, then separate them in different analyses. If these further analyses reveal still more multidimensionalities, it’s time to go back to the drawing board, given how short these scales are. If you define a plausible scale, study the item difficulty orders closely with one or more experts in the area. If there is serious interest in precision measurement and its application to improved management, and not just a bureaucratic need for data to satisfy empty demands for a mere appearance of quality assessment, then trace the evolution of the construct as it changes from less to more across the items.

What, for instance, is the common theme addressed across the courses that makes them all relevant to job performance? The courses were each created with an intention and they were brought together into a curriculum for a purpose. These intentions and purposes are the raw material of a construct theory. Spell out the details of how the courses build competency in translation.

Furthermore, I imagine that this curriculum, by definition, was set up to be effective in training students no matter who is in the courses (within the constraints of the admission criteria), and no matter which particular challenges relevant to job performance are sampled from the universe of all possible challenges. You will recognize these unexamined and unarticulated assumptions as what need to be explicitly stated as hypotheses informing a model of the educational enterprise. This model transforms implicit assumptions into requirements that are never fully satisfied but can be very usefully approximated.

As I’ve been saying for a long time (Fisher, 1989), please do not accept the shorthand language of references to “the Rasch model”, “Rasch scaling”, “Rasch analysis”, etc. Rasch did not invent the form of these models, which are at least as old as Plato. And measurement is not a function of data analysis. Data provide experimental evidence testing model-based hypotheses concerning construct theories. When explanatory theory corroborates and validates data in calibrated instrumentation, the instrument can be applied at the point of use with no need for data analysis, to produce measures, uncertainty (error) estimates, and graphical fit assessments (Connolly, Nachtman, & Pritchett, 1971; Davis, et al., 2008; Fisher, 2006; Fisher, Kilgore, & Harvey, 1995; Linacre, 1997; many others).

So instead of using those common shorthand phrases, please speak directly to the problem of modeling the situation in order to produce a practical tool for managing it.

Further information is available in the references below.

 

Aryadoust, S. V. (2009). Mapping Rasch-based measurement onto the argument-based validity framework. Rasch Measurement Transactions, 23(1), 1192-3 [http://www.rasch.org/rmt/rmt231.pdf].

Chang, C.-H. (1996). Finding two dimensions in MMPI-2 depression. Structural Equation Modeling, 3(1), 41-49.

Chou, Y. T., & Wang, W. C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70, 717-731.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service. Retrieved 23 June 2018 from https://images.pearsonclinical.com/images/pa/products/keymath3_da/km3-da-pub-summary.pdf

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G. et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis Cartilage, 16(5), 551-559.

Fisher, W. P., Jr. (1989). What we have to offer. Rasch Measurement Transactions, 3(3), 72 [http://www.rasch.org/rmt/rmt33d.htm].

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2006). Survey design recommendations [expanded from Fisher, W. P. Jr. (2000) Popular Measurement, 3(1), pp. 58-59]. Rasch Measurement Transactions, 20(3), 1072-1074 [http://www.rasch.org/rmt/rmt203.pdf].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5(1), 3-25.

Green, K. E. (1996). Dimensional analyses of complex data. Structural Equation Modeling, 3(1), 50-61.

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284; [http://www.rasch.org/rmt/rmt71h.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-83.

Linacre, J. M. (1998). Structure in Rasch residuals: Why principal components analysis? Rasch Measurement Transactions, 12(2), 636 [http://www.rasch.org/rmt/rmt122m.htm].

Linacre, J. M. (2003). PCA: Data variance: Explained, modeled and empirical. Rasch Measurement Transactions, 17(3), 942-943 [http://www.rasch.org/rmt/rmt173g.htm].

Linacre, J. M. (2006). Data variance explained by Rasch measures. Rasch Measurement Transactions, 20(1), 1045 [http://www.rasch.org/rmt/rmt201a.htm].

Linacre, J. M. (2008). PCA: Variance in data explained by Rasch measures. Rasch Measurement Transactions, 22(1), 1164 [http://www.rasch.org/rmt/rmt221j.htm].

Raîche, G. (2005). Critical eigenvalue sizes in standardized residual Principal Components Analysis. Rasch Measurement Transactions, 19(1), 1012 [http://www.rasch.org/rmt/rmt191h.htm].

Schumacker, R. E., & Linacre, J. M. (1996). Factor analysis and Rasch. Rasch Measurement Transactions, 9(4), 470 [http://www.rasch.org/rmt/rmt94k.htm].

Smith, E. V., Jr. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205-31.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Wright, B. D. (1996). Comparing Rasch measurement and factor analysis. Structural Equation Modeling, 3(1), 3-24.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

What is the point of sustainability impact investing?

June 10, 2018

What if the sustainability impact investing problem is not just a matter of judiciously supporting business policies and practices likely to enhance the long term viability of life on earth? What if the sustainability impact investing problem is better conceived in terms of how to create markets that function as self-sustaining ecosystems of diverse forms of economic life?

The crux of the sustainability problem from this living capital metrics point of view is how to create efficient markets for virtuous cycles of productive value creation in the domains of human, social, and natural capital. Mainstream economics deems this an impossible task because its definition of measurement makes trade in these forms of capital an unethical and immoral form of slavery.

But what if there is another approach to measurement? What if this alternative approach is scientific in ways unimagined in mainstream economics? What if this alternative approach has been developing in research and practice in education, psychology, health care, sociology, and other fields for over 90 years? What if there are thousands of peer-reviewed publications supporting its validity and reliability? What if a wide range of commercial firms have been successfully employing this alternative approach to measurement for decades? What if this alternative approach has been found legally and scientifically defensible in ways other approaches have not? What if this alternative approach enables us to be better stewards of our lives together than is otherwise possible?

Put another way, measuring and managing sustainability is fundamentally a problem of harmonizing relationships. What do we need to harmonize our relationships with each other, between our communities and nations, and with the earth? How can we achieve harmonization without forcing conformity to one particular scale? How can we tune the instruments of a sustainability art and science to support as wide a range of diverse ensembles and harmonies as exists in music?

Positive and hopeful answers to these questions follow from the fact that we have at our disposal a longstanding, proven, and advanced art and science of qualitatively rich measurement and instrument calibration. The crux of the message is that this art and science is poised to be the medium in which sustainability impact investing and management fulfills its potential and transforms humanity’s capacities to care for itself and the earth.

The differences between the quality of information that is available, and the quality of information currently in use in sustainability impact investing, are of such huge magnitudes that they can only be called transformative. Love and care are the power behind these transformative differences. Choosing discourse over violence, considerateness for the vulnerabilities we share with others, and care for the unity and sameness of meaning in dialogue are all essential to learning the lesson Diotima taught Socrates in Plato’s Symposium. These lessons can all be brought to bear in creating the information and communications systems we need for sustainable economies.

The so-called metrics of today's sustainability impact investing lead to widespread complaints of increased administrative and technical burdens, and to distractions from the pursuit of the core social mission. The maxim, “you manage what you measure,” becomes a cynical commentary on red tape and bureaucracy instead of a commendable use of tools fit for purpose.

In contrast with the cumbersome and uninterpretable masses of data that pass for sustainability metrics today, the art and science of measurement establishes the viability and feasibility of efficient markets for human, social, and natural capital. Instead of counting paper clips in mindless accounting exercises, we can instead be learning what comes next in the formative development of a student, a patient, an employee, a firm, a community, or the ecosystem services of watersheds, forests, and fisheries.

And we can moreover support success in those developments by means of information flows that indicate where the biggest per-dollar human, social, and natural capital value returns accrue. Rigorous measurability of those returns will make it possible to price them, to own them, to make property rights legally enforceable, and to thereby align financial profits with the creation of social value. In fact, we could and should set things up so that it will be impossible to financially profit without creating social value. When that kind of system of incentives and rewards is instituted, then the self-sustaining virtuous cycle of a new ecological economy will come to life.

Though the value and originality of the innovations making this new medium possible are huge, in the end there's really nothing new under the sun. As the French say, “plus ça change, plus c'est la même chose”: the more things change, the more they stay the same. Or, as Whitehead put it philosophically, the innovations in measurement taking hold in the world today are nothing more than additional footnotes to Plato. Contrary to both popular and most expert opinion, it turns out that not only is a moral and scientific art of human measurement possible, but Plato's lessons on how experiences of beauty teach us about meaning may well turn out to provide the only way today's problems of human suffering, social discontent, and environmental degradation will be successfully addressed.

We are faced with a kind of Chinese finger trap: the more we struggle, the more trapped we become. Relaxing into the problem and seeing the historical roots of scientific reasoning in everyday thinking opens our eyes to a new path. Originality is primarily a matter of finding a useful model no one else has considered. A long history of innovations comes together to point in a new direction plainly recognizable as a variation on an old theme.

Instead of a modern focus on data and evidence, then, and instead of the postmodern focus on the theory-dependence of data, we are free to take an unmodern focus on how things come into language. The chaotic complexity of that process becomes manageable as we learn to go with the flow of adaptive evolving processes stable enough to support meaningful communication. Information infrastructures in this linguistic context are conceived as ecosystems alive to changeable local situations at the same time they do not compromise continuity and navigability.

We all learn through what we already know, so it is essential that we begin from where we are. Our first lessons will then be drawn from existing sustainability impact data, using the UN's 17 Sustainable Development Goals as a guide. These data were not designed from the principles of scientifically rigorous measurement, but instead assume that separately aggregated counts of events, percentages, and physical measures of volume, mass, or time will suffice as measures of sustainability. Things that are easy to count are not, however, likely to work as satisfactory measures. We need to learn from the available data to think again about what data are necessary and sufficient to the task.

The lessons we will learn from the data available today will lead to more meaningful and rigorous measures of sustainability. Connecting these instruments together by making them metrologically traceable to standard units, while also illuminating local unique data patterns, in widely accessible multilevel information infrastructures is the way in which we will together work the ground, plant the seeds, and cultivate new diverse natural settings for innovating sustainable relationships.

 

Measurement and markets

June 3, 2018

My response to a request for discussion topic suggestions from Alain Leplege and David Andrich to be taken up at the Rasch Expert Group meeting in Paris on 26 June:

The role of measurement in reducing economic transaction costs and in establishing legal property rights is well established. The value and importance of measurement is stressed everywhere, in all fields. But where measuring physical, chemical, and biological variables contributes to lower transaction costs and defensible property rights, measuring psychological, social, and environmental variables increases administrative and technical burdens with no impact at all on property rights. Why is this?

Furthermore, when physical, chemical, and biological variables are objectively measurable, no one develops their own instruments, units, internal measurement systems, or the things measured by those systems. Instead, they purchase those tools and products in open markets.

But when psychological, social, and environmental variables are objectively measured, as they have been for many decades, everyone still assumes they must develop their own instruments, units, internal measurement systems, and the things measured by those systems, instead of purchasing those tools and products in open markets. Why is this?

I propose that the answers to both these questions follow from two widely assumed misconceptions about markets and measurement.

The first misconception concerns how markets are formed. As explained by Miller and O’Leary (2007, p. 721):

“Markets are not spontaneously generated by the exchange activity of buyers and sellers. Rather, skilled actors produce institutional arrangements, the rules, roles and relationships that make market exchange possible. The institutions define the market, rather than the reverse.”

North (1981, pp. 18-19, 36), one of the founders of the new institutional economics, elaborates further:

“…without some form of measurement, property rights cannot be established nor exchange take place.”

“One must be able to measure the quantity of a good in order for it to be exclusive property and to have value in exchange. Where measurement costs are very high, the good will be a common property resource. The technology of measurement and the history of weights and measures is a crucial part of economic history since as measurement costs were reduced the cost of transacting was reduced.”

Benham and Benham (2000, p. 370) concur:

“Economic theory suggests that changes in transaction costs have a first-order impact on the production frontier. Lower transaction costs mean more trade, greater specialization, changes in production costs, and increased output.”

The second misconception, concerning measurement, stems from the assumption that the widely used but incomplete and insufficient methods based in True Score Theory are the state of the art, and that their associated reductionist and immoral commoditization of people is unavoidable. As is well known to the Rasch expert group attendees, the state of the art in measurement offers a wealth of advantages inaccessible to True Score Theory. One of these is the insufficiently elaborated opportunity for a nonreductionist and moral commoditization of the constructs measured, rather than of people themselves.

It seems plain that many of today's problems of human suffering, social discontent, and environmental degradation could be more effectively addressed by means of systematic and deliberate efforts aimed at using improved measurement methods to lower transaction costs, establish property rights, and create efficient markets supporting advanced innovations for improving the quality and quantity of intangible assets. Efforts in this direction have connected Rasch psychometrics with metrology (Mari & Wilson, 2014; Pendrill & Fisher, 2015; Fisher & Stenner, 2016), with the historical interweaving of science and the economy (Fisher, 2002, 2007, 2009, 2010, 2012, etc.), and are being applied to the development of a new class of social impact bonds (see https://www.aldcpartnership.com/#/cases/financing-the-future).

What feedback, questions, and comments might the expert group attendees have in response to these efforts?

Additional references available on request.

Benham, A., & Benham, L. (2000). Measuring the costs of exchange. In C. Ménard (Ed.), Institutions, contracts and organizations: Perspectives from new institutional economics (pp. 367-375). Cheltenham, UK: Edward Elgar.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-734.

North, D. C. (1981). Structure and change in economic history. New York: W. W. Norton & Co.

On the practical value of rigorous modeling

June 1, 2018

What is the practical value of modeling real things in the world in terms requiring separable parameters?

If the parameters are separable, the stage is set for the validation of a model that is relevant to, and applies to, any individual person or challenge belonging to the infinite populations of all possible people and challenges.

Parameter separation does not automatically translate into representations of the conjoint person-challenge relationship, though. Meaningful and consistent variation in responses requires items written to provoke answers that cohere across respondents and across the questions asked.
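
For readers who want to see the formal core of the claim, here is a minimal sketch of what parameter separation means in the dichotomous Rasch model (standard notation, not tied to any particular application):

```latex
% Person n with ability \beta_n meets item i with difficulty \delta_i:
P(x_{ni} = 1 \mid \beta_n, \delta_i)
  = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}

% Compare persons n and m on any item i, conditional on exactly one
% success between them: the item parameter cancels entirely.
P(x_{ni} = 1 \mid x_{ni} + x_{mi} = 1)
  = \frac{e^{\beta_n}}{e^{\beta_n} + e^{\beta_m}}
```

Because the comparison of any two persons is independent of which challenge mediated it (and vice versa), estimates of the parameters can claim the generality described above.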

In addition, enough questions have to be asked to drive down uncertainty relative to the variation. Response patterns can be reproduced meaningfully only when there is more variation than uncertainty. Precision and reliability are functions of that ratio.

But when reliable and meaningful parameter separation is obtained, the laboratory model represents something real in the world. Moreover, the model stands for the object of interest in a way that persists no matter which particular people or challenges are involved.
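
That persistence can be checked numerically. A hedged simulation sketch, assuming numpy; the PROX-style shortcut below is a deliberately crude stand-in for proper estimation, invented here for brevity:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate dichotomous Rasch responses: 1000 persons, 10 items.
n_persons, n_items = 1000, 10
beta = rng.normal(0.0, 1.5, n_persons)        # person abilities (logits)
delta = np.linspace(-2.0, 2.0, n_items)       # item difficulties (logits)
p = 1.0 / (1.0 + np.exp(-(beta[:, None] - delta[None, :])))
x = (rng.random((n_persons, n_items)) < p).astype(int)

def rough_difficulties(responses):
    """Crude logit estimates from proportions correct, centered so the
    two subsamples' scales can be compared directly."""
    prop = responses.mean(axis=0).clip(0.01, 0.99)
    est = np.log((1 - prop) / prop)
    return est - est.mean()

low, high = x[beta < 0], x[beta >= 0]         # two disjoint subsamples
d_low, d_high = rough_difficulties(low), rough_difficulties(high)
print(np.corrcoef(d_low, d_high)[0, 1])       # typically > 0.99
```

The item difficulty estimates from the less and more able halves of the sample agree almost perfectly, which is the sense in which the model "persists no matter which particular people" are involved.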

This is where the practical value kicks in. This is what makes it possible to export the laboratory model into the real world. The specific microcosm of relationships isolated and found reproducible in research is useful and meaningful to the extent that those relationships take at least roughly the same form in the world. Instead of managing the specific indicators counted up in concrete observations, it becomes possible to manage the generic object of interest, abstractly conceived. Adaptively selecting indicators according to their relevance on the ground in real world applications then unifies measurement, management, and accountability in a single sphere of action.

Plainly there’s a lot more that needs to be said about this.

My responses to post-IOMW survey questions

May 7, 2018

My definition of objective measurement:

Reproducible invariant intervals embodied in instruments calibrated to shared unit standards explained by substantively meaningful theory. The word ‘objective’ is both redundant, like saying ‘wet rain,’ and unnecessarily exclusive of the shared subjectivity embodied in measuring instruments along with objectivity.

Distinguishing features of IOMW:

Clear focus on technical issues of measurement specifically defined in terms of models in the form of natural laws, interval units with known uncertainties, data quality assessments, explanatory theory, substantive interpretation, and metrological traceability of instruments distributed to end users throughout a sociocognitive ecosystem.

Future keynote suggestions:

Luca Mari on measurement philosophy

Leslie Pendrill on metrology

Robert Massof on LOVRNet

Stefan Cano on health metrology consensus standards

Jan Morrison on STEM Learning Ecosystems

Angelica Lips Da Cruz on impact investing

Alan Schwartz on how measurement is revolutionizing philanthropy

Future training session topic suggestions:

Traceability control systems

Electronic distributed ledger systems for tracking learning, health, etc. over time and across ecosystem niches

How to create information infrastructures capable of coherently integrating discontinuous levels of complexity, CSCW

How to access and put the wealth of available strictly longitudinal repeated measures of student learning growth to work (see Williamson’s 2016 Berkeley IMEKO paper)

How to integrate universally uniform measures of learning, health, etc. in economic models, accounting spreadsheets, TQM/CQI quality improvement methods, outcome product pricing models, and investment finance.

How to approach measurement in terms of complex adaptive self organizing stochastic systems

Other comments:

I want to see a clear justification for any references to IRT. The vast majority of references to IRT at the NY meeting were actually references to measurement theory. If IRT is what is said, IRT ought to be what is meant. None of the major measurement theorists includes IRT, and they specifically disavow it for offering unidentifiable models, model choice based on p-values instead of principles and meaning, difficult if not impossible estimation problems, no proofs of conjoint additivity or of scores as sufficient statistics, and inconsistent assertions of both crossing ICCs and unidimensionality. IRT is not Measurement Theory. Why is it so widely featured at a measurement conference?

On social impact bonds and critical reflections

May 5, 2018

A new article (Roy, McHugh, & Sinclair, 2018) out this week in the Stanford Social Innovation Review echoes Gleeson-White (2015) in pointing out a disconnect between financial bottom lines and the social missions of companies whose primary objectives concern broader social and environmental impacts. The article also notes the expense of measurement, increased administrative burdens, high transaction costs, technical issues in achieving fair measures, the trend toward the negative implications of managing what is measured instead of advancing the mission, and the potential impacts of external policy environments and political climates.

The authors contend that social impact bonds are popular and proliferating for ideological reasons, not because of any evidence concerning their effectiveness in making the realization of social objectives profitable. Some of the several comments posted online in response to the article take issue with that claim, and point toward evidence of effectiveness. But the general point still stands: more must be done to systematically align investors’ financial interests with the citizens’ interest in advancing their financial, social, and environmental quality of life, and not just with the social service providers’ interest in funding and advancing their mission.

Roy et al. are correct to say that to do otherwise is to turn the people served into commodities. This happens because governance of, accountability for, and reporting of social impacts are shifted away from elected officials to the needs of private funders, with far less in the way of satisfactory recourse for citizens when programs go awry. The problem lies in the failure to create any capacity for individuals themselves to represent, invest in, manage, and profit from their skills, health, trust, and environmental service outcomes. Putting all the relevant information into the hands of service providers and investors, and making that information as low quality as it is, can only ever result in one-sided effects on people themselves. With no idea of the technologies, models, decades of results, and ready examples to draw from in the published research, the authors conclude with a recommendation to leave well enough alone and to pursue more traditional avenues of policy formation, instead of allowing the “cultural supremacy of market principles” to continue advancing into every area of life.

But as is so commonly the case when it comes to technical issues of quantification, the authors’ conclusions and criticisms skip over the essential role that high quality measurement plays in reducing transaction costs and supporting property rights. In general, measurement standards inform easily communicated and transferable information about the quantity and quality of products in markets, thereby lowering transaction costs and enabling rights to the ownership of specific amounts of things. The question that goes unasked in this article, and in virtually every other article in the area of ESG, social impact investing, etc., is this: What kind of measurement technologies and systems would we need to be able to replicate existing market efficiencies in new markets for human, social, and natural capital?

That question and other related ones are, of course, the theme of this blog and of many of my publications. Further exploration here and in the references to other posts (such as Fisher, 2011, 2012a, 2012b) may prove fruitful to others seriously interested in finding a way out of the unexamined assumptions stifling creativity in this area.

In short, instead of turning people into commodities, why should we not turn skills, health, trust, and environmental services into commodities? Why should not every person have legal title to scientifically and uniformly measured numbers of shares of each essential form of human, social, and natural capital? Why should individuals not be able to profit in both monetary and personal terms from their investments in education, health care, community, and the environment? Why should we allow corporations to continue externalizing the costs of social and environmental investments, at the expense of individual citizens and communities? Why is there so much disparity and inequality in the opportunities for skill development and healthy lives available across social sectors?

Might not our inability to obtain good information about processes and outcomes in the domains of educational, health care, social service, and environmental management have a lot to do with it? Why don’t we have the information infrastructure we need, when the technology for creating it has been in development for over 90 years? Why are there so many academics, researchers, philanthropic organizations, and government agencies that are content with the status quo when these longstanding technologies are available, and people, communities, and the environment are suffering from the lack of the information they ought to have?

During the French Revolution, one of the primary motivations for devising the metric system was to extend the concept of universal rights to individual commercial exchanges. The confusing proliferation of measures in Europe at the time made it possible for merchants and the nobility to sell in one unit and buy in another. Universal rights plainly implied universal measures. Alder (2002, p. 2) explains:

“To do their job, standards must operate as a set of shared assumptions, the unexamined background against which we strike agreements and make distinctions. So it is not surprising that we take measurement for granted and consider it banal. Yet the use a society makes of its measures expresses its sense of fair dealing. That is why the balance scale is a widespread symbol of justice. … Our methods of measurement define who we are and what we value.”

Getting back to the article by Roy, McHugh, and Sinclair, yes, it is true that the measures in use in today’s social impact bonds are woefully inadequate. Far from living up to the kind of justice symbolized by the balance scale, today’s social impact measures define who we are in terms of units of measurement that differ and change in unknown ways across individuals, over time, and across instruments. This is the reason for many, if not all, of the problems Roy et al. find with social impact bonds: their measures are not up to the task.

But instead of taking that as an unchangeable given, shouldn't we do more to ask what kinds of measures could do the job that needs to be done? Shouldn't we look around and see if in fact there might be available technologies able to advance the cause?

Theory and evidence have, in fact, been brought to bear in formulating approaches to instrument calibration that reproduce the balance scale's fair and just comparisons of weight from data like those produced by tests and surveys (Choi, 1998; Massof, 2011; Rasch, 1960, pp. 110-115). The same thing has been done in reproducing measures of length (Stephanou & Fisher, 2013), distance (Moulton, 1993), and density (Pelton & Bunderson, 2003).
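
The logic of those demonstrations can be conveyed in a few lines. Here is a small simulation of my own construction (not Choi's or Rasch's procedure, and deliberately oversimplified): five objects are compared pairwise on a noisy balance, with the probability of one tipping the scale over another governed by the difference in their amounts, and the amounts are then recovered from the observed log-odds.

```python
import numpy as np

rng = np.random.default_rng(42)

# Five objects with "true" amounts on a common log-interval scale.
true_w = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
n_obj, trials = len(true_w), 2000

# P(i tips the balance over j) follows from the difference
# true_w[i] - true_w[j], the same structure Rasch (1960) used for tests.
wins = np.zeros((n_obj, n_obj))
for i in range(n_obj):
    for j in range(n_obj):
        if i != j:
            p = 1 / (1 + np.exp(-(true_w[i] - true_w[j])))
            wins[i, j] = rng.binomial(trials, p)

# The empirical log-odds estimate each pairwise difference; averaging
# each row places all objects on one common scale.
prop = (wins / trials).clip(0.001, 0.999)
logodds = np.log(prop / (1 - prop))
np.fill_diagonal(logodds, 0.0)
est = logodds.mean(axis=1)
print(np.round(est - est.mean(), 2))   # close to [-2, -1, 0, 1, 2]
```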

These are not isolated and special results. The methods involved have been in use for decades and in dozens of fields (Wright, 1968, 1977, 1999; Wright & Masters, 1982; Wright & Stone, 1979, 1999; Andrich, 1978, 1988, 1989, 2010; Bond & Fox, 2015; Engelhard, 2012; Wilson, 2005; Wilson & Fisher, 2017). Metric system engineers and physicists are in accord with psychometricians as to the validity of these claims (Pendrill & Fisher, 2015) and are on the record with positive statements of support:

“Rasch models belong to the same class that metrologists consider paradigmatic of measurement” (Mari and Wilson, 2014, p. 326).

“The Rasch approach…is not simply a mathematical or statistical approach, but instead [is] a specifically metrological approach to human-based measurement” (Pendrill, 2014, p. 26).

These statements represent the attitude toward measurement possibilities being applied by at least one effort in the area of social impact investing (https://www.aldcpartnership.com/#/cases/financing-the-future). Hopefully, there will be many more projects of this kind emerging in the near future.

The challenges are huge, of course. This is especially the case when considering the discontinuous levels of complexity that have to be negotiated in making information flow across locally situated individual niches, group-level organizations and communities, and global accountability applications (Fisher, 2017; Fisher, Oon, & Benson, 2018; Fisher & Stenner, 2018). But taking on these challenges makes far more sense than remaining complicitly settled in a comfortable rut, throwing up our hands at how unfair life is.

There’s a basic question that needs to be asked. If what is presented as measurement raises transaction costs and does not support ownership rights to what is measured, is it really measurement? How can the measurement of kilowatts, liters, and grams lower transaction costs and support property rights at the same time that other so-called measurements raise transaction costs and do not support property rights? Does not this inconsistency suggest something might be amiss in the way measurement is conceived in some areas?

For more info, check out these other posts here:

https://livingcapitalmetrics.wordpress.com/2015/05/01/living-capital-metrics-for-financial-and-sustainability-accounting-standards/

https://livingcapitalmetrics.wordpress.com/2014/11/08/another-take-on-the-emerging-paradigm-shift/

https://wordpress.com/post/livingcapitalmetrics.wordpress.com/1812

https://wordpress.com/post/livingcapitalmetrics.wordpress.com/497

References

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.

Andrich, D. (1988). Rasch models for measurement (Sage University Paper Series on Quantitative Applications in the Social Sciences, no. 07-068). Beverly Hills, California: Sage Publications.

Andrich, D. (1989). Constructing fundamental measurements in social psychology. In J. A. Keats, R. Taft, R. A. Heath & S. H. Lovibond (Eds.), Mathematical and theoretical systems. Proceedings of the 24th International Congress of Psychology of the International Union of Psychological Science, Vol. 4 (pp. 17-26). Amsterdam, Netherlands: North-Holland.

Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika, 75(2), 292-308.

Bond, T., & Fox, C. (2015). Applying the Rasch model: Fundamental measurement in the human sciences, 3d edition. New York: Routledge.

Choi, E. (1998). Rasch invents “ounces.” Popular Measurement, 1(1), 29. Retrieved from https://www.rasch.org/pm/pm1-29.pdf

Engelhard, G., Jr. (2012). Invariant measurement: Using Rasch models in the social, behavioral, and health sciences. New York: Routledge Academic.

Fisher, W. P., Jr. (2011). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 12(1), 49-66.

Fisher, W. P., Jr. (2012a). Measure and manage: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (pp. 43-63). New York: Palgrave Macmillan.

Fisher, W. P., Jr. (2012b, May/June). What the world needs now: A bold plan for new standards [Third place, 2011 NIST/SES World Standards Day paper competition]. Standards Engineering, 64(3), 1 & 3-5 [http://ssrn.com/abstract=2083975].

Fisher, W. P., Jr. (2017). A practical approach to modeling complex adaptive flows in psychology and social science. Procedia Computer Science, 114, 165-174. Retrieved from https://doi.org/10.1016/j.procs.2017.09.027

Fisher, W. P., Jr., Oon, E. P.-T., & Benson, S. (2018). Applying Design Thinking to systemic problems in educational assessment information management. Journal of Physics Conference Series, in press [http://media.imeko-tc7-rio.org.br/media/uploads/s/wfisher@berkeley.edu_1497049869_781396.pdf].

Fisher, W. P., Jr., & Stenner, A. J. (2018). Ecologizing vs modernizing in measurement and metrology. Journal of Physics Conference Series, in press [http://media.imeko-tc7-rio.org.br/media/uploads/s/wfisher@berkeley.edu_1496875919_204672.pdf].

Gleeson-White, J. (2015). Six capitals, or can accountants save the planet? Rethinking capitalism for the 21st century. New York: Norton.

Mari, L., & Wilson, M. (2014, May). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Massof, R. W. (2011). Understanding Rasch and Item Response Theory models: Applications to the estimation and validation of interval latent trait measures from responses to rating scale questionnaires. Ophthalmic Epidemiology, 18(1), 1-19.

Moulton, M. (1993). Probabilistic mapping. Rasch Measurement Transactions, 7(1), 268 [http://www.rasch.org/rmt/rmt71b.htm].

Pelton, T., & Bunderson, V. (2003). The recovery of the density scale using a stochastic quasi-realization of additive conjoint measurement. Journal of Applied Measurement, 4(3), 269-281.

Pendrill, L. (2014, December). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Roy, M. J., McHugh, N., & Sinclair, S. (2018, May 1). A critical reflection on social impact bonds. Stanford Social Innovation Review. Retrieved 5 May 2018, from https://ssir.org/articles/entry/a_critical_reflection_on_social_impact_bonds?utm_source=Enews&utm_medium=Email&utm_campaign=SSIR_Now&utm_content=Title.

Stephanou, A., & Fisher, W. P., Jr. (2013). From concrete to abstract in the measurement of length. Journal of Physics Conference Series, 459, http://iopscience.iop.org/1742-6596/459/1/012026.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilson, M., & Fisher, W. (2017). Psychological and social measurement: The career and contributions of Benjamin D. Wright. New York: Springer.

Wright, B. D. (1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 invitational conference on testing problems (pp. 85-101 [http://www.rasch.org/memo1.htm]). Princeton, New Jersey: Educational Testing Service.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/measess/me-all.pdf].

Measurement quality checklist

April 21, 2018

We all make and use dozens of measurements every day, reading everything from clocks and speedometers to rulers, weight scales, and thermometers. We also deal with numbers often called measures but not expressed in a meaningful unit of comparison: test scores, ratings, survey percentages, counts of how many times something happened, and the like. These numbers do not stand for a constant amount that adds up the way hours, distances, weights, and temperatures do. Such lower quality ordinal numbers from tests and ratings are in wide use and are commonly called measurements, but they are also generally understood not to attain the rigor and precision associated with physical measurements.
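A bit of arithmetic makes the point concrete. A standard way to place success rates on an additive scale, used throughout the references below, is the log-odds (logit) transformation:

\[
\operatorname{logit}(p) = \ln\frac{p}{1-p}.
\]

A gain from 50% to 60% correct amounts to \(\operatorname{logit}(0.60) - \operatorname{logit}(0.50) \approx 0.41\) logits, while the same ten-point gain from 85% to 95% amounts to \(\operatorname{logit}(0.95) - \operatorname{logit}(0.85) \approx 1.21\) logits, roughly three times as much. Equal differences in raw percentages plainly do not represent equal amounts of the variable measured.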

The failure to produce meaningful measurements of abilities, attitudes, knowledge, and behaviors, however, does not mean the things we measure with tests, surveys, and rating scales cannot be scientifically quantified. On the contrary, meaningful, useful, reliable, and validated high quality measurement has been available and in use on mass scales for decades. Perhaps the public remains unaware of these developments because of the technical mathematics and laborious analytic details involved. There are also a great many unexamined cultural presuppositions that assume human attributes cannot be measured meaningfully, and should not be measured even if they can be.

The problem here, as is so often the case when uninformed opinions hold sway, is that the predominance of meaningless numbers masquerading as measures leaves us much worse off than we would be if we invested the time and resources needed to create the Intangible Assets Metric System I have referred to elsewhere in this blog.

Many of the other posts here contrast meaningless and meaningful approaches to measurement, so I won’t repeat any of that here. What I’ll do instead is provide something many friends and colleagues have suggested over the years: a simple checklist of basic features that ought to be readily available in any measurement system worthy of the name. To find out more about any of these features, search the terms as they are listed; a brief computational sketch illustrating several of them follows the list.

  • An interval unit
  • Individual measures in that unit
  • Individual item locations in that unit
  • Rating scale transition thresholds in that unit
  • An uncertainty or error term for each individual measure
  • Data quality, internal consistency, and model fit statistics for each individual measure
  • Experimental evidence supporting the claim to an interval unit
  • A mathematical model of the interval unit
  • References to mathematical proofs that (a) the observed data are necessary and sufficient to the estimation of the model parameters, and (b) the model satisfies the requirements of conjoint additivity
  • Cronbach’s alpha, a KR-20, a separation index, or a separation reliability coefficient expressing the ratio of explained variance to uncertainty/error variance, for both persons and items
  • A map of the construct measured illustrating how the items are supposed to work together
  • Interpretive guidelines showing what measures mean as functions of item scale locations
  • A Wright map illustrating the conjoint relation of the measure and item distributions
  • A kidmap or other map of individual ordered responses useful for informing instruction or treatment
  • A theory explaining variation in item scale locations
  • Evidence of traceability to a unit standard, if available
  • Evidence that items are not biased for or against any identifiable groups
  • Information on the calibration sample and results (responses per item, etc.)
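To make the first several of these features concrete, here is a minimal, illustrative calibration sketch in Python: a toy joint maximum likelihood routine run on simulated dichotomous Rasch data. Every number and variable name here is invented for the example, and production tools such as those catalogued at http://www.rasch.org add bias corrections, fit statistics, rating scale thresholds, and much more.

import numpy as np

rng = np.random.default_rng(42)

# Simulate responses from the dichotomous Rasch model (units are logits).
true_theta = rng.normal(0, 1, 500)            # person abilities
true_delta = np.linspace(-2, 2, 20)           # item difficulties
p_true = 1 / (1 + np.exp(-(true_theta[:, None] - true_delta[None, :])))
data = (rng.random(p_true.shape) < p_true).astype(int)

# Persons with zero or perfect scores have no finite estimate; drop them here.
keep = (data.sum(1) > 0) & (data.sum(1) < data.shape[1])
data = data[keep]

theta = np.zeros(data.shape[0])               # person measures (logits)
delta = np.zeros(data.shape[1])               # item locations (logits)

# Bare-bones joint maximum likelihood: alternating Newton-Raphson updates.
for _ in range(100):
    p = 1 / (1 + np.exp(-(theta[:, None] - delta[None, :])))
    theta += (data - p).sum(1) / (p * (1 - p)).sum(1)     # person step
    p = 1 / (1 + np.exp(-(theta[:, None] - delta[None, :])))
    delta -= (data - p).sum(0) / (p * (1 - p)).sum(0)     # item step
    delta -= delta.mean()                                 # fix the scale origin

# Checklist: an uncertainty (standard error) for every measure and location.
p = 1 / (1 + np.exp(-(theta[:, None] - delta[None, :])))
info = p * (1 - p)
theta_se = 1 / np.sqrt(info.sum(1))
delta_se = 1 / np.sqrt(info.sum(0))

# Checklist: a separation reliability, the share of observed variance not due to error.
reliability = (theta.var() - (theta_se ** 2).mean()) / theta.var()
print(f"person separation reliability: {reliability:.2f}")

In this sketch the logit is the interval unit; theta and delta are the individual measures and item locations in that unit; theta_se and delta_se are the per-measure uncertainty terms; and the final ratio is a separation reliability coefficient of the kind named in the checklist.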

For a commercially successful scientific reading ability measurement framework, see http://www.lexile.com and Fisher and Stenner (2016). For articles co-authored by metrology engineers and psychometricians, see Mari and Wilson (2014) and Pendrill and Fisher (2015). To see thousands of articles and books on related measurement topics dating back 50 years, search “Rasch Measurement” in Google Scholar. For consulting and advice on measurement, leave a comment below, see http://www.livingcapitalmetrics.com, or explore the wealth of resources at http://www.rasch.org. For overviews of the role of measurement quality in education and health care, see Cano and Hobart (2011), Hobart et al. (2007), Massof (2008), Massof and Rubin (2001), Wilson (2013), and Wright (1984, 1999).

References

Cano, S. J., & Hobart, J. C. (2011). The problem with health measurement. Patient Preference and Adherence, 5, 279-290.

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.

Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007, December). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.

Mari, L., & Wilson, M. (2014, May). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Massof, R. W. (2008, July-August). Editorial: Moving toward scientific measurements of quality of life. Ophthalmic Epidemiology, 15, 209-211.

Massof, R. W., & Rubin, G. S. (2001, May-Jun). Visual function assessment questionnaires. Survey of Ophthalmology, 45(6), 531-48.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Wilson, M. R. (2013, April). Seeking a balance between the statistical and scientific elements in psychometrics. Psychometrika, 78(2), 211-236.

Wright, B. D. (1984). Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288 [http://www.rasch.org/memo41.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Reductionist vs Nonreductionist Conceptualizations of Psychological and Social Measurement

April 19, 2018

Reductionist

  • Root metaphor: Mechanical clockwork universe
  • Paradigmatic case: Newtonian physics
  • Complete, consistent, deterministic structures
  • Whole is sum of parts
  • Sufficient statistics: Mean & std deviation
  • Uncertainty is variation across repeated measures
  • Test/survey items in use define totality of universe of possibility; changes in items change that universe
  • Descriptive, reactionary
  • Microlevel facts are supposed to additively combine into general laws
  • General laws are discovered by measuring
  • Top down data analytics influence policy
  • Externally imposed assembly processes
  • Subject/object dualism institutionalized in data analytics process
  • Data are hallmark criterion of objectivity
  • Subjectivity discounted, removed if possible
  • Counts are quantities
  • Ordinal scores treated as interval measures with no justification
  • Score variation relates solely to person characteristics
  • Score meaning tied to particular questions asked
  • Quantitative methods don’t define unit quantities or test for them
  • Qualitative data and methods are separated from quantitative data/methods
  • No model of construct stated or tested
  • Group level multivariate focus
  • P-values are primary model fit criterion
  • Population sampling motivates probabilistic approach
  • Equating based on statistical assumptions concerning score distribution

Nonreductionist

  • Root metaphor: Living organic universe
  • Paradigmatic case: Multilevel ecosystems
  • Incomplete, not perfectly consistent, stochastic structures
  • Whole is greater than sum of parts
  • Sufficient statistics: Scores (see the algebraic note after this list)
  • Uncertainty is resonance of stochastic invariance within individual measures
  • Test/survey items in use sample from infinite population; changes in items used do not change that universe
  • Prescriptive, anticipatory
  • Microlevel facts self-organize into meso abstractions & macro formalisms
  • Measuring presumes general laws
  • Bottom up alignments and coordinations of decisions and behaviors move society
  • Internal processes of self-organization
  • Mutually implied subject-object entangled together in playful flow institutionalized via distributed instrumentation
  • Objectivity requires data explained by theory embodied in instruments
  • Subjectivity included as valid source of concerns and insights scrutinized for value
  • Counts might lead to quantity definition
  • Interval measures theoretically and empirically substantiated
  • Empirical & theoretical measure variation maps construct via items and persons
  • Measure meaning is independent of particular questions asked
  • Quantitative methods define unit quantities and test for them
  • Qualitative methods are integrated with quantitative methods
  • Mathematical, observation, and construct models stated and tested
  • Individual level univariate focus
  • Meaningful construct definition primary model fit criterion
  • Individual response process motivates probabilistic approach
  • Equating requires alignment of items along common dimension
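One of these contrasts, the one concerning sufficient statistics, rewards a line of algebra (as promised in the list above). In the dichotomous Rasch model, the likelihood of person n’s whole response string factors so that the ability \(\theta_n\) meets the data only through the raw score \(r_n\):

\[
\prod_i P(X_{ni} = x_{ni}) \;=\; \frac{\exp\!\big(r_n \theta_n - \sum_i x_{ni}\,\delta_i\big)}{\prod_i \big(1 + e^{\,\theta_n - \delta_i}\big)}, \qquad r_n = \sum_i x_{ni}.
\]

No comparable factorization holds when items are allowed to vary in discrimination, which is why raw scores function as sufficient statistics only under Rasch model conditions.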