Posts Tagged ‘measurement’

Psychology and the social sciences: An atheoretical, scattered, and disconnected body of research

February 16, 2019

A new article in Nature Human Behaviour (NHB) points toward the need for better theory and more rigorous mathematical models in psychology and the social sciences (Muthukrishna & Henrich, 2019). The authors rightly say that the lack of an overarching cumulative theoretical framework makes it very difficult to see whether new results fit well with previous work, or if something surprising has come to light. Mathematical models are especially emphasized as being of value in specifying clear and precise expectations.

The point that the social sciences and psychology need better theories and models is painfully obvious. But there are in fact thousands of published studies and practical real-world applications that not only provide, but often surpass, the kinds of predictive theories and mathematical models called for in the NHB article. Yet the article makes no mention of any of this work, and its argument is framed entirely in a statistical context rather than the more appropriate context of measurement science.

The concept of reliability provides an excellent point of entry. Most behavioral scientists think of reliability statistically, as a coefficient with a positive numeric value usually between 0.00 and 1.00. The tangible sense of reliability as indicating exactly how predictable an outcome is does not usually figure in most researchers’ thinking. But that sense of the specific predictability of results has been the focus of attention in social and psychological measurement science for decades.

For instance, the measurement of time is reliable in the sense that the position of the sun relative to the earth can be precisely predicted from geographic location, the time of day, and the day of the year. The numbers and words assigned to noon are closely associated with the sun being at its high point in the sky (though political conventions such as time zones and daylight saving time introduce variations by season and location).

That kind of a reproducible association is rarely sought in psychology and the social sciences, but it is far from nonexistent. One can discern different degrees to which that kind of association is included in models of measured constructs. Though most behavioral research doesn’t mention the connection between linear amounts of a measured phenomenon and a reproducible numeric representation of it (level 0), quite a significant body of work focuses on that connection (level 1). The disappointing thing about that level 1 work is that the relentless obsession with statistical methods prevents most researchers from connecting a reproducible quantity with a single expression of it in a standard unit, and with an associated uncertainty term (level 2). That is, level 1 researchers conceive measurement in statistical terms, as a product of data analysis. Even when results across data sets are highly correlated and could be equated to a common metric, level 1 researchers do not leverage that source of potential value for simplified communication and accumulated comparability.
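As a concrete illustration of the equating opportunity just mentioned, here is a minimal sketch in Python, with hypothetical item labels and calibration values: when two instruments share even a few anchor items, the mean difference in their calibrations of those items is enough to re-express one instrument’s results in the other’s frame of reference.

    import statistics

    # Hypothetical Rasch item calibrations (in logits) from two data sets
    form_a = {"q1": -1.2, "q2": -0.4, "q3": 0.3, "q4": 1.1}
    form_b = {"q3": 0.7, "q4": 1.5, "q5": -0.1, "q6": 2.0}

    # The mean difference on the common (anchor) items gives the shift needed
    anchors = sorted(set(form_a) & set(form_b))
    shift = statistics.mean(form_a[q] - form_b[q] for q in anchors)

    # Re-express Form B in Form A's frame of reference
    form_b_equated = {q: round(d + shift, 2) for q, d in form_b.items()}
    print(f"shift = {shift:+.2f} logits")  # -0.40 with these values
    print(form_b_equated)                  # q3 and q4 now match Form A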

And then, for their part, level 2 researchers usually do not articulate theories of the measured constructs that would augment the mathematical data model with an explanatory model predicting variation (level 3). Level 2 researchers are empirically grounded in data, and can expand their network of measures only by gathering more data and analyzing it in ways that bring it into their standard unit’s frame of reference.

Level 3 researchers, however, have come to see what makes their measures tick. They understand the mechanisms that make their questions vary. They can write new questions to their theoretical specifications, test those questions by asking them of a relevant sample, and produce the predicted calibrations. For instance, reading comprehension is well established to be a function of the difference between a person’s reading ability and the complexity of the text they encounter (see articles by Stenner in the list below). We have built our entire educational system around this idea, as we deliberately introduce children first to the alphabet, then to the most common words, then to short sentences, and then to ever longer and more complicated text. But stating the construct model, testing it against data, calibrating a unit to which all tests and measures can be traced, and connecting together all the books, articles, tests, curricula, and students is a process that began (in English and Spanish) only in the 1980s. The process still is far from finished, and most reading research still does not use the common metric.
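A minimal sketch of that construct model, using the Rasch model’s logistic form with hypothetical values, shows how comprehension rates follow from the difference between reader ability and text complexity on a common logit scale:

    import math

    def p_comprehension(reader_ability: float, text_complexity: float) -> float:
        """Success probability as a logistic function of ability minus complexity."""
        return 1.0 / (1.0 + math.exp(-(reader_ability - text_complexity)))

    # A reader 1.1 logits above a text reads it comfortably; the reverse is frustrating
    print(f"{p_comprehension(2.3, 1.2):.2f}")  # ~0.75
    print(f"{p_comprehension(1.2, 2.3):.2f}")  # ~0.25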

In this kind of theory-informed context, new items can be automatically generated on the fly at the point of measurement. Those items and inferences made from them are validated by the consistency of the responses and the associated expression of the expected probability of success, agreement, etc. The expense of constant data gathering and analysis can be cut to a very small fraction of what it is at levels 0-2.

Level 3 research methods are not widely known or used, but they are not new. They are gaining traction as their use by national metrology institutes globally grows. As high profile critiques of social and psychological research practices continue to emerge, perhaps more attention will be paid to this important body of work. A few key references are provided below, and virtually every post in this blog pertains to these issues.

References

Baghaei, P. (2008). The Rasch model as a construct validation tool. Rasch Measurement Transactions, 22(1), 1145-1146 [http://www.rasch.org/rmt/rmt221a.htm].

Bergstrom, B. A., & Lunz, M. E. (1994). The equivalence of Rasch item calibrations and ability estimates across modes of administration. In M. Wilson (Ed.), Objective measurement: Theory into practice, Vol. 2 (pp. 122-128). Norwood, New Jersey: Ablex.

Cano, S., Pendrill, L., Barbic, S., & Fisher, W. P., Jr. (2018). Patient-centred outcome metrology for healthcare decision-making. Journal of Physics: Conference Series, 1044, 012057.

Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Measurement & Evaluation in Counseling & Development, 43(2), 121-149.

Embretson, S. E. (2010). Measuring psychological constructs: Advances in model-based approaches. Washington, DC: American Psychological Association.

Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.

Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48(1), 3-26.

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827-833.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-164.

Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.

Irvine, S. H., Dunn, P. L., & Anderson, J. D. (1990). Towards a theory of algorithm-determined cognitive test construction. British Journal of Psychology, 81, 173-195.

Kline, T. L., Schmidt, K. M., & Bowles, R. P. (2006). Using LinLog and FACETS to model item components in the LLTM. Journal of Applied Measurement, 7(1), 74-91.

Lunz, M. E., & Linacre, J. M. (2010). Reliability of performance examinations: Revisited. In M. Garner, G. Engelhard, Jr., W. P. Fisher, Jr. & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 1 (pp. 328-341). Maple Grove, MN: JAM Press.

Mari, L., & Wilson, M. (2014). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Markward, N. J., & Fisher, W. P., Jr. (2004). Calibrating the genome. Journal of Applied Measurement, 5(2), 129-141.

Maul, A., Mari, L., Torres Irribarra, D., & Wilson, M. (2018). The quality of measurement results in terms of the structural features of the measurement process. Measurement, 116, 611-620.

Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour, 1-9.

Obiekwe, J. C. (1999, August 1). Application and validation of the linear logistic test model for item difficulty prediction in the context of mathematics problems. Dissertation Abstracts International: Section B: The Sciences & Engineering, 60(2-B), 0851.

Pendrill, L. (2014). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Pendrill, L., & Petersson, N. (2016). Metrology of human-based and other qualitative measurements. Measurement Science and Technology, 27(9), 094003.

Sijtsma, K. (2009). Correcting fallacies in validity, reliability, and classification. International Journal of Testing, 8(3), 167-194.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107-120.

Stenner, A. J. (2001). The necessity of construct theory. Rasch Measurement Transactions, 15(1), 804-805 [http://www.rasch.org/rmt/rmt151q.htm].

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].

Stenner, A. J., Stone, M. H., & Fisher, W. P., Jr. (2018). The unreasonable effectiveness of theory based instrument calibration in the natural sciences: What can the social sciences learn? Journal of Physics: Conference Series, 1044, 012070.

Stone, M. H. (2003). Substantive scale construction. Journal of Applied Measurement, 4(3), 282-297.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilson, M. R. (2013). Using the concept of a measurement system to characterize measurement models used in psychometrics. Measurement, 46, 3766-3774.

Wright, B. D., & Stone, M. H. (1979). Chapter 5: Constructing a variable. In Best test design: Rasch measurement (pp. 83-128). Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/measess/me-all.pdf].

Wright, B. D., Stone, M., & Enos, M. (2000). The evolution of meaning in practice. Rasch Measurement Transactions, 14(1), 736 [http://www.rasch.org/rmt/rmt141g.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Making sustainability impacts universally identifiable, individually owned, efficiently exchanged, and profitable

February 2, 2019

Sustainability impacts plainly lack common product definitions, objective measures, efficient markets, and associated capacities for competing on improved quality. The absence of these landmarks in the domain of sustainability interests results far more from inattention and cultural biases than from the inherent nature of sustainability itself. Given the economic importance of these kinds of capacities and the urgent need for new innovations supporting sustainable development, it is curious how even those most stridently advocating new ways of thinking seem to systematically ignore well-established opportunities for advancing their cause. The wealth of historical examples of rapidly emerging, transformative, disruptive, and highly profitable innovations would seem to motivate massive interest in how to extend those successes in new directions.

Economists have long noted how common currencies reduce transaction costs, support property rights, and promote market efficiencies (for references and more information, see previous entries in this blog over the last ten years and more). Language itself functions as an economical, labor-saving device: useful concepts, once represented in words, need not be re-invented by everyone for themselves, but can simply be copied. In the same ways that common languages ease communication and common currencies facilitate trade, so, too, do standards for common product definitions contribute to the creation of markets.

Metrologically traceable measurements make it possible for everyone everywhere to know how much of something in particular there is. This is important, first of all, because things have to be identifiable in shared ways if we are to include them in our social lives. Anyone interested in obtaining or producing that kind of thing has to be able to know it and share information about it as something in particular. Common languages capable of communicating specifically what a thing is, and how much of it there is, support claims to ownership and to the fruits of investments in entrepreneurial innovations.

Technologies for the precision measurements key to these communications are among the primary products of science. Instruments measuring in SI units embody common currencies for the exchange of scientific capital. The calibration and distribution of such instruments in the domain of sustainability impact investing and innovation ought to be a top-level priority. How else will sustainability impacts be made universally identifiable, individually owned, efficiently exchanged, and profitable?

The electronics, computer, and telecommunications industries provide ample evidence of precision measurement’s role in reducing transaction costs, establishing common product definitions, and reaping huge profits. The music industry combines the science and economics of precision measurement with artistic creativity: wholly unique improvisations are constructed from instruments tuned to standardized scales.

Much stands to be learned, and even more to be gained, in focusing sustainability development on ways in which we can harness the economic power of the profit motive by combining collective efforts with individual imaginations in the domains of human, social, and natural capital. Aligning financial, monetary wealth with the authentic wealth and genuine productivity of gains in human, community, and environmental value ought to be the defining mission of this generation. The time to act is now.

Why economic growth can and inevitably will be green

October 1, 2018

So, approaching matters once again from yet another point of view, we have Jason Hickel explaining a couple of weeks ago “Why Growth Can’t Be Green.” This article provides yet another example of how the problem is the problem. That is, the way we define problems sets up particular kinds of solutions in advance, and sometimes, as Einstein famously pointed out, problems cannot be solved from within the same conceptual framework that gave rise to them. I’ve expanded on this theme in a number of previous posts, for instance, here.

Hickel takes up the apparent impossibility of aligning economic growth with environmental values. He speaks directly to what he calls the rebound effect, the way that “improvements in resource efficiency drive down prices and cause demand to rise—thus canceling out some of the gains.” But that rebound can happen only as long as the economy remains defined and limited by the alignment of manufactured capital and finance, ignoring the largely unexamined possibility that human, social, and natural capital could be measured well enough to be aligned with finance as well.

Hence, as I say, the problem is the problem. Broadening one’s conceptualization of the problem opens up new opportunities that otherwise never come into view.

The Hickel article’s entire focus is then on top-down policy impositions like taxes or a Genuine Progress Index. These presume human, social, and natural capital can only ever exist in dead formations that have to be micromanaged and concretely manipulated, and that efficient markets bringing them to life are inherently and literally unthinkable. (See a short article here for an explanation of the difference between dead and living capital. There’s a lot more where that came from, as is apparent in the previous posts here in this blog.)

The situation could be vastly different from what Hickel imagines. If we could own, buy, and sell human, social, and environmental value as products in efficient markets, we could reward its production. In that scenario, when improvements in environmental resource efficiency are obtained, demand for the new environmental value created will rise, and it is the price of that value, not the price of the resource, that will fall.

We ought to be creative enough to figure out how to configure markets so that prices for environmental resources (oil, farmland, metals, etc.) can stay constant or fall without increasing demand for them, as could happen if that demand is counterbalanced and absorbed by rising human, social, and environmental quality capital values.

The question is how to absorb the rebound effect in other forms of capital that grow in demand while holding demand for the natural resource base in check. The vital conceptual distinction is between socialistic centralized planning and control of actual physical entities (people, communities, the environment, and manufactured items), on the one hand, and capitalistic decentralized distributed network effects on abstract transferable representations, on the other. Everyone defaults to the socialist scenario without ever considering there might be a whole other arena in which fruitful possibilities might be imagined.

What if, for instance, we could harness the profit motive to promote growth in genuine human, social, and environmental value? What if we were able to achieve qualitatively meaningful increases in authentic wealth that were economically contingent on reduced natural resource consumption? What if the financial and substantive value profits that could be had meant that resource consumption could be reduced by the same kinds of factors as have been realized in the context of Moore’s Law? What if a human economics of genuine value could actually result in humanity being able to adjust the global thermostat up or down in small increments by efficiently rewarding just the right combinations of policies and practices at the right times and places in the right volumes?

The only way that could ever happen is if people are motivated to do the right thing for the earth and for humanity because it is the right thing for them and their families. They have to be able to own their personal shares of their personal stocks of human, social, and natural capital. They have to be able to profit from investments in their own and others’ shares. They will not act on behalf of the earth and humanity only because it is the right thing to do. There has to be evidence and explanations of how everyone is fairly held accountable to the same standards, and has the same opportunities for profit and loss as anyone else. Then, and only then, it seems, will human, social, and environmental value become communicable in a viral contagion of good will.

Socialism has been conclusively proven unworkable, for people, communities, and the environment, as well as financially. But a human, social, and natural capitalism has hardly even been articulated, much less tried out. How do we make human, social, and natural capital fungible? How might the economy transcend its traditional boundaries and expand itself beyond the existing alignment of manufactured capital and finance?

It’s an incredibly complex proposal, but also seems like such a simple thing. The manufactured capital economy uses the common language of good measurement to improve quality, to simplify management communications, and to lower transaction costs in efficient markets. So what should we do if we want to correct the imbalanced negative impacts on people, communities, and the environment created by the misplaced emphasis on aligning only manufactured capital and financial capital?

As has been repeatedly proposed for years in this blog, maybe we should use the manufactured capital markets as a model and use good measurement to improve the quality of human, social, and environmental capital, to simplify communications and management, to lower transaction costs, and to align the genuine human, social, and environmental value created with financial value in efficient markets.

Of course, grasping that as viable, feasible, and desirable requires understanding that substantively meaningful precision measurement is something quite different from what usually passes for quantification. And that is an entirely different story, though one taken up repeatedly in previous entries in this blog, of course….

 

New Ideas on How to Realize the Purpose of Capital

September 20, 2018

I’d like to offer the following in reply to James Militzer, at https://nextbillion.net/deciphering-emersons-tears-time-impact-investing-lower-expectations/.

Rapid advances toward impact investing’s highest goals of social transformation are underway in quiet technical work being done in places no one is looking. That work shares Jed Emerson’s sentiments expressed at the 2017 Social Capital Markets conference, as he is quoted in Militzer’s NextBillion.net posting, that “The purpose of capital is to advance a more progressively free and just experience of life for all.” And he is correct in what Militzer reported he said the year before, that we need a “real, profound critique of current practices within financial capitalism,” one that would “require real change in our own behavior aside from adding a few funds to our portfolios here or augmenting a reporting process there.”

But the efforts he and others are making toward fulfilling that purpose and articulating that critique are incomplete, insufficient, and inadequate. Why? How? Language is the crux of the matter, and the issues involved are complex and technical. The challenge, which may initially seem simplistic or naive, is how to bring human, social, and environmental values into words. Not just any words, but meaningful words in a common language. What is most challenging is that this language, like any everyday language, has to span the range from abstract theoretical ideals to concrete local improvisations.

That means it cannot be like our current languages for expressing human, social, and environmental value. If we are going to succeed in aligning those forms of value with financial value, we have a lot of work to do.

Though there is endless talk of metrics for managing sustainable impacts, and though the importance of these metrics for making sustainability manageable is also a topic of infinite discussion, almost no one takes the trouble to seek out and implement the state of the art in measurement science. This is a crucial way, perhaps the most essential way, in which we need to criticize current practices within financial capitalism and change our behaviors. Oddly, almost no one seems to have thought of that.

That is, one of the most universally unexamined assumptions of our culture is that numbers automatically stand for quantities. People who analyze numeric data are called quants, and all numeric data analysis is referred to as quantitative. Yet almost none of these quants and quantitative methods involve actually defining, modeling, identifying, evaluating, or applying a substantive unit of something real in the world that can be meaningfully represented by numbers.

There is, of course, an extensive and longstanding literature on exactly this science of measurement. It has been a topic of research, philosophy, and practical applications for at least 90 years, going back to the work of Thurstone at the University of Chicago in the 1920s. That work continued at the University of Chicago with Rasch’s visit there in 1960, with Wright’s adoption and expansion of Rasch’s theory and methods, and with the further work done by Wright’s students and colleagues in the years since.

Most importantly, over the last ten years, metrologists, the physicists and engineers who maintain and improve the SI units, the metric system, have taken note of what’s been going on in research and practice involving the approaches to measurement developed by Rasch, Wright, and their students and colleagues (for just two of many articles in this area, see here and here). The most recent developments in this new metrology include

(a) initiatives at national metrology institutes globally (Sweden and the UK, Portugal, Ukraine, among others) to investigate potentials for a new class of unit standards;

(b) a special session on this topic at the International Measurement Confederation (IMEKO) World Congress in Belfast on 5 September 2018;

(c) the Journal of Physics Conference Series proceedings of the 2016 IMEKO Joint Symposium hosted by Mark Wilson and myself at UC Berkeley;

(d) the publication of a 2017 book on Ben Wright edited by Mark Wilson and myself in Springer’s Series on Measurement Science and Technology; and

(e) the forthcoming October 2018 special issue of Elsevier’s Measurement journal edited by Wilson and myself, and a second one currently in development.

There are profound differences between today’s assumptions about measurement and how a meaningful art and science of precision measurement proceeds. What passes for measurement in today’s sustainability economics and accounting are counts, percentages, and ratings. These merely numeric metrics do not stand for anything that adds up the way they do. In fact, it has been repeatedly demonstrated over many years that such metrics measure in a unit that changes size depending on who or what is measured, who is measuring, and what tool is used. What makes matters even worse is that the numbers are usually taken to be perfectly precise: uncertainty ranges, error terms, and confidence intervals are only sporadically provided.
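A small sketch makes the “unit that changes size” point concrete: the same gain in percent correct corresponds to very different amounts of the underlying variable at different points on the scale, as the logit (log-odds) transformation shows. The figures are hypothetical.

    import math

    def logit(p: float) -> float:
        """Log-odds: a linear unit underlying a bounded percentage scale."""
        return math.log(p / (1.0 - p))

    # The same 10-point raw gain spans different amounts of the variable
    for low, high in [(0.50, 0.60), (0.85, 0.95)]:
        print(f"{low:.0%} -> {high:.0%}: {logit(high) - logit(low):.2f} logits")
    # 50% -> 60%: 0.41 logits
    # 85% -> 95%: 1.21 logits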

Measurement is not primarily a matter of data analysis. Measurement requires calibrated instruments that can be read as standing for a given amount of something that stays the same, within the uncertainty range, no matter who is measuring, no matter what or who is measured, and no matter what tool is used. This is, of course, quite an accomplishment when it can be achieved, but it is not impossible, and it has been put to use in large scale practical ways for several decades (for instance, see here, here, and here). Universally accessible instruments calibrated to common unit standards are what make society in general, and markets in particular, efficient: they project distributed network effects, turning communities into massively parallel stochastic computers (as W. Brian Arthur put it on p. 6 of his 2014 book, Complexity Economics).

These are not unexamined assumptions or overly ideal theoretical demands. They are pragmatic ways of adapting to emergent patterns in various kinds of data that have repeatedly been showing themselves around the world for decades. Our task is to literally capitalize on these nonhuman forms of life by creating multilevel, complex ecosystems of relationships with them, letting them be what they are in ways that also let us represent ourselves to each other. (Emerson quotes Bruno Latour to this effect on page 136 in his new book, The Purpose of Capital; those familiar with my work will know I’ve been reading and citing Latour since the early 1980s).

So it seems to me that, however well-intentioned those promoting impact investing may be, there is little awareness of just how profound and sweeping the critique of current practices needs to be, or of just how much our own behaviors are going to have to change. There are, however, truly significant reasons to be optimistic and hopeful. The technical work being done in measurement and metrology points toward possibilities for extending everyday language into a pragmatic idealism that does not require caving in to either varying local circumstances or to authoritarian dictates.

The upside of the situation is that, as so often happens in the course of human history, this critique and the associated changes are likely to have that peculiar quality captured in the French expression, “plus ça change, plus c’est la même chose” (the more things change, the more they stay the same). The changes in process are transformative, but will also be recognizable repetitions of human scale patterns.

In sum, what we are doing is tuning the instruments of the human, social, and environmental sciences to better harmonize relationships. Just as jazz, folk, and world music show that creative improvisation is not constrained but facilitated by tuning standards and high tech solutions, so, too, can we make that the case in other areas.

For instance, in my presentation at the IMEKO World Congress in Belfast on 5 September, I showed that the integration of beauty and meaning we have within our grasp reiterates principles that date back to Plato. The aesthetics complement the mathematics, with variations on the same equations being traceable from the Pythagorean theorem to Newton’s laws to Rasch’s models for measurement (see, for instance, Fisher & Stenner, 2013). In many ways, the history of science and philosophy continues to be a footnote to Plato.

Current events in metrology for fun, profitable, and self-sustaining sustainability impacts

September 18, 2018

At the main event I attended last week at the Global Climate Action Summit in San Francisco, the #giveyouthachance philanthropic gathering at the Aquarium of the Bay, multiple people independently spoke to aligning social and environmental values with financial values, and explicitly stated that economic growth does not automatically entail environmental degradation.

As my new buddy David Traub (introduced as a consequence of the New Algorithm event in Stockholm in June with Angelica Lips da Cruz) was the MC, he put me on the program at the last minute and gave me five minutes to speak my piece in a room of 30 people or so. A great point of departure was opened up when Carin Winter of MissionBe.org spoke to her work in mindfulness education and led a guided meditation. So I conveyed the fact that the effects of mindfulness practice are rigorously measurable, and followed up with the analogy from music (tuning instruments to harmonize relationships), with the argument against merely shouldering the burden of costs because it is the right thing to do, with the counter-argument for creating efficient competitive markets for sustainable impacts, and with information on the previous week’s special session on social and psychological metrology at IMEKO in Belfast. It appeared that the message of metrology as a means for making sustainability self-sustaining, fun, and profitable got through!

Next up: Unify.Earth has developed their own new iteration on blockchain, which will be announced Monday, 24 September, at the UN SDG Media Center (also see here) during the World Economic Forum’s Sustainable Development Impact Summit. The UEX (Unify Earth Exchange) fills the gap for human capital stocks left by the Universal Commons‘ exclusive focus on social and natural capital.

So I’ve decided to go to NY and have booked my travel.

Back in February, Angelica Lips da Cruz recalled having said, six months earlier, that it would take two years to get to where we were at that time. Now another seven months have passed, and I am starting to feel that the acceleration is approaching Mach 1! At this rate, it’ll be the speed of light in the next six months….

Evaluating Questionnaires as Measuring Instruments

June 23, 2018

An email came in today asking whether three different short (4- and 5-item) questionnaires could be expected to provide reasonable quality measurement. Here’s my response.

—–

Thanks for raising this question. The questionnaire plainly was not designed to provide data suitable for measurement. Though much can be learned about making constructs measurable from data produced by this kind of questionnaire, “Rasch analysis” cannot magically create a silk purse from a sow’s ear (as the old expression goes). Use Linacre’s (1993) generalizability theory nomograph to see what reliabilities are expected for each subscale, given the numbers of items and rating categories, and applying a conservative estimate of the adjusted standard deviations (1.0 logit, for instance). Convert the reliability coefficients into strata (Fisher, 1992, 2008; Wright & Masters, 1982, pp. 92, 105-106) to make the practical meaning of the precision obtained obvious.
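For readers unfamiliar with that conversion, here is a minimal sketch in Python of the separation and strata formulas given in the cited sources (G = sqrt(R/(1-R)); strata = (4G + 1)/3); the reliability values fed in are arbitrary examples.

    import math

    def separation(reliability: float) -> float:
        """Separation index G: ratio of the true measure SD to the average error SD."""
        return math.sqrt(reliability / (1.0 - reliability))

    def strata(reliability: float) -> float:
        """Number of statistically distinct levels the instrument can resolve."""
        return (4.0 * separation(reliability) + 1.0) / 3.0

    for r in (0.70, 0.80, 0.90):
        print(f"reliability {r:.2f}: G = {separation(r):.2f}, strata = {strata(r):.2f}")
    # reliability 0.70: G = 1.53, strata = 2.37
    # reliability 0.80: G = 2.00, strata = 3.00
    # reliability 0.90: G = 3.00, strata = 4.33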

So if you have data, analyze it and compare the expected and observed reliabilities. If the uncertainties are quite different, is that because of targeting issues? But before you do that, ask experts in the area to rank order:

  • the courses by relevance to the job;
  • the evaluation criteria from easy to hard; and
  • the skills/competencies in order of importance to job performance.

Then study the correspondence between the rankings and the calibration results. Where do they converge and diverge? Why? What’s unexpected? What can be learned?

Analyze all of the items in each area (student, employer, instructor) together in Winsteps and study each of the three tables 23.x, setting PRCOMP=S. Remember that the total variance explained is not interpreted simply in terms of “more is better”; it is less important than the ratio of that variance to the variance in the first contrast (see Linacre, 2006, 2008). If the ratio is greater than 3, the scale is essentially unidimensional (though significant problems may remain to be diagnosed and corrected).

Common practice holds that unexplained variance eigenvalues should be less than 1.5, but this overly simplistic rule of thumb (Chou & Wang, 2010; Raîche, 2005) has been contradicted in practice many times, since, even if one or more eigenvalues are over 1.5, theory may say the items belong to the same construct, and the disattenuated correlations of the measures implied by the separate groups of items (provided in tables 23.x) may still approach 1.00, indicating that the same measures are produced across subscales. See Green (1996) and Smith (1996), among others, for more on this.
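The disattenuation mentioned here is the classical correction for attenuation; a quick sketch with made-up numbers shows how a modest observed correlation between two short, unreliable subscales can imply essentially identical measures:

    import math

    def disattenuated(r_xy: float, rel_x: float, rel_y: float) -> float:
        """Correct an observed correlation for measurement error in both scales."""
        return r_xy / math.sqrt(rel_x * rel_y)

    # Hypothetical: observed r = .72 between subscales with reliabilities .80 and .75
    print(f"{disattenuated(0.72, 0.80, 0.75):.2f}")  # 0.93, approaching 1.00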

If subscales within each of the three groups of items are markedly different in the measures they produce, then separate them in different analyses. If these further analyses reveal still more multidimensionalities, it’s time to go back to the drawing board, given how short these scales are. If you define a plausible scale, study the item difficulty orders closely with one or more experts in the area. If there is serious interest in precision measurement and its application to improved management, and not just a bureaucratic need for data to satisfy empty demands for a mere appearance of quality assessment, then trace the evolution of the construct as it changes from less to more across the items.

What, for instance, is the common theme addressed across the courses that makes them all relevant to job performance? The courses were each created with an intention and they were brought together into a curriculum for a purpose. These intentions and purposes are the raw material of a construct theory. Spell out the details of how the courses build competency in translation.

Furthermore, I imagine that this curriculum, by definition, was set up to be effective in training students no matter who is in the courses (within the constraints of the admission criteria), and no matter which particular challenges relevant to job performance are sampled from the universe of all possible challenges. You will recognize these unexamined and unarticulated assumptions as what need to be explicitly stated as hypotheses informing a model of the educational enterprise. This model transforms implicit assumptions into requirements that are never fully satisfied but can be very usefully approximated.

As I’ve been saying for a long time (Fisher, 1989), please do not accept the shorthand language of references to “the Rasch model”, “Rasch scaling”, “Rasch analysis”, etc. Rasch did not invent the form of these models, which are at least as old as Plato. And measurement is not a function of data analysis. Data provide experimental evidence testing model-based hypotheses concerning construct theories. When explanatory theory corroborates and validates data in calibrated instrumentation, the instrument can be applied at the point of use with no need for data analysis, to produce measures, uncertainty (error) estimates, and graphical fit assessments (Connolly, Nachtman, & Pritchett, 1971; Davis, et al., 2008; Fisher, 2006; Fisher, Kilgore, & Harvey, 1995; Linacre, 1997; many others).

So instead of using those common shorthand phrases, please speak directly to the problem of modeling the situation in order to produce a practical tool for managing it.

Further information is available in the references below.

References

Aryadoust, S. V. (2009). Mapping Rasch-based measurement onto the argument-based validity framework. Rasch Measurement Transactions, 23(1), 1192-1193 [http://www.rasch.org/rmt/rmt231.pdf].

Chang, C.-H. (1996). Finding two dimensions in MMPI-2 depression. Structural Equation Modeling, 3(1), 41-49.

Chou, Y. T., & Wang, W. C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70, 717-731.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service. Retrieved 23 June 2018 from https://images.pearsonclinical.com/images/pa/products/keymath3_da/km3-da-pub-summary.pdf

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G. et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis Cartilage, 16(5), 551-559.

Fisher, W. P., Jr. (1989). What we have to offer. Rasch Measurement Transactions, 3(3), 72 [http://www.rasch.org/rmt/rmt33d.htm].

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2006). Survey design recommendations [expanded from Fisher, W. P. Jr. (2000) Popular Measurement, 3(1), pp. 58-59]. Rasch Measurement Transactions, 20(3), 1072-1074 [http://www.rasch.org/rmt/rmt203.pdf].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5(1), 3-25.

Green, K. E. (1996). Dimensional analyses of complex data. Structural Equation Modeling, 3(1), 50-61.

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284 [http://www.rasch.org/rmt/rmt71h.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-83.

Linacre, J. M. (1998). Structure in Rasch residuals: Why principal components analysis? Rasch Measurement Transactions, 12(2), 636 [http://www.rasch.org/rmt/rmt122m.htm].

Linacre, J. M. (2003). PCA: Data variance: Explained, modeled and empirical. Rasch Measurement Transactions, 17(3), 942-943 [http://www.rasch.org/rmt/rmt173g.htm].

Linacre, J. M. (2006). Data variance explained by Rasch measures. Rasch Measurement Transactions, 20(1), 1045 [http://www.rasch.org/rmt/rmt201a.htm].

Linacre, J. M. (2008). PCA: Variance in data explained by Rasch measures. Rasch Measurement Transactions, 22(1), 1164 [http://www.rasch.org/rmt/rmt221j.htm].

Raîche, G. (2005). Critical eigenvalue sizes in standardized residual Principal Components Analysis. Rasch Measurement Transactions, 19(1), 1012 [http://www.rasch.org/rmt/rmt191h.htm].

Schumacker, R. E., & Linacre, J. M. (1996). Factor analysis and Rasch. Rasch Measurement Transactions, 9(4), 470 [http://www.rasch.org/rmt/rmt94k.htm].

Smith, E. V., Jr. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205-31.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Wright, B. D. (1996). Comparing Rasch measurement and factor analysis. Structural Equation Modeling, 3(1), 3-24.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

Revisiting Hayek’s Relevance to Measurement

May 31, 2018

As so often happens, I’m finding new opportunities for restating what seems obvious to me but does not impact others in the way it ought to. The work of the Austrian economist Friedrich Hayek speaks to me in a particular way that has always, to me, self-evidently expressed ideas of fundamental value and interest. Reviewing his work again lately has opened it up to a new level of detail that is worth sharing here.

Hayek (1948, p. 54) is onto a key point about measurement and its role in economics when he says:

…the spontaneous actions of individuals will, under conditions which we can define, bring about a distribution of resources which can be understood as if it were made according to a single plan, although nobody has planned it…?

Decades of measurement research show that individuals’ spontaneous responses to assessment and survey questions conform to one another in ways that might appear to have been centrally organized according to a single plan. But over and over again, the same patterns are produced with no effort made to guide or coerce responses into conforming in that way.

The results of testing and assessment produced in educational measurement can be expressed in economic terms fitting quite well with Hayek’s observation. Student abilities, economically speaking, are human capital resources. Each student has some amount of ability that can be considered a supply of resources available for application to the demands of the challenges posed by the assessment questions. When assessment data fit a Rasch model, the supply of student abilities has spontaneously organized itself in relation to the demands for those abilities posed by the test questions. The invariant consistency of the data and the resulting model fit have not been produced by coercing or guiding the students to respond in a particular way. Although questions can be written to vary in difficulty according to a construct theory, and though educational curricula traditionally vary in difficulty across grade levels, the patterns of growth and change that are observed are plainly not taking place as a result of anyone’s intentions or plans.
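A minimal simulation, with all parameters assumed purely for illustration, shows the kind of spontaneous order at issue: item difficulties estimated separately from two arbitrary halves of a sample agree closely, though nothing coordinates the responses.

    import math
    import random

    random.seed(1)
    DIFFICULTIES = [-1.5, -0.5, 0.0, 0.5, 1.5]  # true item difficulties (logits)
    abilities = [random.gauss(0.0, 1.0) for _ in range(2000)]

    def simulate(thetas):
        """Dichotomous Rasch responses: P(success) = logistic(ability - difficulty)."""
        return [[random.random() < 1.0 / (1.0 + math.exp(-(b - d)))
                 for d in DIFFICULTIES] for b in thetas]

    def rough_calibration(responses):
        """Centered log-odds of failure per item: a crude difficulty estimate."""
        est = []
        for i in range(len(DIFFICULTIES)):
            p = sum(r[i] for r in responses) / len(responses)
            est.append(math.log((1.0 - p) / p))
        mean = sum(est) / len(est)
        return [round(e - mean, 2) for e in est]

    half_a = rough_calibration(simulate(abilities[:1000]))
    half_b = rough_calibration(simulate(abilities[1000:]))
    print(half_a)  # the two independently estimated calibrations track each other
    print(half_b)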

This kind of complex adaptive, self-organizing process (Fisher, 2017) describes not just the relations of student abilities and task difficulties, but also the relations of customer preferences to product features, patient health and functionality relative to disease and disability, etc. It also, of course, applies to supply and demand relative to a price (Fisher, 2015). For students, the price to be paid follows from the probability of a supply of ability meeting the demand for it posed by the challenges encountered in assessment items.

Getting back to Hayek (1948, p. 54), here we meet the relevance of the

…central question of all social sciences: How can the combination of fragments of knowledge existing in different minds bring about results which, if they were to be brought about deliberately, would require a knowledge on the part of the directing mind which no single person can possess?

Per Hayek’s point, no one student will know the answers to all of the questions posed in a test, and yet all of the students’ fragments of knowledge combine in a way that brings about results seemingly defined by a single intelligence. It is this bottom-up, self-organized emergence of knowledge structures that we capture in measurement and bring into our culture, our sciences, and our economies by bringing things into words and the common languages of standardized metrics.

This spontaneous emergence of structure does not lead directly of its own accord to the creation of markets. Rather, it is vitally important to recognize, along with Miller and O’Leary (2007, p. 710) that:

Markets are not spontaneously generated by the exchange activity of buyers and sellers. Rather, skilled actors produce institutional arrangements, the rules, roles and relationships that make market exchange possible. The institutions define the market, rather than the reverse.

The institutional arrangements we need to make to create efficient markets for human, social, and natural capital will be staggeringly difficult to realize. But a point in time will come when the costs of remaining in our current cultural, political, and economic ruts will be greater, and the benefits will be lower, than the costs and benefits of investing in a new future. That time may be sooner than anyone thinks it will be.

References

Fisher, W. P., Jr. (2015). A probabilistic model of the law of supply and demand. Rasch Measurement Transactions, 29(1), 1508-1511 [http://www.rasch.org/rmt/rmt291.pdf].

Fisher, W. P., Jr. (2017). A practical approach to modeling complex adaptive flows in psychology and social science. Procedia Computer Science, 114, 165-174. Retrieved from https://doi.org/10.1016/j.procs.2017.09.027

Hayek, F. A. (1948). Individualism and economic order. Chicago: University of Chicago Press.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-734.

Measuring Values To Apply The Golden Rule

December 29, 2016

Paper presentation 45.20, American Educational Research Association

New Orleans, April 1994

 

Objective

Basing her comments on the writings of Michael Lerner in Tikkun magazine, “Hillary Rodham Clinton speaks appealingly of a political morality based on the Golden Rule,” says Chicago Tribune columnist Clarence Page. Lerner and Clinton are correct in asserting that we need to rediscover and re-invigorate our spiritual values, though there is nothing new in this assertion, and Page is correct in his opinion that conservative columnists who say religion is spirituality, and that there is therefore nothing in need of re-invigoration, are wrong. Research on the spiritual dimension of disability, for instance, shows that the quality of spiritual experience has little, if anything, to do with church attendance, bible reading, prayer, or the taking of sacraments (Fisher & Pugliese, 1989).

The purpose of this paper is to propose a research program that would begin to prepare the ground in which a political morality based on the Golden Rule might be cultivated.

Theoretical Framework

Implementing a “political morality based on the Golden Rule” requires some way of knowing that what I do unto others is the same as what I would have done unto me. To know this, I need a measuring system that keeps things in proportion by showing what counts as the same thing for different people. A political morality based on the Golden Rule must have some way of identifying when a service or action done unto others is the same as the one done unto me. In short, application of the Golden Rule requires an empirical basis of comparison: a measuring system that sets up analogies between people’s values and what is valued. We must be able to say that my values are to one aspect of a situation what yours are to that or another aspect, and that proportions of this kind hold constant no matter which particular persons are addressed and no matter which aspects of the situation are involved.

Technique

Is it possible to measure what people value—politically, socially, economically, spiritually, and culturally—in a way that embodies the Golden Rule? If so, could such a measure be used for realizing the political morality Hillary Rodham Clinton has advocated?  L. L. Thurstone presented methods for successfully revealing the necessary proportions in the 1920s; these were improved upon by the Danish mathematician Georg Rasch in the 1950s.  Thurstone’s and Rasch’s ideas are researched and applied today by Benjamin D. Wright and J. Michael Linacre.  These and other thinkers hold that measurement takes place only when application of the Golden Rule is possible.  That is, measurement is achieved only if someone’s measure does not depend on who is in the group she is measured with, on the particular questions answered or not answered, on who made the measure, on the brand name of the instrument, or on where the measure took place.

Measurement of this high quality is called scale-free because its quantities do not vary according to the particular questions asked (as long as they pertain to the construct of interest); neither do they vary according to the structure or combination of the particular rating scheme(s) employed (rating scale, partial credit, correct/incorrect, true/false, present/absent, involvement of judges, paired comparisons, etc.), or the brand name of the instrument measuring.  All of these requirements must hold if I am to treat a person as I would like to be treated, because if they do not hold, I do not know enough about her values or mine to say whether she’s receiving the treatment I’d prefer in the same circumstance.

In order to make the Golden Rule the basis of a political morality, we need to improve the quality of measurement in every sphere of our lives; after all, politics is more than just what politicians do; it is a basic part of community life. Even though the technology and methods for high quality measurement in education, sociology, and psychology have existed for decades, researchers have been indifferent to their use.

That indifference may be near an end.  If people get serious about applying the Golden Rule, they are going to come up against a need for rigorous quantitative measurement.  We need to let them know that the tools for the job are available.

Data sources

Miller’s Scale Battery of International Patterns and Norms (SBIPN) (Miller, 1968, 1970, 1973), described in Miller (1983, pp. 462-468), is an instrument that presents possibilities for investigating quantitative relations among value systems. The instrument is composed of 20 six-point rating scale items involving such cultural norms and patterns as social acceptance, family solidarity, trustfulness, moral code, honesty, reciprocity, class structure, etc. Each pair of rating scale points (1-2, 3-4, 5-6) is associated with a 15-30 word description; raters judge national values by assigning ratings, where 1 indicates the most acceptance, solidarity, trust, morality, etc., and 6 the least. Miller (1983, p. 462) reports test-retest correlations of .74 to .97 for the original 15 items on the survey as tested in the United States and Peru. Validity claims are based on the scale’s ability to distinguish between values of citizens of the United States and Peru, with supporting research comparing values in Argentina, Spain, England, and the United States.

The SBIPN could probably be improved in several ways. First, individual countries contain so many diverse ethnic groups and subcultures, whose value systems are often in conflict, that ratings should probably be made of those groups and not of the entire population. The geographical location of the ethnic group or subculture rated should also be tracked in order to study regional variations. Second, Miller contends that raters must have a college degree to be qualified as an SBIPN judge; the complexity of his rating procedure justifies this claim. In order to simplify the survey and broaden the base of qualified judges, the three groups of short phrases structuring each six-point rating scale should be used as individual items rated on a frequency continuum.

For instance, the following phrases appear in association with ratings of 1 and 2 under social acceptance:

high social acceptance. Social contacts open and nonrestrictive. Introductions not needed for social contacts.  Short acquaintance provides entry into the home and social organizations.

Similar descriptions are associated with the 3-4 (medium social acceptance) and 5-6 (low social acceptance) rating pairs; only one rating from the series of six is assigned, so that a rating of 1 or 2 is assigned only if the judgment is of high social acceptance.  Instead of asking the rater to assign one of two ratings to all six of these statements (breaking apart the two conjunctive phrases), and ignoring the 10-20 phrases associated with the other four rating scale points, each phrase presented on the six-point continuum should be rated separately for the frequency of the indicated pattern or norm.  A four-point rating scale (Almost Always, Frequently, Sometimes, Rarely) should suffice.

Linacre’s (1993, p. 284) graphical presentation of Rasch-based generalizability theory indicates that reliability and separation statistics of .92 and 3.4, respectively, can be expected for a 20-item, six-point rating scale survey (Miller’s original format), assuming a measurement standard deviation of one logit. 360 items will be produced if each of the original 20 six-point items can be transformed into 18 four-point items (following the above example’s derivation of six items from one of the three blocks of one item’s descriptive phrases). If only 250 of these items work to support the measurement effort, Linacre’s graph shows that a reliability of .99 and a separation of 10 might be obtained, again assuming a measurement standard deviation of one logit. Since not all of the survey’s items would be administered at once, these estimates are probably high. The increased number of items, however, would be advantageous for use as an item bank in a computer-adaptive administration of the survey.
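As a quick consistency check on those nomograph readings (a sketch only, using the standard relation between reliability and separation, R = G^2 / (1 + G^2)):

    def reliability_from_separation(g: float) -> float:
        """R = G^2 / (1 + G^2): the standard reliability-separation relation."""
        return g * g / (1.0 + g * g)

    for g in (3.4, 10.0):
        print(f"separation {g:>4}: reliability = {reliability_from_separation(g):.2f}")
    # separation  3.4: reliability = 0.92
    # separation 10.0: reliability = 0.99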

Expected results

Miller’s applications of the SBIPN provide specific indications of what might be expected from the revised form of the survey.  Family solidarity tends to be low, labor assimilated into the prevailing economic system, class consciousness devalued, and moral conduct secularly defined in the United States, in opposition to Colombia and Peru, where family solidarity is high, labor is antagonistic to the prevailing economic system, class structure is rigidly defined, and moral conduct is religiously defined.  At the other extreme, civic participation, work and achievement, societal consensus, children’s independence, and democracy are highly valued in the United States, but considerably less so in Colombia and Peru.

Miller’s presentation of the survey results will be improved on in several ways.  First, construct validity will be examined in terms of the data’s internal consistency (fit analysis) and the conceptual structure delineated by the items.  Second, the definition of interval measurement continua for each ethnic group or subculture measured will facilitate quantitative and qualitative comparisons of each group’s self-image with its public image.  Differences in group perception can be used for critical self-evaluation, and provide information crucial for rectifying unjust projections of prejudice.

Scientific importance

One of the most important benefits of this survey could be the opportunity to show that, although different value systems vary in their standards of what counts as acceptable behaviors and attitudes, the procedures by which values are calibrated and people’s personal values are measured do not vary.  Should this turn out to be the case, it will be more difficult to justify and maintain hostile prejudices against others whose value systems differ from one’s own.  If people who do not share my values cannot immediately be categorized as godless, heathens, infidels, pagans, unwashed, etc., i.e., relegated to the category of the non-classifiable, then I should be less prone to disregard, hate, or fear them, and more able to build a cohesive, healthy, and integrated community with them.

The cultural prejudice structuring this proposal is that increased understanding of others’ values is good; it is of great importance that this prejudice be made explicit and evaluated for its effect on those who do not share it.  The possibility of pursuing a quantitative study of value systems may strike some as an area of research that could only be used to dominate and oppress those who do not have the power to defend themselves.  This observation implies that one reason why more rigorous scientific measurement procedures have failed to take hold in the social sciences may be that we have unspoken, but nonetheless justifiable, reservations concerning our capacity to employ high quality information responsibly.  Knowledge is inherently dangerous, but a political morality based on the Golden Rule will require nothing less than taking another bite of the apple from the Tree of Knowledge.

 

References

Fisher, William P. & Karen Pugliese. 1989.  Measuring the importance of pastoral care in rehabilitation. Archives of Physical Medicine and Rehabilitation, 70, A-22 [Abstract].

Linacre, J. Michael. 1993. Rasch-based generalizability theory. Rasch Measurement, 7: 283-284.

Miller, Delbert C. 1968. The measurement of international patterns and norms: A tool for comparative research. Southwestern Social Science Quarterly, 48: 531-547.

Miller, Delbert C. 1970. International Community Power Structures: Comparative Studies of Four World Cities. Bloomington: Indiana University Press.

Miller, Delbert C. 1972. Measuring cross national norms: Methodological problems in identifying patterns in Latin America and Anglo-Saxon Cultures.  International Journal of Comparative Sociology, 13(3-4): 201-216.

Miller, Delbert C. 1983. Handbook of Research Design and Social Measurement. 4th ed. New York: Longman.

Excerpts and Notes from Goldberg’s “Billions of Drops…”

December 23, 2015

Goldberg, S. H. (2009). Billions of drops in millions of buckets: Why philanthropy doesn’t advance social progress. New York: Wiley.

p. 8:
Transaction costs: “…nonprofit financial markets are highly disorganized, with considerable duplication of effort, resource diversion, and processes that ‘take a fair amount of time to review grant applications and to make funding decisions’ [citing Harvard Business School Case No. 9-391-096, p. 7, Note on Starting a Nonprofit Venture, 11 Sept 1992]. It would be a major understatement to describe the resulting capital market as inefficient.”

A McKinsey study found that nonprofits spend 2.5 to 12 times more raising capital than for-profits do. When administrative costs are factored in, nonprofits spend 5.5 to 21.5 times more.

For-profit and nonprofit funding efforts contrasted on pages 8 and 9.

p. 10:
Balanced scorecard rating criteria

p. 11:
“Even at double-digit annual growth rates, it will take many years for social entrepreneurs and their funders to address even 10% of the populations in need.”

p. 12:
Exhibit 1.5 shows that the percentages of various needs served by leading social enterprises are barely drops in the respective buckets; they range from 0.07% to 3.30%.

pp. 14-16:
Nonprofit funding is not tied to performance. Even when a nonprofit makes the effort to show measured improvement in impact, doing so does little or nothing to change its funding picture. There appears to be some kind of funding ceiling implicitly imposed by funders, since nonprofit growth and success seem to persuade capital sources that their work there is done. Mediocre and low-performing nonprofits seem able to continue drawing funds indefinitely from sympathetic donors who don’t require evidence of the effective use of their money.

p. 34:
“…meaningful reductions in poverty, illiteracy, violence, and hopelessness will require a fundamental restructuring of nonprofit capital markets. Such a restructuring would need to make it much easier for philanthropists of all stripes–large and small, public and private, institutional and individual–to fund nonprofit organizations that maximize social impact.”

p. 54:
Exhibit 2.3 is a chart showing that fewer people rose from poverty, and more remained in it or fell deeper into it, in 1988-1998 than in 1969-1979.

pp. 70-71:
Kotter’s (1996) change cycle.

p. 75:
McKinsey’s seven elements of nonprofit capacity and capacity assessment grid.

pp. 94-95:
Exhibits 3.1 and 3.2 contrast the way financial markets reward for-profit performance with the way nonprofit markets reward fund raising efforts.

Financial markets
1. Market aggregates and disseminates standardized data
2. Analysts publish rigorous research reports
3. Investors proactively search for strong performers
4. Investors penalize weak performers
5. Market promotes performance
6. Strong performers grow

Nonprofit markets
1. Social performance is difficult to measure
2. NPOs don’t have resources or expertise to report results
3. Investors can’t get reliable or standardized results data
4. Strong and weak NPOs spend 40 to 60% of time fundraising
5. Market promotes fundraising
6. Investors can’t fund performance; NPOs can’t scale

p. 95:
“…nonprofits can’t possibly raise enough money to achieve transformative social impact within the constraints of the existing fundraising system. I submit that significant social progress cannot be achieved without what I’m going to call ‘third-stage funding,’ that is, funding that doesn’t suffer from disabling fragmentation. The existing nonprofit capital market is not capable of [p. 97] providing third-stage funding. Such funding can arise only when investors are sufficiently well informed to make big bets at understandable and manageable levels of risk. Existing nonprofit capital markets neither provide investors with the kinds of information needed–actionable information about nonprofit performance–nor provide the kinds of intermediation–active oversight by knowledgeable professionals–needed to mitigate risk. Absent third-stage funding, nonprofit capital will remain irreducibly fragmented, preventing the marshaling of resources that nonprofit organizations need to make meaningful and enduring progress against $100 million problems.”

pp. 99-114:
Text and diagrams on innovation, market adoption, transformative impact.

p. 140:
Exhibit 4.2: Capital distribution of nonprofits, highlighting mid-caps

pages 192-3 make the case for the difference between a regular market and the current state of philanthropic, social capital markets.

p. 192:
“So financial markets provide information investors can use to compare alternative investment opportunities based on their performance, and they provide a dynamic mechanism for moving money away from weak performers and toward strong performers. Just as water seeks its own level, markets continuously recalibrate prices until they achieve a roughly optimal equilibrium at which most companies receive the ‘right’ amount of investment. In this way, good companies thrive and bad ones improve or die.
“The social sector should work the same way… But philanthropic capital doesn’t flow toward effective nonprofits and away from ineffective nonprofits for a simple reason: contributors can’t tell the difference between the two. That is, philanthropists just don’t [p. 193] know what various nonprofits actually accomplish. Instead, they only know what nonprofits are trying to accomplish, and they only know that based on what the nonprofits themselves tell them.”

p. 193:
“The signs that the lack of social progress is linked to capital market dysfunctions are unmistakable: fundraising remains the number-one [p. 194] challenge of the sector despite the fact that nonprofit leaders divert some 40 to 60% of their time from productive work to chasing after money; donations raised are almost always too small, too short, and too restricted to enhance productive capacity; most mid-caps are ensnared in the ‘social entrepreneur’s trap’ of focusing on today and neglecting tomorrow; and so on. So any meaningful progress we could make in the direction of helping the nonprofit capital market allocate funds as effectively as the private capital market does could translate into tremendous advances in extending social and economic opportunity.
“Indeed, enhancing nonprofit capital allocation is likely to improve people’s lives much more than, say, further increasing the total amount of donations. Why? Because capital allocation has a multiplier effect.”

“If we want to materially improve the performance and increase the impact of the nonprofit sector, we need to understand what’s preventing [p. 195] it from doing a better job of allocating philanthropic capital. And figuring out why nonprofit capital markets don’t work very well requires us to understand why the financial markets do such a better job.”

p. 197:
“When all is said and done, securities prices are nothing more than convenient approximations that market participants accept as a way of simplifying their economic interactions, with a full understanding that market prices are useful even when they are way off the mark, as they so often are. In fact, that’s the whole point of markets: to aggregate the imperfect and incomplete knowledge held by vast numbers of traders about how much various securities are worth and still make allocation choices that are better than we could without markets.
“Philanthropists face precisely the same problem: how to make better use of limited information to maximize output, in this case, social impact. Considering the dearth of useful tools available to donors today, the solution doesn’t have to be perfect or even all that good, at least at first. It just needs to improve the status quo and get better over time.
“Much of the solution, I believe, lies in finding useful adaptations of market mechanisms that will mitigate the effects of the same lack of reliable and comprehensive information about social sector performance. I would even go so far as to say that social enterprises can’t hope to realize their ‘one day, all children’ visions without a funding allocation system that acts more like a market.
“We can, and indeed do, make incremental improvements in nonprofit funding without market mechanisms. But without markets, I don’t see how we can fix the fragmentation problem or produce transformative social impact, such as ensuring that every child in America has a good education. The problems we face are too big and have too many moving parts to ignore the self-organizing dynamics of market economics. As Thomas Friedman said about the need to impose a carbon tax at a time of falling oil prices, ‘I’ve wracked my brain trying to think of ways to retool America around clean-power technologies without a price signal–i.e., a tax–and there are no effective ones.’”

p. 199:
“Prices enable financial markets to work the way nonprofit capital markets should–by sending informative signals about the most effective organizations so that money will flow to them naturally…”

p. 200:
[Quotes Kurtzman citing De Soto on the mystery of capital. Also see p. 209, below.]
“‘Solve the mystery of capital and you solve many seemingly intractable problems along with it.'”
[That’s from page 69 in Kurtzman, 2002.]

p. 201:
[Goldberg says he’s quoting Daniel Yankelovich here, but the footnote does not appear to have anything to do with this quote:]
“‘The first step is to measure what can easily be measured. The second is to disregard what can’t be measured, or give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can’t be measured easily isn’t very important. This is blindness. The fourth step is to say that what can’t be easily measured really doesn’t exist. This is suicide.'”

Goldberg gives an example here of $10,000 invested with a 10% increase in value, compared with $10,000 put into a nonprofit. “But if the nonprofit makes good use of the money and, let’s say, brings the reading scores of 10 elementary school students up from below grade level to grade level, we can’t say how much my initial investment is ‘worth’ now. I could make the argument that the value has increased because the students have received a demonstrated educational benefit that is valuable to them. Since that’s the reason I made the donation, the achievement of higher scores must have value to me, as well.”

p. 202:
Goldberg wonders whether donations to nonprofits would be better conceived as purchases than investments.

p. 207:
Goldberg quotes Jon Gertner from the March 9, 2008, issue of the New York Times Magazine devoted to philanthropy:

“‘Why shouldn’t the world’s smartest capitalists be able to figure out more effective ways to give out money now? And why shouldn’t they want to make sure their philanthropy has significant social impact? If they can measure impact, couldn’t they get past the resistance that [Warren] Buffet highlighted and finally separate what works from what doesn’t?'”

p. 208:
“Once we abandon the false notions that financial markets are precision instruments for measuring unambiguous phenomena, and that the business and nonprofit sectors are based in mutually exclusive principles of value, we can deconstruct the true nature of the problems we need to address and adapt market-like mechanisms that are suited to the particulars of the social sector.
“All of this is a long way (okay, a very long way) of saying that even ordinal rankings of nonprofit investments can have tremendous value in choosing among competing donation opportunities, especially when the choices are so numerous and varied. If I’m a social investor, I’d really like to know which nonprofits are likely to produce ‘more’ impact and which ones are likely to produce ‘less.'”

“It isn’t necessary to replicate the complex working of the modern stock markets to fashion an intelligent and useful nonprofit capital allocation mechanism. All we’re looking for is some kind of functional indication that would (1) isolate promising nonprofit investments from among the confusing swarm of too many seemingly worthy social-purpose organizations and (2) roughly differentiate among them based on the likelihood of ‘more’ or ‘less’ impact. This is what I meant earlier by increasing [p. 209] signals and decreasing noise.”

p. 209:
Goldberg apparently didn’t read De Soto, as he says that the mystery of capital is posed by Kurtzman and says it is solved via the collective intelligence and wisdom of crowds. This completely misses the point of the crucial value that transparent representations of structural invariance hold for market functionality. Goldberg is apparently proposing a loose kind of market in which an aggregate index of nonprofit “stocks” is built up from various ordinal performance measures. I think I find a better way in my work, building more closely from De Soto (Fisher, 2002, 2003, 2005, 2007, 2009a, 2009b).

p. 231:
Goldberg quotes Harvard’s Allen Grossman (1999) on the cost-benefit boundaries of more effective nonprofit capital allocation:

“‘Is there a significant downside risk in restructuring some portion of the philanthropic capital markets to test the effectiveness of performance driven philanthropy? The short answer is, ‘No.’ The current reality is that most broad-based solutions to social problems have eluded the conventional and fragmented approaches to philanthropy. It is hard to imagine that experiments to change the system to a more performance driven and rational market would negatively impact the effectiveness of the current funding flows–and could have dramatic upside potential.'”

p. 232:
Quotes Douglas Hubbard’s How to Measure Anything book that Stenner endorsed, and Linacre and I didn’t.

p. 233:
Cites Stevens on the four levels of measurement and uses it to justify his position concerning ordinal rankings, recognizing that “we can’t add or subtract ordinals.”

pp. 233-5:
Justifies ordinal measures via example of Google’s PageRank algorithm. [I could connect from here using Mary Garner’s (2009) comparison of PageRank with Rasch.]

p. 236:
Goldberg tries to justify the use of ordinal measures by citing their widespread use in social science and health care. He conveniently ignores the fact that virtually all of the same problems and criticisms that apply to philanthropic capital markets also apply in those areas. Not grasping the fundamental value of De Soto’s concept of transferable and transparent representations, and knowing nothing of Rasch measurement, he was unable to properly evaluate the potential of ordinal data’s role in the formation of philanthropic capital markets. Ordinal measures aren’t just not good enough; they represent a dangerous diversion of resources into systems that will take on lives of their own, creating a new layer of dysfunctional relationships that will be hard to overcome.

p. 261 [Goldberg shows here his complete ignorance about measurement. He is apparently totally unaware of the work that is in fact most relevant to his cause, going back to Thurstone in the 1920s, Rasch in the 1950s-1970s, and Wright in the 1960s to 2000. Both of the problems he identifies have long since been solved in theory and in practice in a wide range of domains in education, psychology, health care, etc.]:
“Having first studied performance evaluation some 30 years ago, I feel confident in saying that all the foundational work has been done. There won’t be a ‘eureka!’ breakthrough where someone finally figures out the one true way to gauge nonprofit effectiveness.
“Indeed, I would venture to say that we know virtually everything there is to know about measuring the performance of nonprofit organizations with only two exceptions: (1) How can we compare nonprofits with different missions or approaches, and (2) how can we make actionable performance assessments common practice for growth-ready mid-caps and readily available to all prospective donors?”

p. 263:
“Why would a social entrepreneur divert limited resources to impact assessment if there were no prospects it would increase funding? How could an investor who wanted to maximize the impact of her giving possibly put more golden eggs in fewer impact-producing baskets if she had no way to distinguish one basket from another? The result: there’s no performance data to attract growth capital, and there’s no growth capital to induce performance measurement. Until we fix that Catch-22, performance evaluation will not become an integral part of social enterprise.”

pp. 264-5:
Long quotation from Ken Berger at Charity Navigator on their ongoing efforts at developing an outcome measurement system. [wpf, 8 Nov 2009: I read the passage quoted by Goldberg in Berger’s blog when it came out and have been watching and waiting ever since for the new system. wpf, 8 Feb 2012: The new system has been online for some time but still does not include anything on impacts or outcomes. It has expanded from a sole focus on financials to also include accountability and transparency. But it does not yet address Goldberg’s concerns as there still is no way to tell what works from what doesn’t.]

p. 265:
“The failure of the social sector to coordinate independent assets and create a whole that exceeds the sum of its parts results from an absence of… ‘platform leadership’: ‘the ability of a company to drive innovation around a particular platform technology at the broad industry level.’ The object is to multiply value by working together: ‘the more people who use the platform products, the more incentives there are for complement producers to introduce more complementary products, causing a virtuous cycle.’” [Quotes here are from Cusumano & Gawer (2002). The concept of platform leadership speaks directly to the system of issues raised by Miller & O’Leary (2007) that must be addressed to form effective HSN capital markets.]

p. 266:
“…the nonprofit sector has a great deal of both money and innovation, but too little available information about too many organizations. The result is capital fragmentation that squelches growth. None of the stakeholders has enough horsepower on its own to impose order on this chaos, but some kind of realignment could release all of that pent-up potential energy. While command-and-control authority is neither feasible nor desirable, the conditions are ripe for platform leadership.”

“It is doubtful that the IMPEX could amass all of the resources internally needed to build and grow a virtual nonprofit stock market that could connect large numbers of growth-capital investors with large numbers of [p. 267] growth-ready mid-caps. But it might be able to convene a powerful coalition of complementary actors that could achieve a critical mass of support for performance-based philanthropy. The challenge would be to develop an organization focused on filling the gaps rather than encroaching on the turf of established firms whose participation and innovation would be required to build a platform for nurturing growth of social enterprise…”

pp. 268-269:
Intermediated nonprofit capital market shifts fundraising burden from grantees to intermediaries.

p. 271:
“The surging growth of national donor-advised funds, which simplify and reduce the transaction costs of methodical giving, exemplifies the kind of financial innovation that is poised to leverage market-based investment guidance.” [President of Schwab Charitable quoted as wanting to make charitable giving information- and results-driven.]

p. 272:
Rating agencies and organizations: Charity Navigator, Guidestar, Wise Giving Alliance.
Online donor rankings: GlobalGiving, GreatNonprofits, SocialMarkets
Evaluation consultants: Mathematica

Google’s mission statement: “to organize the world’s information and make it universally accessible and useful.”

p. 273:
Exhibit 9.4 Impact Index Whole Product
Image of stakeholders circling IMPEX:
Trading engine
Listed nonprofits
Data producers and aggregators
Trading community
Researchers and analysts
Investors and advisors
Government and business supporters

p. 275:
“That’s the starting point for replication [of social innovations that work]: finding and funding; matching money with performance.”

[WPF bottom line: Because Goldberg misses De Soto’s point about transparent representations resolving the mystery of capital, he is unable to see his way toward making the nonprofit capital markets function more like financial capital markets, with the difference being the focus on the growth of human, social, and natural capital. Though Goldberg intuits good points about the wisdom of crowds, he doesn’t know enough about the flaws of ordinal measurement relative to interval measurement, or about the relatively easy access to interval measures that can be had, to do the job.]

References

Cusumano, M. A., & Gawer, A. (2002, Spring). The elements of platform leadership. MIT Sloan Management Review, 43(3), 58.

De Soto, H. (2000). The mystery of capital: Why capitalism triumphs in the West and fails everywhere else. New York: Basic Books.

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2003). Measurement and communities of inquiry. Rasch Measurement Transactions, 17(3), 936-8 [http://www.rasch.org/rmt/rmt173.pdf].

Fisher, W. P., Jr. (2005). Daredevil barnstorming to the tipping point: New aspirations for the human sciences. Journal of Applied Measurement, 6(3), 173-9 [http://www.livingcapitalmetrics.com/images/FisherJAM05.pdf].

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2009a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In M. Wilson, K. Draney, N. Brown & B. Duckor (Eds.), Advances in Rasch Measurement, Vol. Two (in press) [http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf]. Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2009b, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Garner, M. (2009, Autumn). Google’s PageRank algorithm and the Rasch measurement model. Rasch Measurement Transactions, 23(2), 1201-2 [http://www.rasch.org/rmt/rmt232.pdf].

Grossman, A. (1999). Philanthropic social capital markets: Performance driven philanthropy (Social Enterprise Series 12 No. 00-002). Harvard Business School Working Paper.

Kotter, J. (1996). Leading change. Cambridge, Massachusetts: Harvard Business School Press.

Kurtzman, J. (2002). How the markets really work. New York: Crown Business.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-34.

With Reich in spirit, but with a different sense of the problem and its solution

October 4, 2015

In today’s editorial in the San Francisco Chronicle, Robert Reich seeks some way of defining a solution to the pressing problems of how globalization and technological changes have made American workers less competitive. He rightly says that “reversing the scourge of widening inequality requires reversing the upward distributions [of income] within the rules of the market, and giving average people the bargaining power they need to get a larger share of the gains from growth.”

But Reich then says that the answer to this problem lies in politics, not economics. As I’ve pointed out before in this blog, focusing on marshaling political will is part of the problem, not part of the solution. Historically, politicians do not lead, they follow. As is demonstrated across events as diverse as the Arab Spring and the Preemption Act of 1841, mass movements of people have repeatedly demanded ways of cutting through the Gordian knots of injustice. And just as the political “leadership” across the Middle East and in the early U.S. dragged its feet, obstructed, and violently opposed change until it was already well underway, so, too, will that pattern repeat itself again in the current situation of inequitable income distribution.

The crux of the problem is that no one can give average people anything, not freedom (contra Dylan’s line in Blowin’ in the Wind about “allowing” people to be free) and certainly not a larger share of the gains from growth. As the old saying goes, you can lead a horse to water, but you can’t make it drink. People have to take what’s theirs. They have to want it, they have to struggle for it, and they have to pay for it, or they cannot own it and it will never be worth anything to them.

It is well known that a lack of individual property rights doomed communism and socialism because when everything is owned collectively by everyone, no one takes responsibility for it. The profit motive has the capacity to drive people to change things. The problem is not in profit itself. If birds and bees and trees and grasses did not profit from the sun, soil, and rain, there would be no life. The problem is in finding how to get a functional, self-sustaining economic ecology off the ground, not in unrealistically trying to manipulate and micromanage every detail.

The fundamental relevant characteristic of the profits being made today from intellectual property rights is that our individual rights to our own human and social capital are counter-productively restricted and undeveloped. How can it be that no one has any idea how much literacy or health capital they have, or what it is worth?! We have a metric system that tells us how much real estate and manufactured capital we own, and we can price it. But despite the well-established scientific facts of decades of measurement science research and practice, none of us can say, “I own x number of shares of stock in intellectual, literacy, or community capital, that have a value of x dollars in today’s market.” We desperately need an Intangible Assets Metric System, and the market rules, roles, and responsibilities that will make it impossible to make a profit while destroying human, social, and natural capital.

In this vein, what Reich gets absolutely correct is hidden inside his phrase, “within the rules of the market.” As I’ve so often repeated in this blog, capitalism is not inherently evil; it is, rather, unfinished. The real evil is in prolonging the time it takes to complete it. As was so eloquently stated by Miller and O’Leary (2007, p. 710):

“Markets are not spontaneously generated by the exchange activity of buyers and sellers. Rather, skilled actors produce institutional arrangements, the rules, roles and relationships that make market exchange possible. The institutions define the market, rather than the reverse.”

We have failed to set up the institutional arrangements needed to define human, social, and natural capital markets. The problem is that we cannot properly manage three of the four major forms of capital (human, social, and natural, with the fourth being manufactured/property) because we do not measure them in a common language built into scientifically, economically, legally and financially accountable titles, deeds, and other instruments.

And so, to repeat another one of my ad nauseam broken-record nostrums, the problem is the problem. As long as we keep defining problems in the way we always have, as matters of marshaling political will, we will inadvertently find ourselves contributing more to prolonging tragic and needless human suffering, social discontent, and environmental degradation.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-734.