Posts Tagged ‘instrument calibration’

Psychology and the social sciences: An atheoretical, scattered, and disconnected body of research

February 16, 2019

A new article in Nature Human Behaviour (NHB) points toward the need for better theory and more rigorous mathematical models in psychology and the social sciences (Muthukrishna & Henrich, 2019). The authors rightly say that the lack of an overarching cumulative theoretical framework makes it very difficult to see whether new results fit well with previous work, or if something surprising has come to light. Mathematical models are especially emphasized as being of value in specifying clear and precise expectations.

The point that the social sciences and psychology need better theories and models is painfully obvious. But there are in fact thousands of published studies and practical real-world applications that not only provide, but often surpass, the kinds of predictive theories and mathematical models called for in the NHB article. The article not only makes no mention of any of this work; its argument is also framed entirely in a statistical context instead of the more appropriate context of measurement science.

The concept of reliability provides an excellent point of entry. Most behavioral scientists think of reliability statistically, as a coefficient taking a value between 0.00 and 1.00. The tangible sense of reliability, as an indication of exactly how predictable an outcome is, does not usually figure in researchers’ thinking. But that sense of the specific predictability of results has been the focus of attention in social and psychological measurement science for decades.

For instance, the measurement of time is reliable in the sense that the position of the sun relative to the earth can be precisely predicted from geographic location, the time of day, and the day of the year. The numbers and words assigned to noon are closely associated with the sun being at its high point in the sky (though there are political variations by season and location across time zones).

That kind of a reproducible association is rarely sought in psychology and the social sciences, but it is far from nonexistent. One can discern different degrees to which that kind of association is included in models of measured constructs. Though most behavioral research doesn’t mention the connection between linear amounts of a measured phenomenon and a reproducible numeric representation of it (level 0), quite a significant body of work focuses on that connection (level 1). The disappointing thing about that level 1 work is that the relentless obsession with statistical methods prevents most researchers from connecting a reproducible quantity with a single expression of it in a standard unit, and with an associated uncertainty term (level 2). That is, level 1 researchers conceive measurement in statistical terms, as a product of data analysis. Even when results across data sets are highly correlated and could be equated to a common metric, level 1 researchers do not leverage that source of potential value for simplified communication and accumulated comparability.

And then, for their part, level 2 researchers usually do not articulate theories about the measured constructs by augmenting the mathematical data model with an explanatory model predicting variation (level 3). Level 2 researchers are empirically grounded in data, and can expand their network of measures only by gathering more data and analyzing it in ways that bring it into their standard unit’s frame of reference.

Level 3 researchers, however, have come to see what makes their measures tick. They understand the mechanisms that make their questions vary. They can write new questions to their theoretical specifications, test those questions by asking them of a relevant sample, and produce the predicted calibrations. For instance, reading comprehension is well established to be a function of the difference between a person’s reading ability and the complexity of the text they encounter (see articles by Stenner in the list below). We have built our entire educational system around this idea, as we deliberately introduce children first to the alphabet, then to the most common words, then to short sentences, and then to ever longer and more complicated text. But stating the construct model, testing it against data, calibrating a unit to which all tests and measures can be traced, and connecting together all the books, articles, tests, curricula, and students is a process that began (in English and Spanish) only in the 1980s. The process is still far from finished, and most reading research still does not use the common metric.
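To make the form of such a predictive model concrete, here is a minimal sketch of the dichotomous Rasch relation, in which the expected comprehension rate is a logistic function of the difference between a reader’s measure and a text’s complexity. The numbers are hypothetical, and operational reading frameworks rescale and elaborate this relation, but the predictive logic is the same.

```python
import math

def comprehension_rate(reader_measure, text_complexity):
    """Expected success rate as a logistic function of the difference
    between reader ability and text complexity, both in logits."""
    return 1.0 / (1.0 + math.exp(-(reader_measure - text_complexity)))

# A hypothetical reader measuring 1.1 logits meets texts of varying
# complexity: matched text yields a 50% rate, easier text more,
# harder text less.
for d in (-0.5, 1.1, 2.0):
    print(f"text at {d:+.1f} logits: expected rate {comprehension_rate(1.1, d):.0%}")
```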

In this kind of theory-informed context, new items can be automatically generated on the fly at the point of measurement. Those items and inferences made from them are validated by the consistency of the responses and the associated expression of the expected probability of success, agreement, etc. The expense of constant data gathering and analysis can be cut to a very small fraction of what it is at levels 0-2.

Level 3 research methods are not widely known or used, but they are not new. They are gaining traction as their use by national metrology institutes around the world grows. As high-profile critiques of social and psychological research practices continue to emerge, perhaps more attention will be paid to this important body of work. A few key references are provided below, and virtually every post in this blog pertains to these issues.

References

Baghaei, P. (2008). The Rasch model as a construct validation tool. Rasch Measurement Transactions, 22(1), 1145-1146 [http://www.rasch.org/rmt/rmt221a.htm].

Bergstrom, B. A., & Lunz, M. E. (1994). The equivalence of Rasch item calibrations and ability estimates across modes of administration. In M. Wilson (Ed.), Objective measurement: Theory into practice, Vol. 2 (pp. 122-128). Norwood, New Jersey: Ablex.

Cano, S., Pendrill, L., Barbic, S., & Fisher, W. P., Jr. (2018). Patient-centred outcome metrology for healthcare decision-making. Journal of Physics: Conference Series, 1044, 012057.

Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Measurement & Evaluation in Counseling & Development, 43(2), 121-149.

Embretson, S. E. (2010). Measuring psychological constructs: Advances in model-based approaches. Washington, DC: American Psychological Association.

Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.

Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48(1), 3-26.

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827-833.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-164.

Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.

Irvine, S. H., Dunn, P. L., & Anderson, J. D. (1990). Towards a theory of algorithm-determined cognitive test construction. British Journal of Psychology, 81, 173-195.

Kline, T. L., Schmidt, K. M., & Bowles, R. P. (2006). Using LinLog and FACETS to model item components in the LLTM. Journal of Applied Measurement, 7(1), 74-91.

Lunz, M. E., & Linacre, J. M. (2010). Reliability of performance examinations: Revisited. In M. Garner, G. Engelhard, Jr., W. P. Fisher, Jr. & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 1 (pp. 328-341). Maple Grove, MN: JAM Press.

Mari, L., & Wilson, M. (2014). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.

Markward, N. J., & Fisher, W. P., Jr. (2004). Calibrating the genome. Journal of Applied Measurement, 5(2), 129-141.

Maul, A., Mari, L., Torres Irribarra, D., & Wilson, M. (2018). The quality of measurement results in terms of the structural features of the measurement process. Measurement, 116, 611-620.

Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour, 1-9.

Obiekwe, J. C. (1999, August 1). Application and validation of the linear logistic test model for item difficulty prediction in the context of mathematics problems. Dissertation Abstracts International: Section B: The Sciences & Engineering, 60(2-B), 0851.

Pendrill, L. (2014). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55.

Pendrill, L., & Petersson, N. (2016). Metrology of human-based and other qualitative measurements. Measurement Science and Technology, 27(9), 094003.

Sijtsma, K. (2009). Correcting fallacies in validity, reliability, and classification. International Journal of Testing, 8(3), 167-194.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107-120.

Stenner, A. J. (2001). The necessity of construct theory. Rasch Measurement Transactions, 15(1), 804-805 [http://www.rasch.org/rmt/rmt151q.htm].

Stenner, A. J., Fisher, W. P., Jr., Stone, M. H., & Burdick, D. S. (2013). Causal Rasch models. Frontiers in Psychology: Quantitative Psychology and Measurement, 4(536), 1-14.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].

Stenner, A. J., Stone, M. H., & Fisher, W. P., Jr. (2018). The unreasonable effectiveness of theory based instrument calibration in the natural sciences: What can the social sciences learn? Journal of Physics: Conference Series, 1044, 012070.

Stone, M. H. (2003). Substantive scale construction. Journal of Applied Measurement, 4(3), 282-297.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilson, M. R. (2013). Using the concept of a measurement system to characterize measurement models used in psychometrics. Measurement, 46, 3766-3774.

Wright, B. D., & Stone, M. H. (1979). Chapter 5: Constructing a variable. In Best test design: Rasch measurement (pp. 83-128). Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/measess/me-all.pdf].

Wright, B. D., Stone, M., & Enos, M. (2000). The evolution of meaning in practice. Rasch Measurement Transactions, 14(1), 736 [http://www.rasch.org/rmt/rmt141g.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Evaluating Questionnaires as Measuring Instruments

June 23, 2018

An email came in today asking whether three different short (4- and 5-item) questionnaires could be expected to provide reasonable quality measurement. Here’s my response.

—–

Thanks for raising this question. The questionnaire plainly was not designed to provide data suitable for measurement. Though much can be learned about making constructs measurable from data produced by this kind of questionnaire, “Rasch analysis” cannot magically create a silk purse from a sow’s ear (as the old expression goes). Use Linacre’s (1993) generalizability theory nomograph to see what reliabilities are expected for each subscale, given the numbers of items and rating categories and a conservative estimate of the adjusted standard deviation (1.0 logit, for instance). Convert the reliability coefficients into strata (Fisher, 1992, 2008; Wright & Masters, 1982, pp. 92, 105-106) to make the practical meaning of the precision obtained obvious, as in the sketch below.
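For a concrete sense of that conversion, here is a minimal sketch using the separation and strata formulas given in Wright and Masters (1982) and Fisher (1992); the reliabilities fed to it are illustrative.

```python
import math

def separation(reliability):
    """Separation index G: the ratio of the 'true' standard deviation
    to the root-mean-square measurement error."""
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability):
    """Number of statistically distinct levels of measures: (4G + 1) / 3."""
    return (4.0 * separation(reliability) + 1.0) / 3.0

for r in (0.50, 0.70, 0.80, 0.90, 0.95):
    print(f"reliability {r:.2f}: separation {separation(r):.2f}, strata {strata(r):.2f}")
```

A reliability of 0.50 distinguishes only one or two strata, while 0.95 distinguishes about six; that is the practical meaning the coefficient by itself conceals.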

So if you have data, analyze it and compare the expected and observed reliabilities. If the uncertainties are quite different, is that because of targeting issues? But before you do that, ask experts in the area to rank order:

  • the courses by relevance to the job;
  • the evaluation criteria from easy to hard; and
  • the skills/competencies in order of importance to job performance.

Then study the correspondence between the rankings and the calibration results. Where do they converge and diverge? Why? What’s unexpected? What can be learned?

Analyze all of the items in each area (student, employer, instructor) together in Winsteps and study each of the three tables 23.x, setting PRCOMP=S. Remember that the total variance explained is not interpreted simply in terms of “more is better,” and that it matters less than the ratio of that variance to the variance in the first contrast (see Linacre, 2006, 2008). If the ratio is greater than 3, the scale is essentially unidimensional (though significant problems may remain to be diagnosed and corrected). The sketch following this paragraph illustrates the logic of that comparison.
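Here is a rough sketch of what those tables compute, assuming person and item estimates from a prior calibration are already in hand; Winsteps’ exact variance accounting differs in its details, so treat this as an illustration of the logic rather than a reproduction of the program.

```python
import numpy as np

def residual_pca_summary(responses, person_measures, item_calibrations):
    """PCA of standardized Rasch residuals (the PRCOMP=S analysis).

    responses: persons x items array of 0/1 scores.
    Returns the variance explained by the measures (in eigenvalue
    units), the first contrast eigenvalue, and their ratio."""
    b = np.asarray(person_measures, float)[:, None]
    d = np.asarray(item_calibrations, float)[None, :]
    p = 1.0 / (1.0 + np.exp(d - b))                # model expectations
    z = (responses - p) / np.sqrt(p * (1.0 - p))   # standardized residuals
    # Eigenvalues of the item-by-item residual correlation matrix.
    eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(z.T)))[::-1]
    n_items = responses.shape[1]
    # Variance explained by the measures, expressed in eigenvalue
    # units: variance of the expectations relative to the mean model
    # (error) variance.
    explained = n_items * np.var(p) / np.mean(p * (1.0 - p))
    return explained, eigenvalues[0], explained / eigenvalues[0]
```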

Common practice holds that unexplained variance eigenvalues should be less than 1.5, but this overly simplistic rule of thumb (Chou & Wang, 2010; Raîche, 2005) has been contradicted in practice many times. Even if one or more eigenvalues exceed 1.5, theory may say the items belong to the same construct, and the disattenuated correlations of the measures implied by the separate groups of items (provided in tables 23.x) may still approach 1.00, indicating that the same measures are produced across subscales. See Green (1996) and Smith (1996), among others, for more on this.
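The disattenuation itself is a one-line computation; the values below are hypothetical.

```python
import math

def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Observed correlation corrected for the measurement error in
    both sets of measures: r / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Measures from two item clusters correlate 0.71, each estimated with
# reliability 0.75: corrected for error, the correlation is about
# 0.95, consistent with a single construct.
print(round(disattenuated_correlation(0.71, 0.75, 0.75), 2))
```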

If subscales within each of the three groups of items are markedly different in the measures they produce, then separate them in different analyses. If these further analyses reveal still more multidimensionalities, it’s time to go back to the drawing board, given how short these scales are. If you define a plausible scale, study the item difficulty orders closely with one or more experts in the area. If there is serious interest in precision measurement and its application to improved management, and not just a bureaucratic need for data to satisfy empty demands for a mere appearance of quality assessment, then trace the evolution of the construct as it changes from less to more across the items.

What, for instance, is the common theme addressed across the courses that makes them all relevant to job performance? The courses were each created with an intention and they were brought together into a curriculum for a purpose. These intentions and purposes are the raw material of a construct theory. Spell out the details of how the courses build competency in translation.

Furthermore, I imagine that this curriculum, by definition, was set up to be effective in training students no matter who is in the courses (within the constraints of the admission criteria), and no matter which particular challenges relevant to job performance are sampled from the universe of all possible challenges. You will recognize these unexamined and unarticulated assumptions as exactly what needs to be explicitly stated as hypotheses informing a model of the educational enterprise. This model transforms implicit assumptions into requirements that are never fully satisfied but can be very usefully approximated.

As I’ve been saying for a long time (Fisher, 1989), please do not accept the shorthand language of references to “the Rasch model”, “Rasch scaling”, “Rasch analysis”, etc. Rasch did not invent the form of these models, which are at least as old as Plato. And measurement is not a function of data analysis. Data provide experimental evidence testing model-based hypotheses concerning construct theories. When explanatory theory corroborates and validates data in calibrated instrumentation, the instrument can be applied at the point of use with no need for data analysis, to produce measures, uncertainty (error) estimates, and graphical fit assessments (Connolly, Nachtman, & Pritchett, 1971; Davis, et al., 2008; Fisher, 2006; Fisher, Harvey, & Kilgore, 1995; Linacre, 1997; many others).
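To illustrate the point that no data analysis is needed at the point of use: once item calibrations are fixed, converting a raw score into a measure and an uncertainty estimate uses nothing but those calibrations. A minimal sketch for the dichotomous case follows; the item values in the usage line are hypothetical.

```python
import math

def score_to_measure(raw_score, item_calibrations, tol=1e-6):
    """Convert a raw score on a calibrated dichotomous instrument to a
    measure and standard error by Newton-Raphson iteration, using only
    the fixed item calibrations (no sample data required)."""
    n = len(item_calibrations)
    if not 0 < raw_score < n:
        raise ValueError("extreme scores need special handling")
    b = math.log(raw_score / (n - raw_score))  # starting estimate
    while True:
        p = [1.0 / (1.0 + math.exp(d - b)) for d in item_calibrations]
        info = sum(q * (1.0 - q) for q in p)   # test information at b
        step = (raw_score - sum(p)) / info
        b += step
        if abs(step) < tol:
            return b, 1.0 / math.sqrt(info)

measure, se = score_to_measure(7, [-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
print(f"measure {measure:+.2f} logits, SE {se:.2f}")
```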

So instead of using those common shorthand phrases, please speak directly to the problem of modeling the situation in order to produce a practical tool for managing it.

Further information is available in the references below.

References

Aryadoust, S. V. (2009). Mapping Rasch-based measurement onto the argument-based validity framework. Rasch Measurement Transactions, 23(1), 1192-1193 [http://www.rasch.org/rmt/rmt231.pdf].

Chang, C.-H. (1996). Finding two dimensions in MMPI-2 depression. Structural Equation Modeling, 3(1), 41-49.

Chou, Y. T., & Wang, W. C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70, 717-731.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service. Retrieved 23 June 2018 from https://images.pearsonclinical.com/images/pa/products/keymath3_da/km3-da-pub-summary.pdf

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G. et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis Cartilage, 16(5), 551-559.

Fisher, W. P., Jr. (1989). What we have to offer. Rasch Measurement Transactions, 3(3), 72 [http://www.rasch.org/rmt/rmt33d.htm].

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2006). Survey design recommendations [expanded from Fisher, W. P. Jr. (2000) Popular Measurement, 3(1), pp. 58-59]. Rasch Measurement Transactions, 20(3), 1072-1074 [http://www.rasch.org/rmt/rmt203.pdf].

Fisher, W. P., Jr. (2008). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-1163 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5(1), 3-25.

Green, K. E. (1996). Dimensional analyses of complex data. Structural Equation Modeling, 3(1), 50-61.

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284; [http://www.rasch.org/rmt/rmt71h.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-283.

Linacre, J. M. (1998). Structure in Rasch residuals: Why principal components analysis? Rasch Measurement Transactions, 12(2), 636 [http://www.rasch.org/rmt/rmt122m.htm].

Linacre, J. M. (2003). PCA: Data variance: Explained, modeled and empirical. Rasch Measurement Transactions, 17(3), 942-943 [http://www.rasch.org/rmt/rmt173g.htm].

Linacre, J. M. (2006). Data variance explained by Rasch measures. Rasch Measurement Transactions, 20(1), 1045 [http://www.rasch.org/rmt/rmt201a.htm].

Linacre, J. M. (2008). PCA: Variance in data explained by Rasch measures. Rasch Measurement Transactions, 22(1), 1164 [http://www.rasch.org/rmt/rmt221j.htm].

Raîche, G. (2005). Critical eigenvalue sizes in standardized residual Principal Components Analysis. Rasch Measurement Transactions, 19(1), 1012 [http://www.rasch.org/rmt/rmt191h.htm].

Schumacker, R. E., & Linacre, J. M. (1996). Factor analysis and Rasch. Rasch Measurement Transactions, 9(4), 470 [http://www.rasch.org/rmt/rmt94k.htm].

Smith, E. V., Jr. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205-231.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Wright, B. D. (1996). Comparing Rasch measurement and factor analysis. Structural Equation Modeling, 3(1), 3-24.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

Excerpts and Notes from Goldberg’s “Billions of Drops…”

December 23, 2015

Goldberg, S. H. (2009). Billions of drops in millions of buckets: Why philanthropy doesn’t advance social progress. New York: Wiley.

p. 8:
Transaction costs: “…nonprofit financial markets are highly disorganized, with considerable duplication of effort, resource diversion, and processes that ‘take a fair amount of time to review grant applications and to make funding decisions’ [citing Harvard Business School Case No. 9-391-096, p. 7, Note on Starting a Nonprofit Venture, 11 Sept 1992]. It would be a major understatement to describe the resulting capital market as inefficient.”

A McKinsey study found that nonprofits spend 2.5 to 12 times more raising capital than for-profits do. When administrative costs are factored in, nonprofits spend 5.5 to 21.5 times more.

For-profit and nonprofit funding efforts contrasted on pages 8 and 9.

p. 10:
Balanced scorecard rating criteria

p. 11:
“Even at double-digit annual growth rates, it will take many years for social entrepreneurs and their funders to address even 10% of the populations in need.”

p. 12:
Exhibit 1.5 shows that the percentages of various needs served by leading social enterprises are barely drops in the respective buckets; they range from 0.07% to 3.30%.

pp. 14-16:
Nonprofit funding is not tied to performance. Even when a nonprofit makes the effort to show measured improvement in impact, doing so does little or nothing to change its funding picture. It appears that there is some kind of funding ceiling implicitly imposed by funders, since nonprofit growth and success seem to persuade capital sources that their work there is done. Mediocre and low-performing nonprofits seem to be able to continue drawing funds indefinitely from sympathetic donors who don’t require evidence of effective use of their money.

p. 34:
“…meaningful reductions in poverty, illiteracy, violence, and hopelessness will require a fundamental restructuring of nonprofit capital markets. Such a restructuring would need to make it much easier for philanthropists of all stripes–large and small, public and private, institutional and individual–to fund nonprofit organizations that maximize social impact.”

p. 54:
Exhibit 2.3 is a chart showing that fewer people rose from poverty, and more remained in it or fell deeper into it, in 1988-1998 than in 1969-1979.

pp. 70-71:
Kotter’s (1996) change cycle.

p. 75:
McKinsey’s seven elements of nonprofit capacity and capacity assessment grid.

pp. 94-95:
Exhibits 3.1 and 3.2 contrast the way financial markets reward for-profit performance with the way nonprofit markets reward fund raising efforts.

Financial markets
1. Market aggregates and disseminates standardized data
2. Analysts publish rigorous research reports
3. Investors proactively search for strong performers
4. Investors penalize weak performers
5. Market promotes performance
6. Strong performers grow

Nonprofit markets
1. Social performance is difficult to measure
2. NPOs don’t have resources or expertise to report results
3. Investors can’t get reliable or standardized results data
4. Strong and weak NPOs spend 40 to 60% of time fundraising
5. Market promotes fundraising
6. Investors can’t fund performance; NPOs can’t scale

p. 95:
“…nonprofits can’t possibly raise enough money to achieve transformative social impact within the constraints of the existing fundraising system. I submit that significant social progress cannot be achieved without what I’m going to call ‘third-stage funding,’ that is, funding that doesn’t suffer from disabling fragmentation. The existing nonprofit capital market is not capable of [p. 97] providing third-stage funding. Such funding can arise only when investors are sufficiently well informed to make big bets at understandable and manageable levels of risk. Existing nonprofit capital markets neither provide investors with the kinds of information needed–actionable information about nonprofit performance–nor provide the kinds of intermediation–active oversight by knowledgeable professionals–needed to mitigate risk. Absent third-stage funding, nonprofit capital will remain irreducibly fragmented, preventing the marshaling of resources that nonprofit organizations need to make meaningful and enduring progress against $100 million problems.”

pp. 99-114:
Text and diagrams on innovation, market adoption, transformative impact.

p. 140:
Exhibit 4.2: Capital distribution of nonprofits, highlighting mid-caps

pp. 192-193 make the case for the difference between a regular market and the current state of philanthropic, social capital markets.

p. 192:
“So financial markets provide information investors can use to compare alternative investment opportunities based on their performance, and they provide a dynamic mechanism for moving money away from weak performers and toward strong performers. Just as water seeks its own level, markets continuously recalibrate prices until they achieve a roughly optimal equilibrium at which most companies receive the ‘right’ amount of investment. In this way, good companies thrive and bad ones improve or die.
“The social sector should work the same way. ... But philanthropic capital doesn’t flow toward effective nonprofits and away from ineffective nonprofits for a simple reason: contributors can’t tell the difference between the two. That is, philanthropists just don’t [p. 193] know what various nonprofits actually accomplish. Instead, they only know what nonprofits are trying to accomplish, and they only know that based on what the nonprofits themselves tell them.”

p. 193:
“The signs that the lack of social progress is linked to capital market dysfunctions are unmistakable: fundraising remains the number-one [p. 194] challenge of the sector despite the fact that nonprofit leaders divert some 40 to 60% of their time from productive work to chasing after money; donations raised are almost always too small, too short, and too restricted to enhance productive capacity; most mid-caps are ensnared in the ‘social entrepreneur’s trap’ of focusing on today and neglecting tomorrow; and so on. So any meaningful progress we could make in the direction of helping the nonprofit capital market allocate funds as effectively as the private capital market does could translate into tremendous advances in extending social and economic opportunity.
“Indeed, enhancing nonprofit capital allocation is likely to improve people’s lives much more than, say, further increasing the total amount of donations. Why? Because capital allocation has a multiplier effect.”

“If we want to materially improve the performance and increase the impact of the nonprofit sector, we need to understand what’s preventing [p. 195] it from doing a better job of allocating philanthropic capital. And figuring out why nonprofit capital markets don’t work very well requires us to understand why the financial markets do such a better job.”

p. 197:
“When all is said and done, securities prices are nothing more than convenient approximations that market participants accept as a way of simplifying their economic interactions, with a full understanding that market prices are useful even when they are way off the mark, as they so often are. In fact, that’s the whole point of markets: to aggregate the imperfect and incomplete knowledge held by vast numbers of traders about how much various securities are worth and still make allocation choices that are better than we could without markets.
“Philanthropists face precisely the same problem: how to make better use of limited information to maximize output, in this case, social impact. Considering the dearth of useful tools available to donors today, the solution doesn’t have to be perfect or even all that good, at least at first. It just needs to improve the status quo and get better over time.
“Much of the solution, I believe, lies in finding useful adaptations of market mechanisms that will mitigate the effects of the same lack of reliable and comprehensive information about social sector performance. I would even go so far as to say that social enterprises can’t hope to realize their ‘one day, all children’ visions without a funding allocation system that acts more like a market.
“We can, and indeed do, make incremental improvements in nonprofit funding without market mechanisms. But without markets, I don’t see how we can fix the fragmentation problem or produce transformative social impact, such as ensuring that every child in America has a good education. The problems we face are too big and have too many moving parts to ignore the self-organizing dynamics of market economics. As Thomas Friedman said about the need to impose a carbon tax at a time of falling oil prices, ‘I’ve wracked my brain trying to think of ways to retool America around clean-power technologies without a price signal–i.e., a tax–and there are no effective ones.’”

p. 199:
“Prices enable financial markets to work the way nonprofit capital markets should–by sending informative signals about the most effective organizations so that money will flow to them naturally...”

p. 200:
[Quotes Kurtzman citing De Soto on the mystery of capital. Also see p. 209, below.]
“‘Solve the mystery of capital and you solve many seemingly intractable problems along with it.'”
[That’s from page 69 in Kurtzman, 2002.]

p. 201:
[Goldberg says he’s quoting Daniel Yankelovich here, but the footnote does not appear to have anything to do with this quote:]
“‘The first step is to measure what can easily be measured. The second is to disregard what can’t be measured, or give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can’t be measured easily isn’t very important. This is blindness. The fourth step is to say that what can’t be easily measured really doesn’t exist. This is suicide.'”

Goldberg gives an example here of $10,000 invested with a 10% increase in value, compared with $10,000 put into a nonprofit. “But if the nonprofit makes good use of the money and, let’s say, brings the reading scores of 10 elementary school students up from below grade level to grade level, we can’t say how much my initial investment is ‘worth’ now. I could make the argument that the value has increased because the students have received a demonstrated educational benefit that is valuable to them. Since that’s the reason I made the donation, the achievement of higher scores must have value to me, as well.”

p. 202:
Goldberg wonders whether donations to nonprofits would be better conceived as purchases than investments.

p. 207:
Goldberg quotes Jon Gertner from the March 9, 2008, issue of the New York Times Magazine devoted to philanthropy:

“‘Why shouldn’t the world’s smartest capitalists be able to figure out more effective ways to give out money now? And why shouldn’t they want to make sure their philanthropy has significant social impact? If they can measure impact, couldn’t they get past the resistance that [Warren] Buffet highlighted and finally separate what works from what doesn’t?'”

p. 208:
“Once we abandon the false notions that financial markets are precision instruments for measuring unambiguous phenomena, and that the business and nonprofit sectors are based in mutually exclusive principles of value, we can deconstruct the true nature of the problems we need to address and adapt market-like mechanisms that are suited to the particulars of the social sector.
“All of this is a long way (okay, a very long way) of saying that even ordinal rankings of nonprofit investments can have tremendous value in choosing among competing donation opportunities, especially when the choices are so numerous and varied. If I’m a social investor, I’d really like to know which nonprofits are likely to produce ‘more’ impact and which ones are likely to produce ‘less.'”

“It isn’t necessary to replicate the complex workings of the modern stock markets to fashion an intelligent and useful nonprofit capital allocation mechanism. All we’re looking for is some kind of functional indication that would (1) isolate promising nonprofit investments from among the confusing swarm of too many seemingly worthy social-purpose organizations and (2) roughly differentiate among them based on the likelihood of ‘more’ or ‘less’ impact. This is what I meant earlier by increasing [p. 209] signals and decreasing noise.”

p. 209:
Goldberg apparently didn’t read De Soto, as he says that the mystery of capital is posed by Kurtzman and says it is solved via the collective intelligence and wisdom of crowds. This completely misses the point of the crucial value that transparent representations of structural invariance hold in market functionality. Goldberg is apparently offering a loose kind of market in which an aggregate index of nonprofit “stocks” is built up from various ordinal performance measures. I think I find a better way in my work, building more closely from De Soto (Fisher, 2002, 2003, 2005, 2007, 2009a, 2009b).

p. 231:
Goldberg quotes Harvard’s Allen Grossman (1999) on the cost-benefit boundaries of more effective nonprofit capital allocation:

“‘Is there a significant downside risk in restructuring some portion of the philanthropic capital markets to test the effectiveness of performance driven philanthropy? The short answer is, ‘No.’ The current reality is that most broad-based solutions to social problems have eluded the conventional and fragmented approaches to philanthropy. It is hard to imagine that experiments to change the system to a more performance driven and rational market would negatively impact the effectiveness of the current funding flows–and could have dramatic upside potential.'”

p. 232:
Quotes Douglas Hubbard’s How to Measure Anything book that Stenner endorsed, and Linacre and I didn’t.

p. 233:
Cites Stevens on the four levels of measurement and uses it to justify his position concerning ordinal rankings, recognizing that “we can’t add or subtract ordinals.”

pp. 233-5:
Justifies ordinal measures via example of Google’s PageRank algorithm. [I could connect from here using Mary Garner’s (2009) comparison of PageRank with Rasch.]

p. 236:
Goldberg tries to justify the use of ordinal measures by citing their widespread use in social science and health care. He conveniently ignores the fact that virtually all of the same problems and criticisms that apply to philanthropic capital markets also apply in these areas. In not grasping the fundamental value of De Soto’s concept of transferable and transparent representations, and in knowing nothing of Rasch measurement, he was unable to properly evaluate the potential of ordinal data’s role in the formation of philanthropic capital markets. Ordinal measures aren’t just not good enough; they represent a dangerous diversion of resources that will be put into systems that take on lives of their own, creating a new layer of dysfunctional relationships that will be hard to overcome.

p. 261 [Goldberg shows here his complete ignorance about measurement. He is apparently totally unaware of the work that is in fact most relevant to his cause, going back to Thurstone in 1920s, Rasch in the 1950s-1970s, and Wright in the 1960s to 2000. Both of the problems he identifies have long since been solved in theory and in practice in a wide range of domains in education, psychology, health care, etc.]:
“Having first studied performance evaluation some 30 years ago, I feel confident in saying that all the foundational work has been done. There won’t be a ‘eureka!’ breakthrough where someone finally figures out the one true way to gauge nonprofit effectiveness.
“Indeed, I would venture to say that we know virtually everything there is to know about measuring the performance of nonprofit organizations with only two exceptions: (1) How can we compare nonprofits with different missions or approaches, and (2) how can we make actionable performance assessments common practice for growth-ready mid-caps and readily available to all prospective donors?”

p. 263:
“Why would a social entrepreneur divert limited resources to impact assessment if there were no prospects it would increase funding? How could an investor who wanted to maximize the impact of her giving possibly put more golden eggs in fewer impact-producing baskets if she had no way to distinguish one basket from another? The result: there’s no performance data to attract growth capital, and there’s no growth capital to induce performance measurement. Until we fix that Catch-22, performance evaluation will not become an integral part of social enterprise.”

pp. 264-5:
Long quotation from Ken Berger at Charity Navigator on their ongoing efforts at developing an outcome measurement system. [wpf, 8 Nov 2009: I read the passage quoted by Goldberg in Berger’s blog when it came out and have been watching and waiting ever since for the new system. wpf, 8 Feb 2012: The new system has been online for some time but still does not include anything on impacts or outcomes. It has expanded from a sole focus on financials to also include accountability and transparency. But it does not yet address Goldberg’s concerns as there still is no way to tell what works from what doesn’t.]

p. 265:
“The failure of the social sector to coordinate independent assets and create a whole that exceeds the sum of its parts results from an absence of ... ‘platform leadership’: ‘the ability of a company to drive innovation around a particular platform technology at the broad industry level.’ The object is to multiply value by working together: ‘the more people who use the platform products, the more incentives there are for complement producers to introduce more complementary products, causing a virtuous cycle.’” [Quotes here from Cusumano & Gawer (2002). The concept of platform leadership speaks directly to the system of issues raised by Miller & O’Leary (2007) that must be addressed to form effective HSN capital markets.]

p. 266:
“…the nonprofit sector has a great deal of both money and innovation, but too little available information about too many organizations. The result is capital fragmentation that squelches growth. None of the stakeholders has enough horsepower on its own to impose order on this chaos, but some kind of realignment could release all of that pent-up potential energy. While command-and-control authority is neither feasible nor desirable, the conditions are ripe for platform leadership.”

“It is doubtful that the IMPEX could amass all of the resources internally needed to build and grow a virtual nonprofit stock market that could connect large numbers of growth-capital investors with large numbers of [p. 267] growth-ready mid-caps. But it might be able to convene a powerful coalition of complementary actors that could achieve a critical mass of support for performance-based philanthropy. The challenge would be to develop an organization focused on filling the gaps rather than encroaching on the turf of established firms whose participation and innovation would be required to build a platform for nurturing growth of social enterprise...”

pp. 268-269:
Intermediated nonprofit capital market shifts fundraising burden from grantees to intermediaries.

p. 271:
“The surging growth of national donor-advised funds, which simplify and reduce the transaction costs of methodical giving, exemplifies the kind of financial innovation that is poised to leverage market-based investment guidance.” [President of Schwab Charitable quoted as wanting to make charitable giving information- and results-driven.]

p. 272:
Rating agencies and organizations: Charity Navigator, Guidestar, Wise Giving Alliance.
Online donor rankings: GlobalGiving, GreatNonprofits, SocialMarkets
Evaluation consultants: Mathematica

Google’s mission statement: “to organize the world’s information and make it universally accessible and useful.”

p. 273:
Exhibit 9.4 Impact Index Whole Product
Image of stakeholders circling IMPEX:
Trading engine
Listed nonprofits
Data producers and aggregators
Trading community
Researchers and analysts
Investors and advisors
Government and business supporters

p. 275:
“That’s the starting point for replication [of social innovations that work]: finding and funding; matching money with performance.”

[WPF bottom line: Because Goldberg misses De Soto’s point about transparent representations resolving the mystery of capital, he is unable to see his way toward making the nonprofit capital markets function more like financial capital markets, with the difference being the focus on the growth of human, social, and natural capital. Though Goldberg intuits good points about the wisdom of crowds, he doesn’t know enough about the flaws of ordinal measurement relative to interval measurement, or about the relatively easy access to interval measures that can be had, to do the job.]

References

Cusumano, M. A., & Gawer, A. (2002, Spring). The elements of platform leadership. MIT Sloan Management Review, 43(3), 58.

De Soto, H. (2000). The mystery of capital: Why capitalism triumphs in the West and fails everywhere else. New York: Basic Books.

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2003). Measurement and communities of inquiry. Rasch Measurement Transactions, 17(3), 936-8 [http://www.rasch.org/rmt/rmt173.pdf].

Fisher, W. P., Jr. (2005). Daredevil barnstorming to the tipping point: New aspirations for the human sciences. Journal of Applied Measurement, 6(3), 173-179 [http://www.livingcapitalmetrics.com/images/FisherJAM05.pdf].

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-1093 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2009a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In M. Wilson, K. Draney, N. Brown & B. Duckor (Eds.), Advances in Rasch Measurement, Vol. Two (in press) [http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf]. Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2009b, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Garner, M. (2009, Autumn). Google’s PageRank algorithm and the Rasch measurement model. Rasch Measurement Transactions, 23(2), 1201-1202 [http://www.rasch.org/rmt/rmt232.pdf].

Grossman, A. (1999). Philanthropic social capital markets: Performance driven philanthropy (Social Enterprise Series 12 No. 00-002). Harvard Business School Working Paper.

Kotter, J. (1996). Leading change. Cambridge, Massachusetts: Harvard Business School Press.

Kurtzman, J. (2002). How the markets really work. New York: Crown Business.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-34.

The Counterproductive Consequences of Common Study Designs and Statistical Methods

May 21, 2015

Because of the ways studies are designed and the ways data are analyzed, research results in psychology and the social sciences often appear to be nonlinear, sample- and instrument-dependent, and incommensurable, even when they need not be. In contrast with what are common assumptions about the nature of the constructs involved, invariant relations may be more obscured than clarified by typically employed research designs and statistical methods.

To take a particularly salient example, the number of small factors with eigenvalues greater than 1.0 identified via factor analysis increases as the number of modes in a multi-modal distribution increases, and the interpretation of results is further complicated by the fact that the number of factors identified decreases as sample size increases (Smith, 1996).

Similarly, variation in employment test validity across settings was established as a basic assumption by the 1970s, after 50 years of studies observing the situational specificity of results. But then Schmidt and Hunter (1977) identified sampling error, measurement error, and range restriction as major sources of what was only the appearance of incommensurable variation in employment test validity. In other words, for most of the 20th century, the identification of constructs and comparisons of results across studies were pointlessly confused by mixed populations, uncontrolled variation in reliability, and unnoted floor and/or ceiling effects. Though they do nothing to establish information systems deploying common languages structured by standard units of measurement (Feinstein, 1995), meta-analysis techniques are a step forward in equating effect sizes (Hunter & Schmidt, 2004).
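The forms of the corrections at issue are simple, even though operational meta-analytic practice specifies their order and details more carefully (Hunter & Schmidt, 2004). A sketch with hypothetical numbers:

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Classical correction of a correlation for measurement error in
    both variables."""
    return r_xy / math.sqrt(rel_x * rel_y)

def correct_for_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction,
    where u = SD(unrestricted) / SD(restricted) on the predictor."""
    return r * u / math.sqrt(1.0 + r**2 * (u**2 - 1.0))

# A validity of 0.25 observed in a range-restricted sample (u = 1.5),
# with predictor and criterion reliabilities of 0.80 and 0.70, implies
# an operational validity near 0.48.
r = correct_for_range_restriction(0.25, 1.5)
print(round(correct_for_attenuation(r, 0.80, 0.70), 2))
```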

Wright and Stone’s (1979) Best Test Design, in contrast, takes up each of these problems in an explicit way. Sampling error is addressed in that both the sample’s and the items’ representations of the same populations of persons and expressions of a construct are evaluated. The evaluation of reliability is foregrounded and clarified by taking advantage of the availability of individualized measurement uncertainty (error) estimates (following Andrich, 1982, presented at AERA in 1977). And range restriction becomes manageable in terms of equating and linking instruments measuring in different ranges of the same construct. As was demonstrated by Duncan (1985) and others (Allerup et al., 1994; Andrich & Styles, 1998), for instance, the restricted ranges of various studies assessing relationships between measures of attitudes and behaviors led to the mistaken conclusion that these were separate constructs. When the entire range of variation was explicitly modeled and studied, a consistent relationship was found.
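Those individualized uncertainty estimates are straightforward to compute in the dichotomous case: a person’s standard error is the inverse square root of the test information at that person’s measure. A minimal sketch, with hypothetical item calibrations:

```python
import math

def rasch_person_se(person_measure, item_calibrations):
    """Standard error of a dichotomous Rasch person measure: the
    inverse square root of the test information at that measure."""
    info = 0.0
    for d in item_calibrations:
        p = 1.0 / (1.0 + math.exp(d - person_measure))
        info += p * (1.0 - p)  # each item's information contribution
    return 1.0 / math.sqrt(info)

# A 20-item test spanning -2 to +2 logits: uncertainty is smallest
# where the items are targeted and grows toward the extremes.
items = [-2.0 + 4.0 * i / 19 for i in range(20)]
for b in (-3.0, 0.0, 3.0):
    print(f"measure {b:+.1f}: SE = {rasch_person_se(b, items):.2f}")
```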

Statistical and correlational methods have long histories of preventing the discovery, assessment, and practical application of invariant relations because they fail to test for invariant units of measurement, do not define standard metrics, never calibrate all instruments measuring the same thing in common units, and have no concept of formal measurement systems of interconnected instruments. Wider appreciation of the distinction between statistics and measurement (Duncan & Stenbeck, 1988; Fisher, 2010; Wilson, 2013a), and of the potential for metrological traceability we have within our reach (Fisher, 2009, 2012; Fisher & Stenner, 2013; Mari & Wilson, 2013; Pendrill, 2014; Pendrill & Fisher, 2015; Wilson, 2013b; Wilson, Mari, Maul, & Torres Irribarra, 2015), are demonstrably fundamental to the advancement of a wide range of fields.

References

Allerup, P., Bech, P., Loldrup, D., Alvarez, P., Banegil, T., Styles, I., & Tenenbaum, G. (1994). Psychiatric, business, and psychological applications of fundamental measurement models. International Journal of Educational Research, 21(6), 611-622.

Andrich, D. (1982). An index of person separation in Latent Trait Theory, the traditional KR-20 index, and the Guttman scale response pattern. Education Research and Perspectives, 9(1), 95-104 [http://www.rasch.org/erp7.htm].

Andrich, D., & Styles, I. M. (1998). The structural relationship between attitude and behavior statements from the unfolding perspective. Psychological Methods, 3(4), 454-469.

Duncan, O. D. (1985). Probability, disposition and the inconsistency of attitudes and behaviour. Synthese, 42, 21-34.

Duncan, O. D., & Stenbeck, M. (1988). Panels and cohorts: Design and model in the study of voting turnout. In C. C. Clogg (Ed.), Sociological Methodology 1988 (pp. 1-35). Washington, DC: American Sociological Association.

Feinstein, A. R. (1995). Meta-analysis: Statistical alchemy for the 21st century. Journal of Clinical Epidemiology, 48(1), 71-79.

Fisher, W. P., Jr. (2009). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2010). Statistics and measurement: Clarifying the differences. Rasch Measurement Transactions, 23(4), 1229-1230.

Fisher, W. P., Jr. (2012, May/June). What the world needs now: A bold plan for new standards [Third place, 2011 NIST/SES World Standards Day paper competition]. Standards Engineering, 64(3), 1 & 3-5.

Fisher, W. P., Jr., & Stenner, A. J. (2013). Overcoming the invisibility of metrology: A reading measurement network for education and the social sciences. Journal of Physics: Conference Series, 459(012024), http://iopscience.iop.org/1742-6596/459/1/012024.

Hunter, J. E., & Schmidt, F. L. (Eds.). (2004). Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA: Sage.

Mari, L., & Wilson, M. (2013). A gentle introduction to Rasch measurement models for metrologists. Journal of Physics Conference Series, 459(1), http://iopscience.iop.org/1742-6596/459/1/012002/pdf/1742-6596_459_1_012002.pdf.

Pendrill, L. (2014). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.

Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55. doi: http://dx.doi.org/10.1016/j.measurement.2015.04.010

Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62(5), 529-540.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Wilson, M. R. (2013a). Seeking a balance between the statistical and scientific elements in psychometrics. Psychometrika, 78(2), 211-236.

Wilson, M. R. (2013b). Using the concept of a measurement system to characterize measurement models used in psychometrics. Measurement, 46, 3766-3774.

Wilson, M., Mari, L., Maul, A., & Torres Irribarra, D. (2015). A comparison of measurement concepts across physical science and social science domains: Instrument design, calibration, and measurement. Journal of Physics: Conference Series, 588(012034), http://iopscience.iop.org/1742-6596/588/1/012034.

Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago, Illinois: MESA Press.

The New Information Platform No One Sees Coming

December 6, 2012

I’d like to draw your attention to a fundamentally important area of disruptive innovation that no one seems to see coming. The biggest thing rising in the world of science today that does not appear to be on anyone’s radar is measurement, and its transformative potential exceeds even that of the Internet itself.

Realizing that potential will require an Intangible Assets Metric System. This system will connect together all the different ways any one thing is measured, bringing common languages for representing human, social, and economic value into play everywhere. We need these metrics on the front lines of education, health care, social services, and in human, reputation, and natural resource management, as well as in the economic models and financial spreadsheets informing policy, and in the scientific research conducted in dozens of fields.

All reading ability measures, for instance, should be transparently, inexpensively, and effortlessly expressed in a universally uniform metric, in the same way that standardized measures of weight and volume inform grocery store purchasing decisions. We have made starts at such systems for reading, writing, and math ability measures, and for health status, functionality, and chronic disease management measures. There oddly seems to be, however, little awareness of the full value that stands to be gained from uniform metrics in these areas, despite the overwhelming human, economic, and scientific value derived from standardized units in the existing economy. There has accordingly been virtually no leadership or investment in this area.

Measurement practice in business is woefully out of touch with the true paradigm shift that has been underway in psychometrics for years, even though the mantra “you manage what you measure” is repeated far and wide. In a fascinating twist, practically the only ones who notice the business world’s conceptual shortfall in measurement practice are the contrarians who observe that quantification can often be more of a distraction from management than the medium of its execution—but this is true only when measures are poorly conceived, designed, and implemented.

Demand for better measurement—measurement that reduces data volume not only with no loss of information but with the addition of otherwise unavailable interstitial information; that supports mass customized comparability for informed purchasing and quality improvement decisions; and that enables common product definitions for outcomes-based budgeting—is growing hand in hand with the spread of resilient, nimble, lean, and adaptive business models, and with the ongoing geometrical growth in data volume.

An even bigger source of demand for the features of advanced measurement is the increasing dependence of the economy on intangible assets, those forms of human, social, and natural capital that comprise 90% or more of the total capital under management. We will bring these now economically dead forms of capital to life by systematically standardizing representations of their quality and quantity. The Internet is the planetary nervous system through which basic information travels, and the Intangible Assets Metric System will be the global cerebrum, where higher order thinking takes place.

It will not be possible to realize the full potential of lean thinking in the information- and service-based economy without an Intangible Assets Metric System. Given the long-proven business value of standards and the role of measurement in management, it seems self-evident that our ongoing economic difficulties stem largely from our failure to develop and deploy an Intangible Assets Metric System providing common currencies for the exchange of authentic wealth. The future of sustainable and socially responsible business practices must surely depend extensively on universal access to flexible and practical uniform metrics for intangible assets.

Of course, for global intangible assets standards to be viable, they must be adaptable to local business demands and conditions without compromising their comparability. And that is just what is most powerfully disruptive about contemporary measurement methods: they make mass customization a reality. They’ve been doing so in computerized testing since the 1970s. Isn’t it time we started putting this technology to systematic use in a wide range of applications, from human and environmental resource management to education, health care, and social services?

Measuring/Managing Social Value

August 28, 2012

From my December 1, 2008 personal journal, written not long after the October 2008 SoCap conference. I’ve updated a few things that have changed in the intervening years.

Over the last month, I’ve been digesting what I learned at the Social Capital Markets conference at Fort Mason in San Francisco, and at the conference I attended just afterward, Bioneers, in Marin County. Bioneers (www.Bioneers.org) could be called Natural Capital Markets; it was quite like the Social Capital Markets conference, with only a slight shift in emphasis and lots of discussion of social value.

The main thing that impressed me at both of these conferences, apart from what I already knew about the caring passion I share with so many, is the huge contrast between that passion and the quality of the data that so many are basing major decisions on. Seeing this made me step back and think harder about how to shape my message.

First, though it may not seem like it initially, there is incredible practical value to be gained from taking the trouble to construct good measures. We do indeed manage what we measure. So whatever we measure becomes what we manage. If we’re not measuring anything that has anything to do with our mission, vision, or values, then what we’re managing won’t have anything to do with those, either. And when the numbers we use as measures do not actually represent a constant unit amount that adds up the way the numbers do, then we don’t have a clue what we’re measuring and we could be managing just about anything.

This is not the way to proceed. First take-away: ask for more from your data. Don’t let it mislead you with superficial appearances. Dig deeper.

Second, to put it a little differently, percentages, scores, counts per capita, and the like are not measures with the same meaning or quality as measures of height, weight, time, temperature, or volts. However, for over 50 years, we have been constructing measures mathematically equivalent to physical measures from ability tests, surveys, assessments, checklists, etc. The technical literature on this is widely available. The methods have been mainstream at ETS, ACT, and state and national departments of education globally for decades.

Second take-away: did I say you should ask for more from your data? You can get it. A lot of people already are, though I don’t think they’re asking for nearly as much as they could get.

Third, though the massive numbers of percentages, scores, and counts per capita are not the measures we seek, they are indeed exactly the right place to start. I have seen over and over again, in education, health care, sociology, human resource management, and most recently in the UN Millennium Development Goals data, that people do know exactly what data will form a proper basis for the measurement systems they need.

Third take-away: (one more time!) ask for more from your data. It may conceal a wealth beyond what you ever guessed.

So what are we talking about? There are methods for creating measures that give you numbers that verifiably stand for a substantive unit amount that adds up in the same way one-inch blocks do (probabilistically, and within a range of error). If the instrument is properly calibrated and administered, the unit size and meaning will not change across individuals or samples measured. You can reduce data volume dramatically, not only with no loss of information but also with false appearances of information either indicated as error or flagged for further attention. You can calibrate a continuum of less to more that is reliably and reproducibly associated with, annotated by, and interpreted through your own indicators. You can equate different collections of indicators that measure the same thing so that they do so in the same unit.
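To make this concrete, here is a minimal Python sketch of the dichotomous Rasch model on which such methods rest; the numbers are hypothetical and the function name is mine, not that of any particular software package.

    import math

    def rasch_probability(ability, difficulty):
        # Probability that a person at the given ability (in logits)
        # succeeds on an item at the given difficulty (in logits).
        # Equal ability and difficulty yield a 0.50 probability.
        return 1.0 / (1.0 + math.exp(difficulty - ability))

    # A one-logit advantage implies the same expected outcome anywhere
    # on the scale, which is what makes the unit constant:
    print(round(rasch_probability(1.0, 0.0), 3))    # 0.731
    print(round(rasch_probability(-2.0, -3.0), 3))  # 0.731

This is the probabilistic sense in which the numbers “add up” like one-inch blocks: a given difference in the unit implies the same expected observations wherever on the continuum it occurs.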

Different agencies using the same, different, or mixed collections of indicators in different countries or regions could assess their measures for comparability, and if they are of satisfactory quality, equate them so they measure in the same unit. That is, well-designed instruments written and administered in different languages routinely have their items calibrate in the same order and positions, giving the same meaning to the same unit of measurement. For instance, see the recent issue of the Journal of Applied Measurement ([link]) devoted to reports on the OECD’s Programme for International Student Assessment.
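As a rough illustration of what such equating involves, consider the following sketch, with entirely hypothetical difficulty values; a real equating study would also check the quality of the link before applying it.

    # Two agencies calibrate overlapping indicators separately. If the
    # shared items keep their spacing across frames of reference, a
    # single constant carries measures from one frame into the other.
    shared = {  # indicator: (difficulty in frame A, difficulty in frame B)
        "indicator_1": (-1.2, -0.7),
        "indicator_2": ( 0.1,  0.6),
        "indicator_3": ( 1.4,  1.9),
    }
    shift = sum(a - b for a, b in shared.values()) / len(shared)
    measure_in_b = 2.3                     # a measure reported in frame B
    print(round(measure_in_b + shift, 2))  # 1.8: the same amount in frame A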

This is not a data analysis strategy. It is an instrument calibration strategy. Once calibrated, the instrument can be deployed. We need to monitor its structure, but the point is to create a tool people can take out into the world and use like a thermometer or clock.
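Computationally, using a calibrated instrument like a thermometer looks something like the following sketch (a toy illustration, not anyone’s production code): the item difficulties are anchored from the calibration study, and only the person’s measure is estimated at the point of use.

    import math

    def measure(responses, anchored_difficulties, tol=1e-6):
        # Newton-Raphson estimate of a person measure from 0/1 responses
        # to items whose difficulties were fixed in a prior calibration.
        # The instrument, not the sample at hand, defines the unit.
        theta = 0.0
        for _ in range(100):
            p = [1 / (1 + math.exp(d - theta)) for d in anchored_difficulties]
            info = sum(q * (1 - q) for q in p)
            step = (sum(responses) - sum(p)) / max(info, 1e-9)
            theta += step
            if abs(step) < tol:
                break
        return theta

    # Hypothetical anchored difficulties and one person's responses:
    print(round(measure([1, 1, 0, 1, 0], [-1.5, -0.5, 0.0, 0.5, 1.5]), 2))  # ~0.5

Extreme scores (all right or all wrong) have no finite estimate; production software handles them with standard corrections.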

I’ve just been looking at Charity Navigator (for instance, [link]) and the UN’s Millennium Development Goals ([link]), and the databases that have been assembled as measures of progress toward those goals ([link]). I would suppose these web sites show data in forms people are generally familiar with, so I’m working up analyses from the UN data to use as teaching tools.

You don’t have to take my word for any of this. It’s been documented ad nauseam in the academic literature for decades. Those interested can find out more than they ever wanted to know at http://www.Rasch.org, in the Wikipedia Rasch entry, in the articles and books at JAMPress.com, or in dozens of academic journals and hundreds of books. Though I’ve done my share of it, I’m less interested in continuing to add to that literature than I am in making a tangible contribution to improving people’s lives.

Sorry to go on like this. I meant to keep this short. Anyway, there it is.

PS, for real geeks: For those of you serious about learning about measurement as it is rigorously and mathematically defined, look into taking Everett Smith’s measurement course at Statistics.com ([link]) or David Andrich’s academic units at the University of Western Australia ([link]). Available software includes Mike Linacre’s Winsteps, Andrich’s RUMM, and Mark Wilson’s ConQuest, from UC Berkeley.

The methods Ev, Mike, David, and Mark teach have repeatedly been proven, in mathematical theory and in real life, to be both necessary and sufficient for constructing meaningful, practical measurement. Any number of ways of defining objectivity in measurement have been shown to reduce to the mathematical models they use. Why all the Chicago stuff? Because of Ben Wright. I’m helping (again) to organize a conference in his honor, to be held in Chicago next March. His work won him a Career Achievement Award from the Association of Test Publishers, and the coming conference will celebrate his foundational contributions to computerized measurement in health care.

As a final note, for those of you fearing reductionistic meaninglessness, look into my philosophical work.  But enough…

Review of “Advancing Social Impact Investments Through Measurement”

August 24, 2012

Over the last few days, I have been reading several of the most recent issues of the Community Development Investment Review, especially volume 7, number 2, edited by David Erickson of the Federal Reserve Bank of San Francisco, reporting the proceedings of the March 21, 2011 conference in Washington, DC on advancing social impact investments through measurement. I am (truly) fairly trembling with excitement at seeing this work. I feel as though I’ve finally made my way home. There are so many points of contact, it’s hard to know where to start. After several days of concentrated deep breathing and close study of the CDIR, it’s now possible to formulate some coherent thoughts to share.

The CDIR papers start to sort out the complex issues involved in clarifying how measurement might contribute to the integration of impact investing and community development finance. I am heartened by the statement that “The goal of the Review is to bridge the gap between theory and practice and to enlist as many viewpoints as possible—government, nonprofits, financial institutions, and beneficiaries.” On the other hand, the omission of measurement scientists from that list of viewpoints adds another question to my long list of questions as to why measurement science is so routinely ignored by the very people who proclaim its importance. The situation is quite analogous to demanding more frequent conversations with colleagues while ignoring the invention of the telephone and providing neither the instruments nor the network connections.

The aims shared by the CDIR contributors and myself are evident in the fact that David Erickson opens his summary of the March 21, 2011 conference with the same quote from Robert Kennedy that I placed at the end of my 2009 article in Measurement (see references below; all papers referenced are available by request if they are not already online). In that 2009 paper, in others I’ve published over the last several years, in presentations I’ve made to my measurement colleagues abroad and at home, and in various entries in my blog, I take up virtually all of the major themes that arose in the DC conference: how better measurement can attract capital to needed areas, how the cost of measurement repels many investors, how government can help by means of standard setting and regulation, how diverse and ambiguous investor and stakeholder interests can be reconciled and/or clarified, etc.

The difference, of course, is that I present these issues from the technical perspective of measurement and cannot speak authoritatively or specifically from the perspectives represented by the community development finance and impact investing fields. The bottom line take-away message for these fields from my perspective is this: unexamined assumptions may unnecessarily restrict assessments of problems and their potential solutions. As Salamon put it in his remarks in the CDIR proceedings from the Washington meeting (p. 43), “uncoordinated innovation not guided by a clear strategic concept can do more than lose its way: it can do actual harm.”

A clear strategic concept capable of coordinating innovations in social impact measurement is readily available. Multiple, highly valuable, and eminently practical measurement technologies have proven themselves in real world applications over the last 50 years. These technologies are well documented in the educational, psychological, sociological, and health care research literatures, as well as in the practical experience of high stakes testing for professional licensure and certification, for graduation, and for admissions.

Numerous reports show how to approach problems of quantification and standards with new degrees of rigor, transparency, meaningfulness, and flexibility. When measurement problems are not defined in terms of these technologies, solutions that may offer highly advantageous features are not considered. When the area of application is as far reaching and fundamental as social impact measurement, not taking new technologies into account is nothing short of tragic. I describe some of the new opportunities for you in a Technical Postscript, below.

In his Foreword to the CDIR proceedings issue, John Moon mentions having been at the 2009 SoCap event bringing together stakeholders from across the various social capital markets arenas. I was at the 2008 SoCap, and I came away from it with much the same impression as Moon, feeling that the palpable excitement in the air was more than tempered by the evident fact that people were often speaking at cross purposes, and that there did not seem to be a common object to the conversation. Moon, Erickson, and their colleagues have been in one position to sort out the issues involved, and I have been in another, but we are plainly on converging courses.

Though the science is in place and has been for decades, it will not and cannot amount to anything until the people who can best make use of it do so. The community development finance and impact investing fields are those people. Anyone interested in getting together for an informal conversation on topics of mutual interest should feel free to contact me.

Technical Postscript

There are at least six areas in efforts to advance social impact investments via measurement that will be most affected by contemporary methods. The first has to do with scale quality. I won’t go into the technical details, but numbers do not automatically stand for something that adds up the way they do. Mapping a substantive construct onto a number line requires specific technical expertise; there is no evidence of that expertise in any of the literature I’ve seen on social impact investing, or on measuring intangible assets. This is not an arbitrary bit of philosophical esoterica or technical nicety. This is one of those areas where the practical value of scientific rigor and precision comes into its own. It makes all the difference in being able to realize goals for measurement, investment, and redefining profit in terms of social impacts.
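One way to see the point, using nothing but hypothetical counts, is to watch what happens when bounded raw scores are re-expressed as log-odds:

    import math

    # Raw scores are bounded counts, so equal score differences do not
    # stand for equal amounts: a one-point gain near the ceiling of a
    # 20-item instrument covers far more of the underlying continuum
    # than a one-point gain near the middle.
    total_items = 20
    for raw in (10, 11, 18, 19):
        p = raw / total_items
        print(raw, round(math.log(p / (1 - p)), 2))
    # 10 -> 0.0, 11 -> 0.2, 18 -> 2.2, 19 -> 2.94: the same one-point
    # step spans 0.2 logits mid-scale but about 0.75 logits near the top.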

A second area in which thinking on social impact measurement will be profoundly altered by current scaling methods concerns the capacity to reduce data volume with no loss of information. In current systems, each indicator has its own separate metric. Data volume quickly multiplies when tracking separate organizations for each of several time periods in various locales. Given sufficient adherence to data quality and meaningfulness requirements, today’s scaling methods allow these indicators to be combined into a single composite measure—from which each individual observation can be inferred.

Elaborating this second point a bit further, I noted that some speakers at the 2011 conference in Washington thought reducing data volume is a matter of limiting the number of indicators that are tracked. This strategy is self-defeating, however, as having fewer independent observations increases uncertainty and risk. It would be far better to set up systems in which the metrics are designed so as to incorporate the amount of uncertainty that can be tolerated in any given decision support application.
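A toy example, with made-up measures and calibrations, shows the sense in which the composite carries the whole table: what must be stored grows with organizations plus indicators rather than organizations times indicators, and the expected value of any cell can be regenerated on demand.

    import math

    org_measures = {"org_a": 0.8, "org_b": -0.3}                      # hypothetical
    calibrations = {"served": -1.0, "outcomes": 0.4, "retained": 1.2}  # hypothetical

    def expected(org, indicator):
        # Regenerate any cell of the table from the composite measure
        # and the indicator calibration alone.
        return 1 / (1 + math.exp(calibrations[indicator] - org_measures[org]))

    for org in org_measures:
        print(org, {i: round(expected(org, i), 2) for i in calibrations})

Observed values that depart from these expectations are not lost in the compression; they surface as misfit flagged for further attention.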

The third area I have in mind deals with the diverse spectrum of varying interests and preferences brought to the table by investors, beneficiaries, and other stakeholders. Contemporary approaches in measurement make it possible to adapt the content of the particular indicators (counts or frequencies of events, or responses to survey questions or test items) to the needs of the user, without compromising the comparability of the resulting quantitative measure. This feature makes it possible to mass customize the content of the metrics employed depending on the substantive nature of the needs at that time and place.
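A minimal sketch of the selection step, with a hypothetical item bank, shows how content can vary while the metric stays fixed:

    bank = {"q1": -2.0, "q2": -0.8, "q3": 0.0, "q4": 0.9, "q5": 2.1}

    def next_item(estimate, already_asked):
        # A Rasch item is most informative where its difficulty matches
        # the current ability estimate, so pick the closest unused item.
        remaining = {k: d for k, d in bank.items() if k not in already_asked}
        return min(remaining, key=lambda k: abs(remaining[k] - estimate))

    print(next_item(0.5, {"q3"}))  # q4: nearest to the current estimate

Because every item is precalibrated to the same scale, two users who answer entirely different questions still receive comparable measures.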

Fourth, it is well known that different people judging performances or assigning numbers to observations bring different personal standards to bear as they make their ratings. Contemporary measurement methods enable the evaluation and scaling of raters and judges relative to one another, when data are gathered in a manner facilitating such comparisons. The end result is a basis for fair comparisons, instead of scores that vary depending more on which rater is observing than on the quality of the performance.
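In the many-facet models used for this purpose, the adjustment falls out of the model itself; the following sketch, with invented severity values, conveys only the basic idea.

    rater_severity = {"rater_a": 0.6, "rater_b": -0.4}  # logits, hypothetical

    def adjusted_measure(observed, rater):
        # A severe rater depresses observed performance levels, so the
        # calibrated severity is added back to recover a fair measure.
        return observed + rater_severity[rater]

    print(adjusted_measure(1.0, "rater_a"))  # 1.6
    print(adjusted_measure(2.0, "rater_b"))  # 1.6: same performance, different judges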

Fifth, much of the discussion at the conference in Washington last year emphasized the need for shared data formatting and reporting standards. As might be guessed from the prior four areas I’ve described, significant advances have occurred in standard setting methods. It is suggested in the CDIR proceedings that the Treasury Department should be the home to a new institute for social impact measurement standards. In a series of publications over the last few years, I have suggested to NIST and NSF the need for an Intangible Assets Metric System (see below for references and links; all papers are available on request). That suggestion comes up again in my third-prize winning entry in the 2011 World Standards Day paper competition, sponsored by NIST and SES (the Society for Standards Professionals), entitled “What the World Needs Now: A Bold Plan for New Standards.” (See below for link.)

Sixth, as noted by Salamon (p. 43), “metrics are not neutral. They not only measure impact, they can also shape it.” Though this is likely not exactly what Salamon meant, one of the most exciting areas in measurement applications in education in recent years, led in many ways by my colleague Mark Wilson and his group at UC Berkeley, concerns exactly this feedback loop between measurement and impact. In education, it has become apparent that test scaling reveals the order in which lessons are learned. Difficult problems that require mastery of easier problems are necessarily answered correctly less often than the easier ones. When the difficulty order of test questions in a given subject remains constant over time and across thousands of students, one may infer that the scale reveals the path of least resistance. Individualizing instruction by targeting lessons at the student’s measure has given rise to a concept of formative assessment, distinct from the summative assessment of accountability applications. I suspect this kind of distinction may also prove of value in social impact applications.
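The constancy claim is empirically checkable. A sketch with hypothetical calibrations from two cohorts:

    # If the difficulty order of the same items holds across samples,
    # the scale can be read as a map of the path of least resistance.
    cohort_1 = [-1.8, -0.9, 0.1, 0.8, 1.7]   # item difficulties, cohort 1
    cohort_2 = [-1.7, -1.0, 0.2, 0.9, 1.6]   # same items, cohort 2

    order_1 = sorted(range(5), key=cohort_1.__getitem__)
    order_2 = sorted(range(5), key=cohort_2.__getitem__)
    print(order_1 == order_2)  # True: the ordering, and so the path, holds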

Relevant Publications and Presentations

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2004, January 22). Bringing capital to life via measurement: A contribution to the new economics. In R. Smith (Chair), Session 3.3B. Rasch Models in Economics and Marketing. Second International Conference on Measurement in Health, Education, Psychology, and Marketing: Developments with Rasch Models, The International Laboratory for Measurement in the Social Sciences, School of Education, Murdoch University, Perth, Western Australia.

Fisher, W. P., Jr. (2005, August 1-3). Data standards for living human, social, and natural capital. In Session G: Concluding Discussion, Future Plans, Policy, etc. Conference on Entrepreneurship and Human Rights [http://www.fordham.edu/economics/vinod/ehr05.htm], Pope Auditorium, Lowenstein Bldg, Fordham University.

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2008, 3-5 September). New metrological horizons: Invariant reference standards for instruments measuring human, social, and natural capital. Presented at the 12th International Measurement Confederation (IMEKO) TC1-TC7 Joint Symposium on Man, Science, and Measurement, Annecy, France: University of Savoie.

Fisher, W. P., Jr. (2009, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009). NIST critical national need idea white paper: Metrological infrastructure for human, social, and natural capital (Tech. Rep., http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute for Standards and Technology.

Fisher, W. P., Jr. (2010). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics: Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.

Fisher, W. P., Jr. (2011). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In N. Brown, B. Duckor, K. Draney & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 2 (pp. 1-27). Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2011). Measuring genuine progress by scaling economic indicators to think global & act local: An example from the UN Millennium Development Goals project. LivingCapitalMetrics.com. Retrieved 18 January 2011, from Social Science Research Network: http://ssrn.com/abstract=1739386.

Fisher, W. P., Jr. (2012). Measure and manage: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (pp. 43-63). New York: Palgrave Macmillan.

Fisher, W. P., Jr. (2012, May/June). What the world needs now: A bold plan for new standards. Standards Engineering, 64(3), 1 & 3-5 [http://ssrn.com/abstract=2083975].

Fisher, W. P., Jr., & Stenner, A. J. (2011, January). Metrology for the social, behavioral, and economic sciences (Social, Behavioral, and Economic Sciences White Paper Series). Retrieved 25 October 2011, from National Science Foundation: http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36.

Fisher, W. P., Jr., & Stenner, A. J. (2011, August 31 to September 2). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO) TC1-TC7-TC13 Joint Symposium, http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf, Jena, Germany.


HEY GREECE!!! One more time through the basics

May 10, 2012

As the battle between austerity and growth mindsets threatens to freeze into a brittle gridlock, it seems time once again to simplify and repeat some painfully obvious observations.

1. Human, social, and natural capital make up at least 90 percent of the capital under management in the global economy.

2. There is no system of uniform weights and measures for these forms of capital.

3. We manage what we measure; so, lacking proper measures for 90 percent of the capital in the economy, we cannot possibly manage it properly.

4. Measurement theory and practice have advanced to the point that the technical viability of a meaningful, objective, and precise system of uniform units for human, social, and natural capital is no longer an issue.

5. A metric system for intangible assets (human, social, and natural capital) is the infrastructural capacity-building project we are looking for, one capable of supporting sustainable and responsible growth.

6. Individual citizens, philanthropists, entrepreneurs, corporations, NGOs, educators, health care advocates, innovators, researchers, and governments everywhere ought to be focusing intensely on building systems of consensus measures that take full advantage of existing technical means for instrument scaling, equating, adaptive administration, mass customization, growth modeling, data quality assessment, and diagnostic individualized reporting.

7. Uniform impact measurement will make it possible to price outcomes in ways that allow market forces to inform consumers as to where they can obtain the best cost/value relation for the money. In other words, the profit motive will be directly harnessed in growing human, social, and natural capital.

8. Happiness indexes and gross national or domestic authentic wealth products will not obtain any real practical utility until individuals, firms, NGOs, and governments can directly manage their own intangible asset bottom lines.

See other posts in this blog or the links below for more information.

William P. Fisher, Jr., Ph.D.

Research Associate
BEAR Center
Graduate School of Education
University of California, Berkeley
Principal
LivingCapitalMetrics Consulting

We are what we measure.

It’s time we measured what we want to be.

Connect with me on LinkedIn: http://www.linkedin.com/in/livingcapitalmetrics
View my research on my SSRN Author page: http://ssrn.com/author=1090685
Read my blog at https://livingcapitalmetrics.wordpress.com.
See my web site at http://www.livingcapitalmetrics.com.
http://www.rasch.org

Comments on the New ANSI Human Capital Investor Metrics Standard

April 16, 2012

The full text of the proposed standard is available here.

It’s good to see a document emerge in this area, especially one with such a broad base of support from a diverse range of stakeholders. As is stated in the standard, the metrics defined in it are a good place to start and in many instances will likely improve the quality and quantity of the information made available to investors.

There are several issues to keep in mind as the value of standards for human capital metrics becomes more widely appreciated. First, in the context of a comprehensively defined investment framework, human capital is just one of the four major forms of capital, the other three being social, natural, and manufactured (Ekins, 1992; Ekins, Dresner, and Dahlstrom, 2008). To ensure as far as possible the long term stability and sustainability of their profits, and of the economic system as a whole, investors will certainly want to expand the range of the available standards to include social and natural capital along with human capital.

Second, though we manage what we measure, investment management is seriously compromised by having high quality scientific measurement standards only for manufactured capital (length, weight, volume, temperature, energy, time, kilowatts, etc.). Over 80 years of research on ability tests, surveys, rating scales, and assessments has reached a point from which it is poised to revolutionize the management of intangible forms of capital (Fisher, 2007, 2009a, 2009b, 2010, 2011a, 2011b; Fisher & Stenner, 2011a, 2011b; Wilson, 2011; Wright, 1999). The very large reductions in transaction costs effected by standardized metrics in the economy at large (Barzel, 1982; Benham and Benham, 2000) are likely to have a similarly profound effect on the economics of human, social, and natural capital (Fisher, 2011a, 2012a, 2012b).

The potential for dramatic change in the conceptualization of metrics is most evident in the proposed standard in the sections on leadership quality and employee engagement. For instance, in the section on leadership quality, it is stated that “Investors will be able to directly compare all organizations that are using the same vendor’s methodology.” This kind of dependency should not be allowed to stand as a significant factor in a measurement standard. Properly constructed and validated scientific measures, such as those that have been in wide use in education, psychology, and health care for several decades (Andrich, 2010; Bezruczko, 2005; Bond and Fox, 2007; Fisher and Wright, 1994; Rasch, 1960; Salzberger, 2009; Wright, 1999), are equated to a common unit. Comparability should never depend on which vendor is used. Rather, any instrument that actually measures the construct of interest (leadership quality or employee engagement) should do so in a common unit and within an acceptable range of error. “Normalizing” measures for comparability, as is suggested in the standard, means employing psychometric methods that are 50 years out of date and far less rigorous and practical than need be. Transparency in measurement means looking through the instrument to the thing itself. If particular instruments color or reshape what is measured, or merely change the meaning of the numbers reported, then the integrity of the standard as a standard should be re-examined.
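The contrast between “normalizing” and calibrating can be put in a few lines. In the hypothetical comparison below, the same performance receives a different normalized score in every sample, which is precisely why sample-dependent normalization cannot serve as a standard:

    import statistics

    performance = 75
    sample_a = [60, 65, 70, 80, 90]   # one vendor's norming sample
    sample_b = [70, 72, 74, 76, 78]   # another vendor's norming sample

    for sample in (sample_a, sample_b):
        z = (performance - statistics.mean(sample)) / statistics.stdev(sample)
        print(round(z, 2))  # 0.17, then 0.32: one performance, two "scores"

A measure expressed in a calibrated unit, by contrast, does not move when the comparison sample changes.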

Third, for investments in human capital to be effectively managed, each distinct aspect of it (motivations, skills and abilities, health) needs to be measured separately, just as height, weight, and temperature are. New technologies have already transformed measurement practices in ways that make the necessary processes precise and inexpensive. Of special interest are adaptively administered, precalibrated instruments supporting mass customized—but globally comparable—measures (for instance, see the examples at http://blog.lexile.com/tag/oasis/ and those presented at the recent Pearson Global Research Conference in Fremantle, Australia, http://www.pearson.com.au/marketing/corporate/pearson_global/default.html; also see Wright & Bell, 1984; Lunz, Bergstrom, & Gershon, 1994; Bejar et al., 2003).

Fourth, the ownership of human capital needs clarification and legal status. If we consider each individual to own their abilities, health, and motivations, and to be solely responsible for decisions made concerning the disposition of those properties, then, in accord with their proven measured amounts of each type of human capital, everyone ought to have legal title to a specific number of shares or credits of each type. This may transform employment away from wage-based job classification compensation to an individualized investment-based continuous quality improvement platform. The same kind of legal titling system will, of course, need to be worked out for social and natural capital, as well.

Fifth, given scientific standards for each major form of capital, practical measurement technologies, and legal title to our shares of capital, we will need expanded financial accounting standards and tools for managing our individual and collective investments. Ongoing research and debates concerning these standards and tools (Siegel and Borgia, 2006; Young and Williams, 2010) have yet to connect with the larger scientific, economic, and legal issues raised here, but developments in this direction should be emerging in due course.

Sixth, a number of lingering moral, ethical and political questions are cast in a new light in this context. The significance of individual behaviors and decisions is informed and largely determined by the context of the culture and institutions in which those behaviors and decisions are executed. Many of the morally despicable but not illegal investment decisions leading to the recent economic downturn put individuals in the position of either setting themselves apart and threatening their careers or doing what was best for their portfolios within the limits of the law. Current efforts intended to devise new regulatory constraints are misguided in focusing on ever more microscopically defined particulars. What is needed is instead a system in which profits are contingent on the growth of human, social, and natural capital. In that framework, legal but ultimately unfair practices would drive down social capital stock values, counterbalancing ill-gotten gains and making them unprofitable.

Seventh, the International Vocabulary of Measurement, now in its third edition (VIM3), is a standard recognized by all eight international standards accrediting bodies (BIPM, etc.). The VIM3 (http://www.bipm.org/en/publications/guides/vim.html) and forthcoming VIM4 are intended to provide a uniform set of concepts and terms for all fields that employ measures across the natural and social sciences. A new dialogue on these issues has commenced in the context of the International Measurement Confederation (IMEKO), whose member organizations are the weights and measures institutes from countries around the world (Conference note, 2011). The 2012 President of the Psychometric Society, Mark Wilson, gave an invited address at the September 2011 IMEKO meeting (Wilson, 2011), and a member of the VIM3 editorial board, Luca Mari, has been invited to speak at the July 2012 International Meeting of the Psychometric Society. I encourage all interested parties to become involved in efforts of these kinds in their own fields.

References

Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika, 75(2), 292-308.

Barzel, Y. (1982). Measurement costs and the organization of markets. Journal of Law and Economics, 25, 27-48.

Bejar, I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., & Revuelta, J. (2003, November). A feasibility study of on-the-fly item generation in adaptive testing. The Journal of Technology, Learning, and Assessment, 2(3), 1-29; http://ejournals.bc.edu/ojs/index.php/jtla/article/view/1663.

Benham, A., & Benham, L. (2000). Measuring the costs of exchange. In C. Ménard (Ed.), Institutions, contracts and organizations: Perspectives from new institutional economics (pp. 367-375). Cheltenham, UK: Edward Elgar.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah, New Jersey: Lawrence Erlbaum Associates.

Conference note. (2011). IMEKO Symposium: August 31- September 2, 2011, Jena, Germany. Rasch Measurement Transactions, 25(1), 1318.

Ekins, P. (1992). A four-capital model of wealth creation. In P. Ekins & M. Max-Neef (Eds.), Real-life economics: Understanding wealth creation (pp. 147-155). London: Routledge.

Ekins, P., Dresner, S., & Dahlstrom, K. (2008). The four-capital method of sustainable development evaluation. European Environment, 18(2), 63-80.

Fisher, W. P., Jr. (2007). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2009a). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009b). NIST critical national need idea white paper: Metrological infrastructure for human, social, and natural capital (http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute for Standards and Technology.

Fisher, W. P., Jr. (2010). Rasch, Maxwell’s method of analogy, and the Chicago tradition. In G. Cooper (Chair), Probabilistic models for measurement in education, psychology, social science and health: Celebrating 50 years since the publication of Rasch’s Probabilistic Models (https://conference.cbs.dk/index.php/rasch/Rasch2010/paper/view/824). University of Copenhagen School of Business, FUHU Conference Centre, Copenhagen, Denmark.

Fisher, W. P., Jr. (2011a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In N. Brown, B. Duckor, K. Draney & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 2 (pp. 1-27). Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2011b). Measurement, metrology and the coordination of sociotechnical networks. In  S. Bercea (Chair), New Education and Training Methods. International Measurement Confederation (IMEKO), http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24491/ilm1-2011imeko-017.pdf, Jena, Germany.

Fisher, W. P., Jr. (2012a). Measure local, manage global: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (pp. in press). New York: Palgrave Macmillan.

Fisher, W. P., Jr. (2012b). What the world needs now: A bold plan for new standards. Standards Engineering, 64, in press.

Fisher, W. P., Jr., & Stenner, A. J. (2011a). Metrology for the social, behavioral, and economic sciences (Social, Behavioral, and Economic Sciences White Paper Series). Retrieved 25 October 2011, from National Science Foundation: http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36.

Fisher, W. P., Jr., & Stenner, A. J. (2011b). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO) TC1-TC7-TC13 Joint Symposium, http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf, Jena, Germany.

Fisher, W. P., Jr., & Wright, B. D. (Eds.). (1994). Applications of probabilistic conjoint measurement. International Journal of Educational Research, 21(6), 557-664.

Lunz, M. E., Bergstrom, B. A., & Gershon, R. C. (1994). Computer adaptive testing. International Journal of Educational Research, 21(6), 623-634.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Northampton, MA: Edward Elgar.

Siegel, P., & Borgia, C. (2006). The measurement and recognition of intangible assets. Journal of Business and Public Affairs, 1(1).

Wilson, M. (2011). The role of mathematical models in measurement: A perspective from psychometrics. In L. Mari (Chair), Plenary lecture. International Measurement Confederation (IMEKO), http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24178/ilm1-2011imeko-005.pdf, Jena, Germany.

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D., & Bell, S. R. (1984, Winter). Item banks: What, why, how. Journal of Educational Measurement, 21(4), 331-345 [http://www.rasch.org/memo43.htm].

Young, J. J., & Williams, P. F. (2010, August). Sorting and comparing: Standard-setting and “ethical” categories. Critical Perspectives on Accounting, 21(6), 509-521.


Rasch Measurement as a Basis for a New Standards Framework

October 26, 2011

The 2011 U.S. celebration of World Standards Day took place on October 13 at the Fairmont Hotel in Washington, D.C., with the theme of “Advancing Safety and Sustainability Standards Worldwide.” The evening began with a reception in a hall of exhibits from the celebration’s sponsors, which included the National Institute for Standards and Technology (NIST), the Society for Standards Professionals (SES), the American National Standards Institute (ANSI), Microsoft, IEEE, Underwriters Laboratories, the Consumer Electronics Association, ASME, ASTM International, Qualcomm, Techstreet, and many others. Several speakers took the podium after dinner to welcome the 400 or so attendees and to present the World Standards Day Paper Competition Awards and the Ronald H. Brown Standards Leadership Award.

Dr. Patrick Gallagher, Under Secretary of Commerce for Standards and Technology, and Director of NIST, was the first speaker after dinner. He directed his remarks at the value of a decentralized, voluntary, and demand-driven system of standards in promoting innovation and economic prosperity. Gallagher emphasized that “standards provide the common language that keeps domestic and international trade flowing,” concluding that “it is difficult to overestimate their critical value to both the U.S. and global economy.”

James Shannon, President of the National Fire Protection Association (NFPA), accepted the R. H. Brown Standards Leadership Award in recognition of his work initiating or improving the National Electrical Code, the Life Safety Code, and the Fire Safe Cigarette and Residential Sprinkler Campaigns.

Ellen Emard, President of SES, introduced the paper competition award winners. As of this writing, the titles and authors of the first and second place awards are not yet available on the SES web site (http://www.ses-standards.org/displaycommon.cfm?an=1&subarticlenbr=56). I took third place for my paper, “What the World Needs Now: A Bold Plan for New Standards.” Where the other winning papers took up traditional engineering questions concerning the role of standards in advancing safety and sustainability, my paper spoke to the potential scientific and economic benefits that could be realized by standard metrics and common product definitions for outcomes in education, health care, social services, and environmental resource management. All three of the award-winning papers will appear in a forthcoming issue of Standards Engineering, the journal of SES.

I was coincidentally seated at the dinner alongside Gordon Gillerman, winner of third place in the 2004 paper competition (http://www.ses-standards.org/associations/3698/files/WSD%202004%20-%203%20-%20Gillerman.pdf) and currently Chief of the Standards Services Division at NIST. Gillerman has a broad range of experience in coordinating standards across multiple domains, including environmental protection, homeland security, safety, and health care. Having recently been involved in a workshop focused on measuring, evaluating, and improving the usability of electronic health records (http://www.nist.gov/healthcare/usability/upload/EHR-Usability-Workshop-2011-6-03-2011_final.pdf), Gillerman was quite interested in the potential Rasch measurement techniques hold for reducing data volume with no loss of information, and so for streamlining computer interfaces.

Robert Massof of Johns Hopkins University accompanied me to the dinner, and was seated at a nearby table. Also at Massof’s table were several representatives of the National Institute of Building Sciences, some of whom Massof had recently met at a workshop on adaptations for persons with low vision disabilities. Massof’s work equating the main instruments used for assessing visual function in low vision rehabilitation could lead to a standard metric useful in improving the safety and convenience of buildings.

As is stated in educational materials distributed at the World Standards Day celebration by ANSI, standards are a constant behind-the-scenes presence in nearly all areas of everyday life. Everything from air, water, and food to buildings, clothing, automobiles, roads, and electricity is produced in conformity with voluntary consensus standards of various kinds. In the U.S. alone, more than 100,000 standards specify product and system features and interconnections, making it possible for appliances to tap the electrical grid with the same results no matter where they are plugged in, and for products of all kinds to be purchased with confidence. Life is safer and more convenient, and science and industry are more innovative and profitable, because of standards.

The point of my third-place paper is that life could be even safer and more convenient, and science and industry could be yet more innovative and profitable, if standards and conformity assessment procedures for outcomes in education, health care, social services, and environmental resource management were developed and implemented. Rasch measurement demonstrates the consistent reproducibility of meaningful measures across samples and different collections of construct-relevant items. Within any specific area of interest, then, Rasch measures have the potential of serving as the kind of mediating instruments or objects recognized as essential to the process of linking science with the economy (Fisher & Stenner, 2011b; Hussenot & Missonier, 2010; Miller & O’Leary, 2007). Recent white papers published by NIST and NSF document the challenges and benefits likely to be encountered and produced by initiatives moving in this direction (Fisher, 2009; Fisher & Stenner, 2011a).

A diverse array of Rasch measurement presentations was made at the recent International Measurement Confederation (IMEKO) meeting of metrology engineers in Jena, Germany (see RMT 25(1), p. 1318). With that start at a new dialogue between the natural and social sciences, with the NIST and NSF white papers, and with the award in the World Standards Day paper competition, the U.S. and international standards development communities have shown their interest in exploring possibilities for a new array of standard units of measurement, standardized outcome product definitions, standard conformity assessment procedures, and outcome product quality standards. The increasing acceptance and recognition of the viability of such standards is a logical consequence of observations like these:

  • “Where this law [relating reading ability and text difficulty to comprehension rate] can be applied it provides a principle of measurement on a ratio scale of both stimulus parameters and object parameters, the conceptual status of which is comparable to that of measuring mass and force. Thus…the reading accuracy of a child…can be measured with the same kind of objectivity as we may tell its weight” (Rasch, 1960, p. 115).
  • “Today there is no methodological reason why social science cannot become as stable, as reproducible, and hence as useful as physics” (Wright, 1997, p. 44).
  • “…when the key features of a statistical model relevant to the analysis of social science data are the same as those of the laws of physics, then those features are difficult to ignore” (Andrich, 1988, p. 22).

Rasch’s work has been wrongly assimilated in social science research practice as just another example of the “standard model” of statistical analysis. Rasch measurement rightly ought instead to be treated as a general articulation of the three-variable structure of natural law useful in framing the context of scientific practice. That is, Rasch’s models ought to be employed primarily in calibrating instruments quantitatively interpretable at the point of use in a mathematical language shared by a community of research and practice. To be shared in this way as a universally uniform coin of the realm, that language must be embodied in a consensus standard defining universally uniform units of comparison.
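In notation, the three-variable structure referred to here is the dichotomous Rasch model, which relates a person parameter, an item parameter, and an observed outcome:

    \Pr\{x_{ni} = 1\} = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}

where \beta_n is the person’s measure, \delta_i is the item’s calibration, and x_{ni} is the observed response. Because the expected outcome depends only on the difference \beta_n - \delta_i, the same comparison holds wherever persons and items are encountered, which is the invariance the quotations above describe.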

Rasch measurement offers the potential of shifting the focus of quantitative psychosocial research away from data analysis to integrated qualitative and quantitative methods enabling the definition of standard units and the calibration of instruments measuring in those units. An intangible assets metric system will, in turn, support the emergence of new product- and performance-based standards, management system standards, and personnel certification standards. Reiterating Rasch’s (1960, p. xx) insight, we can acknowledge with him that “this is a huge challenge, but once the problem has been formulated it does seem possible to meet it.”

References

Andrich, D. (1988). Rasch models for measurement (Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-068). Beverly Hills, California: Sage Publications.

Fisher, W. P., Jr. (2009). Metrological infrastructure for human, social, and natural capital (NIST Critical National Need Idea White Paper Series, retrieved 25 October 2011 from http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute for Standards and Technology.

Fisher, W. P., Jr., & Stenner, A. J. (2011a, January). Metrology for the social, behavioral, and economic sciences (Social, Behavioral, and Economic Sciences White Paper Series). Retrieved 25 October 2011 from http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36. Washington, DC: National Science Foundation.

Fisher, W. P., Jr., & Stenner, A. J. (2011b). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO), Jena, Germany, August 31 to September 2.

Hussenot, A., & Missonier, S. (2010). A deeper understanding of evolution of the role of the object in organizational process. The concept of ‘mediation object.’ Journal of Organizational Change Management, 23(3), 269-286.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-34.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.