Archive for January, 2010

How Evidence-Based Decision Making Suffers in the Absence of Theory and Instrument: The Power of a More Balanced Approach

January 28, 2010

The Basis of Evidence in Theory and Instrument

The ostensible point of basing decisions in evidence is to have reasons for proceeding in one direction versus any other. We want to be able to say why we are proceeding as we are. When we give evidence-based reasons for our decisions, we typically couch them in terms of what worked in past experience. That experience might have been accrued over time in practical applications, or it might have been deliberately arranged in one or more experimental comparisons and tests of concisely stated hypotheses.

At its best, generalizing from past experience to as-yet-unmet future experiences enables us to navigate life and succeed in ways that would not be possible if we could not learn and had no memories. Applying a lesson learned from particular past events to particular future events involves a very specific inferential process. Recognizing repeated iterations of the same things requires the accumulation of patterns of evidence. Experience in observing such patterns allows us to develop confidence in our understanding of what a pattern represents in terms of pleasant or painful consequences. When we can conceptualize and articulate such a pattern, and then recognize new occurrences of it, we have formed a workable idea of it.

Evidence-based decision making is then a matter of formulating expectations from repeatedly demonstrated and routinely reproducible patterns of observations that lend themselves to conceptual representations, as ideas expressed in words. Linguistic and cultural frameworks selectively focus attention by projecting expectations and filtering observations into meaningful patterns represented by words, numbers, and other symbols. The point of efforts aimed at basing decisions in evidence is to try to go with the flow of this inferential process more deliberately and effectively than might otherwise be the case.

None of this is new or controversial. However, the inferential step from evidence to decision always involves unexamined and unjustified assumptions. That is, there is always an element of metaphysical faith behind the expectation that any given symbol or word is going to work as a representation of something in the same way that it has in the past. We can never completely eliminate this leap of faith, since we cannot predict the future with 100% confidence. We can, however, do a lot to reduce the size of the leap, and the risks that go with it, by questioning our assumptions in experimental research that tests hypotheses as to the invariant stability and predictive utility of the representations we make.

Theoretical and Instrumental Assumptions Hidden Behind the Evidence

For instance, evidence as to the effectiveness of an intervention or treatment is often expressed in terms of measures commonly described as quantitative. But it is unusual for any evidence to be produced justifying that description in terms of something that really adds up in the way numbers do. So we often find ourselves in situations in which our evidence is much less meaningful, reliable, and valid than we suppose it to be.

Quantitative measures are often valued as the hallmark of rational science. But their capacity to live up to this billing depends on the quality of the inferences that can be supported. Very few researchers thoroughly investigate the quality of their measures and justify the inferences they make relative to that quality.

Measurement presumes a reproducible pattern of evidence that can serve as the basis for a decision concerning how much of something has been observed. It naturally follows that we often base measurement in counts of some kind—successes, failures, ratings, frequencies, etc. The counts, scores, or sums are then often transformed into percentages by dividing them by the maximum possible score. Sometimes the scores are averaged for each person measured, and/or for each item or question on the test, assessment, or survey. These scores and percentages are then almost universally fed directly into decision processes or statistical analyses with no further consideration.
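As an aside, the gap between counts and measures can be shown in a few lines of code. The sketch below is my own illustration, not the author's: converting percent-correct scores to log-odds units shows that equal raw-score gaps do not stand for equal amounts of the underlying variable.

```python
import math

# My illustration, not the author's: percent-correct scores treated as
# if they were interval measures. Converting each proportion p to
# log-odds, log(p / (1 - p)), shows that equal raw-score gaps stand
# for different amounts of the underlying variable depending on where
# they fall on the raw scale.

def logit(p):
    return math.log(p / (1 - p))

max_score = 40
for score in (20, 30, 38):
    p = score / max_score
    print(f"{score}/{max_score} = {p:.2f} -> {logit(p):+.2f} logits")

# 20 -> 30 covers 10 raw points but about 1.10 logits, while
# 30 -> 38 covers only 8 raw points yet about 1.85 logits.
```

The raw scale compresses near its extremes, which is one reason feeding percentages directly into statistical analyses can mislead.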

The reproducible pattern of evidence on which decisions are based is presumed to exist between the measures, not within them. In other words, the focus is on the group or population statistics, not on the individual measures. Attention is typically focused on the tip of the iceberg, the score or percentage, not on the much larger, but hidden, mass of information beneath it. Evidence is presumed to be sufficient to the task when the differences between groups of scores are of a consistent size or magnitude, but is this sufficient?

Going Past Assumptions to Testable Hypotheses

In other words, does not science require that evidence be explained by theory, and embodied in instrumentation that provides a shared medium of observation? As shown in the blue lines in the Figure below,

  • theory, whether or not it is explicitly articulated, inevitably influences both what counts as valid data and the configuration of the medium of its representation, the instrument;
  • data, whether or not it is systematically gathered and evaluated, inevitably influences both the medium of its representation, the instrument, and the implicit or explicit theory that explains its properties and justifies its applications; and
  • instruments, whether or not they are actually calibrated from a mapping of symbols and substantive amounts, inevitably influence data gathering and the image of the object explained by theory.

The rhetoric of evidence-based decision making skips over the roles of theory and instrumentation, drawing a direct line from data to decision. In leaving theory laxly formulated, we allow any story that makes a bit of sense and is communicated by someone with a bit of charm or power to carry the day. In not requiring calibrated instrumentation, we allow any data that cross the threshold into our awareness to serve as an acceptable basis for decisions.

What we want, however, is to require meaningful measures that really provide the evidence needed for instruments that exhibit invariant calibrations and for theories that provide predictive explanatory control over the variable. As shown in the Figure, we want data that push theory away from the instrument, theory that separates the data and instrument, and instruments that get in between the theory and data.

We all know to distrust too close a correspondence between theory and data, but we too rarely understand or capitalize on the role of the instrument in mediating the theory-data relation. Similarly, when the questions used as a medium for making observations are obviously biased to produce responses conforming overly closely with a predetermined result, we see that the theory and the instrument are too close for the data to serve as an effective mediator.

Finally, the situation predominating in the social sciences is one in which both construct and measurement theories are nearly nonexistent, which leaves data completely dependent on the instrument it came from. In other words, because counts of correct answers or sums of ratings are mistakenly treated as measures, instruments fully determine and restrict the range of measurement to that defined by the numbers of items and rating categories. Once the instrument is put in play, changes to it would make new data incommensurable with old, so, to retain at least the appearance of comparability, the data structure then fully determines and restricts the instrument.

What we want, though, is a situation in which construct and measurement theories work together to make the data autonomous of the particular instrument it came from. We want a theory that explains what is measured well enough for us to be able to modify existing instruments, or create entirely new ones, that give the same measures for the same amounts as the old instruments. We want to be able to predict item calibrations from the properties of the items, we want to obtain the same item calibrations across data sets, and we want to be able to predict measures on the basis of the observed responses (data) no matter which items or instrument was used to produce them.

Most importantly, we want a theory and practice of measurement that allows us to take missing data into account by providing us with the structural invariances we need as media for predicting the future from the past. As Ben Wright (1997, p. 34) said, any data analysis method that requires complete data to produce results disqualifies itself automatically as a viable basis for inference because we never have complete data—any practical system of measurement has to be positioned so as to be ready to receive, process, and incorporate all of the data we have yet to gather. This goal is accomplished to varying degrees in Rasch measurement (Rasch, 1960; Burdick, Stone, & Stenner, 2006; Dawson, 2004). Stenner and colleagues (Stenner, Burdick, Sanford, & Burdick, 2006) provide a trajectory of increasing degrees to which predictive theory is employed in contemporary measurement practice.

The explanatory and predictive power of theory is embodied in instruments that focus attention on recording observations of salient phenomena. These observations become data that inform the calibration of instruments, which then are used to gather further data that can be used in practical applications and in checks on the calibrations and the theory.

“Nothing is so practical as a good theory” (Lewin, 1951, p. 169). Good theory makes it possible to create symbolic representations of things that are easy to think with. To facilitate clear thinking, our words, numbers, and instruments must be transparent. We have to be able to look right through them at the thing itself, with no concern as to distortions introduced by the instrument, the sample, the observer, the time, the place, etc. This happens only when the structure of the instrument corresponds with invariant features of the world. And where words effect this transparency to an extent, it is realized most completely when we can measure in ways that repeatedly give the same results for the same amounts in the same conditions no matter which instrument, sample, operator, etc. is involved.

Where Might Full Mathematization Lead?

The attainment of mathematical transparency in measurement is remarkable for the way it focuses attention and constrains the imagination. It is essential to appreciate the context in which this focusing occurs, as popular opinion is at odds with historical research in this regard. Over the last 60 years, historians of science have come to vigorously challenge the widespread assumption that technology is a product of experimentation and/or theory (Kuhn, 1961/1977; Latour, 1987, 2005; Maas, 2001; Mendelsohn, 1992; Rabkin, 1992; Schaffer, 1992; Heilbron, 1993; Hankins & Silverman, 1999; Baird, 2002). Neither theory nor experiment typically advances until a key technology is widely available to end users in applied and/or research contexts. Rabkin (1992) documents multiple roles played by instruments in the professionalization of scientific fields. Thus, “it is not just a clever historical aphorism, but a general truth, that ‘thermodynamics owes much more to the steam engine than ever the steam engine owed to thermodynamics’” (Price, 1986, p. 240).

The prior existence of the relevant technology comes to bear on theory and experiment again in the common, but mistaken, assumption that measures are made and experimentally compared in order to discover scientific laws. History shows that measures are rarely made until the relevant law is effectively embodied in an instrument (Kuhn, 1961/1977, pp. 218-9): “…historically the arrow of causality is largely from the technology to the science” (Price, 1986, p. 240). Instruments do not provide just measures; rather they produce the phenomenon itself in a way that can be controlled, varied, played with, and learned from (Heilbron, 1993, p. 3; Hankins & Silverman, 1999; Rabkin, 1992). The term “technoscience” has emerged as an expression denoting recognition of this priority of the instrument (Baird, 1997; Ihde & Selinger, 2003; Latour, 1987).

Because technology often dictates what, if any, phenomena can be consistently produced, it constrains experimentation and theorizing by focusing attention selectively on reproducible, potentially interpretable effects, even when those effects are not well understood (Ackermann, 1985; Daston & Galison, 1992; Ihde, 1998; Hankins & Silverman, 1999; Maasen & Weingart, 2001). Criteria for theory choice in this context stem from competing explanatory frameworks’ experimental capacities to facilitate instrument improvements, prediction of experimental results, and gains in the efficiency with which a phenomenon is produced.

In this context, the relatively recent introduction of measurement models requiring additive, invariant parameterizations (Rasch, 1960) provokes speculation as to the effect on the human sciences that might be wrought by the widespread availability of consistently reproducible effects expressed in common quantitative languages. Paraphrasing Price’s comment on steam engines and thermodynamics, might it one day be said that as yet unforeseeable advances in reading theory will owe far more to the Lexile analyzer (Stenner, et al., 2006) than ever the Lexile analyzer owed to reading theory?

Kuhn (1961/1977) speculated that the second scientific revolution of the early- to mid-nineteenth century followed in large part from the full mathematization of physics, i.e., the emergence of metrology as a professional discipline focused on providing universally accessible, theoretically predictable, and evidence-supported uniform units of measurement (Roche, 1998). Kuhn (1961/1977, p. 220) specifically suggests that a number of vitally important developments converged about 1840 (also see Hacking, 1983, p. 234). This was the year in which the metric system was formally instituted in France after 50 years of development (it had already been obligatory in other nations for 20 years at that point), and metrology emerged as a professional discipline (Alder, 2002, pp. 328, 330; Heilbron, 1993, p. 274; Kula, 1986, p. 263). Daston (1992) independently suggests that the concept of objectivity came of age in the period from 1821 to 1856, and gives examples illustrating the way in which the emergence of strong theory, shared metric standards, and experimental data converged in a context of particular social mores to winnow out unsubstantiated and unsupportable ideas and contentions.

Might a similar revolution and new advances in the human sciences follow from the introduction of evidence-based, theoretically predictive, instrumentally mediated, and mathematical uniform measures? We won’t know until we try.

Figure. The Dialectical Interactions and Mutual Mediations of Theory, Data, and Instruments


Acknowledgment. These ideas have been drawn in part from long consideration of many works in the history and philosophy of science, primarily Ackermann (1985), Ihde (1991), and various works of Martin Heidegger, as well as key works in measurement theory and practice. A few obvious points of departure are listed in the references.

References

Ackermann, J. R. (1985). Data, instruments, and theory: A dialectical approach to understanding science. Princeton, New Jersey: Princeton University Press.

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Aldrich, J. (1989). Autonomy. Oxford Economic Papers, 41, 15-34.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Baird, D. (1997, Spring-Summer). Scientific instrument making, epistemology, and the conflict between gift and commodity economics. Techné: Journal of the Society for Philosophy and Technology, 3-4, 25-46. Retrieved August 28, 2009, from http://scholar.lib.vt.edu/ejournals/SPT/v2n3n4/baird.html.

Baird, D. (2002, Winter). Thing knowledge – function and truth. Techné: Journal of the Society for Philosophy and Technology, 6(2). Retrieved August 19, 2003, from http://scholar.lib.vt.edu/ejournals/SPT/v6n2/baird.html.

Burdick, D. S., Stone, M. H., & Stenner, A. J. (2006). The Combined Gas Law and a Rasch Reading Law. Rasch Measurement Transactions, 20(2), 1059-60 [http://www.rasch.org/rmt/rmt202.pdf].

Carroll-Burke, P. (2001). Tools, instruments and engines: Getting a handle on the specificity of engine science. Social Studies of Science, 31(4), 593-625.

Daston, L. (1992). Baconian facts, academic civility, and the prehistory of objectivity. Annals of Scholarship, 8, 337-363. (Rpt. in L. Daston, (Ed.). (1994). Rethinking objectivity (pp. 37-64). Durham, North Carolina: Duke University Press.)

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Hacking, I. (1983). Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge: Cambridge University Press.

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Heelan, P. A. (1983, June). Natural science as a hermeneutic of instrumentation. Philosophy of Science, 50, 181-204.

Heelan, P. A. (1998, June). The scope of hermeneutics in natural science. Studies in History and Philosophy of Science Part A, 29(2), 273-98.

Heidegger, M. (1977). Modern science, metaphysics, and mathematics. In D. F. Krell (Ed.), Basic writings [reprinted from M. Heidegger, What is a thing? South Bend, Regnery, 1967, pp. 66-108] (pp. 243-282). New York: Harper & Row.

Heidegger, M. (1977). The question concerning technology. In D. F. Krell (Ed.), Basic writings (pp. 283-317). New York: Harper & Row.

Heilbron, J. L. (1993). Weighing imponderables and other quantitative science around 1800. Historical Studies in the Physical and Biological Sciences, 24(Supplement), Part I, pp. 1-337.

Hessenbruch, A. (2000). Calibration and work in the X-ray economy, 1896-1928. Social Studies of Science, 30(3), 397-420.

Ihde, D. (1983). The historical and ontological priority of technology over science. In D. Ihde, Existential technics (pp. 25-46). Albany, New York: State University of New York Press.

Ihde, D. (1991). Instrumental realism: The interface between philosophy of science and philosophy of technology. (The Indiana Series in the Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Ihde, D. (1998). Expanding hermeneutics: Visualism in science. (Northwestern University Studies in Phenomenology and Existential Philosophy). Evanston, Illinois: Northwestern University Press.

Ihde, D., & Selinger, E. (Eds.). (2003). Chasing technoscience: Matrix for materiality. (Indiana Series in Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Kuhn, T. S. (1961/1977). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in T. S. Kuhn, The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press, 1977).

Kula, W. (1986). Measures and men (R. Screter, Trans.). Princeton, New Jersey: Princeton University Press (Original work published 1970).

Lapre, M. A., & Van Wassenhove, L. N. (2002, October). Learning across lines: The secret to more efficient factories. Harvard Business Review, 80(10), 107-11.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York, New York: Cambridge University Press.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York: Harper & Row.

Maas, H. (2001). An instrument can make a science: Jevons’s balancing acts in economics. In M. S. Morgan & J. Klein (Eds.), The age of economic measurement (pp. 277-302). Durham, North Carolina: Duke University Press.

Maasen, S., & Weingart, P. (2001). Metaphors and the dynamics of knowledge. (Vol. 26. Routledge Studies in Social and Political Thought). London: Routledge.

Mendelsohn, E. (1992). The social locus of scientific instruments. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 5-22). Bellingham, WA: SPIE Optical Engineering Press.

Polanyi, M. (1964/1946). Science, faith and society. Chicago: University of Chicago Press.

Price, D. J. d. S. (1986). Of sealing wax and string. In Little Science, Big Science–and Beyond (pp. 237-253). New York, New York: Columbia University Press.

Rabkin, Y. M. (1992). Rediscovering the instrument: Research, industry, and education. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 57-82). Bellingham, Washington: SPIE Optical Engineering Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Thurstone, L. L. (1959). The measurement of values. Chicago: University of Chicago Press, Midway Reprint Series.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Questions about measurement: If it is so important, why…?

January 28, 2010

If measurement is so important, why is measurement quality so uniformly low?

If we manage what we measure, why is measurement leadership virtually nonexistent?

If we can’t tell if things are getting better, staying the same, or getting worse without good metrics, why is measurement so rarely context-sensitive, focused, integrated, and interactive, as Dean Spitzer recommends it should be?

If quantification is valued for its rigor and convenience, why is no one demanding meaningful mappings of substantive, additive amounts of things measured on number lines?

If everyone is drowning in unmanageable floods of data, why isn’t measurement used to reduce data volumes dramatically—and not only with no loss of information but with the addition of otherwise unavailable forms of information?
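The data-reduction claim in this question has a precise basis: in the Rasch model (Rasch, 1960) the raw score is a sufficient statistic for the measure. A minimal sketch of my own, with invented item difficulties:

```python
import math

# My illustration (invented difficulties) of sufficiency in the Rasch
# model: the raw score is a sufficient statistic for the measure, so
# two different response patterns with the same raw score yield
# exactly the same maximum-likelihood estimate of ability.

def log_lik(theta, responses, difficulties):
    ll = 0.0
    for x, b in zip(responses, difficulties):
        p = 1 / (1 + math.exp(b - theta))
        ll += math.log(p if x == 1 else 1 - p)
    return ll

def mle(responses, difficulties):
    """Crude grid search for the likelihood-maximizing ability."""
    grid = [i / 1000 for i in range(-4000, 4001)]
    return max(grid, key=lambda t: log_lik(t, responses, difficulties))

diffs = [-1.0, -0.5, 0.0, 0.5, 1.0]
pattern_a = [1, 1, 1, 0, 0]  # raw score 3, expected pattern
pattern_b = [0, 0, 1, 1, 1]  # raw score 3, surprising pattern
print(mle(pattern_a, diffs), mle(pattern_b, diffs))  # the two estimates coincide
```

The whole response record compresses to one number per person with no loss of information about the measure, which is the kind of dramatic, lossless data reduction the question points at.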

If learning and improvement are the order of the day, why isn’t anyone interested in the organizational and individual learning trajectories that are defined by hierarchies of calibrated items?

If resilient lean thinking is the way to go, why aren’t more measures constructed to retain their meaning and values across changes in item content?

If flexibility is a core value, why aren’t we adapting instruments to people and organizations, instead of vice versa?

If fair, just, and meaningful measurement is often lacking in judge-assigned performance assessments, why isn’t anyone estimating the consistency, and the leniency or harshness, of ratings—and removing those effects from the measures made?
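The established method for this is many-facet Rasch modeling (see Lunz & Linacre, 1998, in the references). The sketch below is a deliberately naive stand-in for that model, using simple mean offsets on simulated ratings, just to show the logic of estimating and removing rater severity; every number in it is invented.

```python
import random

# Deliberately naive sketch of the idea behind estimating and removing
# rater leniency/harshness. The principled method is many-facet Rasch
# modeling (Lunz & Linacre, 1998); here simple mean offsets on
# simulated ratings stand in for it, and all numbers are invented.

random.seed(1)
true_quality = [round(random.uniform(2, 8), 1) for _ in range(50)]
severity = {"lenient": +1.0, "neutral": 0.0, "harsh": -1.2}

# Each rater sees every performance; ratings shift by rater severity.
ratings = {r: [q + s + random.gauss(0, 0.3) for q in true_quality]
           for r, s in severity.items()}

grand_mean = sum(sum(v) / len(v) for v in ratings.values()) / len(ratings)
for rater, scores in ratings.items():
    offset = sum(scores) / len(scores) - grand_mean  # estimated severity
    adjusted = [x - offset for x in scores]
    err = sum(abs(a - q) for a, q in zip(adjusted, true_quality)) / len(adjusted)
    print(f"{rater:8s} offset {offset:+.2f}  mean adjusted error {err:.2f}")
```

Once severity is estimated, it can be subtracted out so that a performance measure no longer depends on which judge happened to do the rating.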

If efficiency is valued, why does no one at all seem to care about adjusting measurement precision to the needs of the task at hand, so that time and resources are not wasted in gathering too much or too little data?
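For dichotomous Rasch items this tuning is straightforward in principle: each item contributes information p(1 - p) at a given ability, and the standard error of measurement is one over the square root of the total information. A quick sketch of my own, with idealized on-target items:

```python
import math

# My sketch (idealized, on-target items): a dichotomous Rasch item
# contributes information p * (1 - p) at ability theta, and the
# standard error of measurement is 1 / sqrt(total information), so
# test length can be tuned to the precision a decision requires.

def sem(theta, difficulties):
    info = 0.0
    for b in difficulties:
        p = 1 / (1 + math.exp(b - theta))
        info += p * (1 - p)
    return 1 / math.sqrt(info)

# Items targeted exactly at the person (b == theta) each contribute
# 0.25 information, so halving the error requires four times the items:
for n in (4, 16, 64):
    print(n, "items ->", round(sem(0.0, [0.0] * n), 2))
# 4 items -> 1.0 ; 16 items -> 0.5 ; 64 items -> 0.25
```

Gathering data past the precision a decision needs wastes time; stopping short of it risks the decision, and this arithmetic makes the trade-off explicit.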

If it’s common knowledge that we can do more together than we can as individuals, why isn’t anyone providing the high quality and uniform information needed for the networked collective thinking that is able to keep pace with the demand for innovation?

Since the metric system and uniform product standards are widely recognized as essential to science and commerce, why are longstanding capacities for common metrics for human, social, and natural capital not being used?

If efficient markets are such great things, why isn’t anyone at all concerned about lubricating the flow of human, social, and natural capital by investing in the highest quality measurement obtainable?

If everyone loves a good profit, why aren’t we setting up human, social, and natural capital metric systems to inform competitive pricing of intangible assets, products, and services?

If companies are supposed to be organic entities that mature in a manner akin to human development over the lifespan, why is so little being done to conceive, gestate, midwife, and nurture living capital?

In short, if measurement is really as essential to management as it is so often said to be, why doesn’t anyone seek out the state of the art technology, methods, and experts before going to the trouble of developing and implementing metrics?

I suspect the answers to these questions are all the same. These disconnects between word and deed happen because so few people are aware of the technical advances made in measurement theory and practice over the last several decades.

For the deep background, see previous entries in this blog, various web sites (www.rasch.org, www.rummlab.com, www.winsteps.com, http://bearcenter.berkeley.edu/, etc.), and an extensive body of published work (Rasch, 1960; Wright, 1977, 1997a, 1997b, 1999a, 1999b; Andrich, 1988, 2004, 2005; Bond & Fox, 2007; Fisher, 2009, 2010; Smith & Smith, 2004; Wilson, 2005; Wright & Stone, 1999, 2004).

There is a wealth of published applied research in education, psychology, and health care (Bezruczko, 2005; Fisher & Wright, 1994; Masters, 2007; Masters & Keeves, 1999). To find more, search on Rasch together with the substantive area of interest.

For applications in business contexts, there is a more limited number of published resources (ATP, 2001; Drehmer, Belohlav, & Coye, 2000; Drehmer & Deklava, 2001; Ludlow & Lunz, 1998; Lunz & Linacre, 1998; Mohamed, et al., 2008; Salzberger, 2000; Salzberger & Sinkovics, 2006; Zakaria, et al., 2008). I have, however, just become aware of the November, 2009, publication of what could be a landmark business measurement text (Salzberger, 2009). Hopefully, this book will be just one of many to come, and the questions I’ve raised will no longer need to be asked.

References

Andrich, D. (1988). Rasch models for measurement. (Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-068). Beverly Hills, California: Sage Publications.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Andrich, D. (2005). Georg Rasch: Mathematician and statistician. In K. Kempf-Leonard (Ed.), Encyclopedia of Social Measurement (Vol. 3, pp. 299-306). Amsterdam: Academic Press, Inc.

Association of Test Publishers. (2001, Fall). Benjamin D. Wright, Ph.D. honored with the Career Achievement Award in Computer-Based Testing. Test Publisher, 8(2). Retrieved 20 May 2009, from http://www.testpublishers.org/newsletter7.htm#Wright.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Dawson, T. L., & Gabrielian, S. (2003, June). Developing conceptions of authority and contract across the life-span: Two perspectives. Developmental Review, 23(2), 162-218.

Drehmer, D. E., Belohlav, J. A., & Coye, R. W. (2000, Dec). An exploration of employee participation using a scaling approach. Group & Organization Management, 25(4), 397-418.

Drehmer, D. E., & Deklava, S. M. (2001, April). A note on the evolution of software engineering practices. Journal of Systems and Software, 57(1), 1-7.

Fisher, W. P., Jr. (2009, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Fisher, W. P., Jr. (2010). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 11, in press [Pre-press version available at http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf].

Ludlow, L. H., & Lunz, M. E. (1998). The Job Responsibilities Scale: Invariance in a longitudinal prospective study. Journal of Outcome Measurement, 2(4), 326-37.

Lunz, M. E., & Linacre, J. M. (1998). Measurement designs using multifacet Rasch modeling. In G. A. Marcoulides (Ed.), Modern methods for business research. Methodology for business and management (pp. 47-77). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Masters, G. N. (2007). Special issue: Programme for International Student Assessment (PISA). Journal of Applied Measurement, 8(3), 235-335.

Masters, G. N., & Keeves, J. P. (Eds.). (1999). Advances in measurement in educational research and assessment. New York: Pergamon.

Mohamed, A., Aziz, A., Zakaria, S., & Masodi, M. S. (2008). Appraisal of course learning outcomes using Rasch measurement: A case study in information technology education. In L. Kazovsky, P. Borne, N. Mastorakis, A. Kuri-Morales & I. Sakellaris (Eds.), Proceedings of the 7th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems (Electrical And Computer Engineering Series) (pp. 222-238). Cambridge, UK: WSEAS.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Salzberger, T. (2000). An extended Rasch analysis of the CETSCALE – implications for scale development and data construction. Department of Marketing, University of Economics and Business Administration, Vienna (WU-Wien) (http://www2.wu-wien.ac.at/marketing/user/salzberger/research/wp_dataconstruction.pdf).

Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Northampton, MA: Edward Elgar.

Salzberger, T., & Sinkovics, R. R. (2006). Reconsidering the problem of data equivalence in international marketing research: Contrasting approaches based on CFA and the Rasch model for measurement. International Marketing Review, 23(4), 390-417.

Smith, E. V., Jr., & Smith, R. M. (2004). Introduction to Rasch measurement. Maple Grove, MN: JAM Press.

Spitzer, D. (2007). Transforming performance measurement: Rethinking the way we measure and drive organizational success. New York: AMACOM.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1997a, June). Fundamental measurement for outcome evaluation. Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 261-88.

Wright, B. D. (1997b, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Wright, B. D. (1999a). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1999b). Rasch measurement models. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 85-97). New York: Pergamon.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/memos.htm#measess].

Wright, B. D., & Stone, M. H. (2004). Making measures. Chicago: Phaneron Press.

Zakaria, S., Aziz, A. A., Mohamed, A., Arshad, N. H., Ghulman, H. A., & Masodi, M. S. (2008, November 11-13). Assessment of information managers’ competency using Rasch measurement. ICCIT: Third International Conference on Convergence and Hybrid Information Technology, 1, 190-196 [http://www.computer.org/portal/web/csdl/doi/10.1109/ICCIT.2008.387].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Review of Spitzer’s Transforming Performance Measurement

January 25, 2010

Everyone interested in practical measurement applications needs to read Dean R. Spitzer’s 2007 book, Transforming performance measurement: Rethinking the way we measure and drive organizational success (New York, AMACOM). Spitzer describes how measurement, properly understood and implemented, can transform organizational performance by empowering and motivating individuals. Measurement understood in this way moves beyond quick fixes and fads to sustainable processes based on a measurement infrastructure that coordinates decisions and actions uniformly throughout the organization.

Measurement leadership, Spitzer says, is essential. He advocates, and many organizations have instituted, the C-suite position of Chief Measurement Officer (Chapter 9). This person is responsible for instituting and managing the four keys to transformational performance measurement (Chapters 5-8):

  • Context sets the tone by presenting the purpose of measurement as either negative (to inspect, control, report, manipulate) or positive (to give feedback, learn, improve).
  • Focus concentrates attention on what’s important, aligning measures with the mission, strategy, and with what needs to be managed, relative to the opportunities, capacities, and skills at hand.
  • Integration addresses the flow of measured information throughout the organization so that the covariations of different measures can be observed relative to the overall value created.
  • Interactivity speaks to the inherently social nature of the purposes of measurement, so that it embodies an alignment with the business model, strategy, and operational imperatives.

Spitzer takes a developmental approach to measurement improvement, providing a Measurement Maturity Assessment in Chapter 12, and also speaking to the issues of the “living company” raised by Arie de Geus’ classic book of that title. Plainly, the transformative potential of performance measurement is dependent on the maturational complexity of the context in which it is implemented.

Spitzer clearly outlines the ways in which each of the four keys and measurement leadership play into or hinder transformation and maturation. He also provides practical action plans and detailed guidelines, stresses the essential need for an experimental attitude toward evaluating change, speaks directly to the difficulty of measuring intangible assets like partnership, trust, skills, etc., and shows appreciation for the value of qualitative data.

Transforming Performance Measurement is not an academic treatise, though all sources are documented, with the endnotes and bibliography running to 25 pages. It was written for executives, managers, and entrepreneurs who need practical advice expressed in direct, simple terms. Further, the book shows no awareness of the technical capacities of measurement as these have been realized in numerous commercial applications in high-stakes and licensure/certification testing over the last 50 years (Andrich, 2005; Bezruczko, 2005; Bond & Fox, 2007; Masters, 2007; Wilson, 2005). This can hardly be counted as a major criticism, since no book of this kind has to date been able to incorporate the often highly technical and mathematical presentations of advanced psychometrics.

That said, the sophistication of Spitzer’s conceptual framework and recommendations makes them remarkably ready to incorporate insights from measurement theory, testing practice, developmental psychology, and the history of science. Doing so will propel the strategies recommended in this book into widespread adoption and will be a catalyst for the emerging re-invention of capitalism. In this coming cultural revolution, intangible forms of capital will be brought to life in common currencies for the exchange of value that perform the same function performed by kilowatts, bushels, barrels, and hours for tangible forms of capital (Fisher, 2009, 2010).

Pretty big claim, you say? Yes, it is. Here’s how it’s going to work.

  • First, measurement leadership within organizations that implements policies and procedures that are context-sensitive, focused, integrated, and interactive (i.e., that have Spitzer’s keys in hand) will benefit from instruments calibrated to facilitate:
    • meaningful mapping of substantive, additive amounts of things measured on number lines;
    • data volume reductions on the order of 80-95% and more, with no loss of information;
    • organizational and individual learning trajectories defined by hierarchies of calibrated items;
    • measures that retain their meaning and values across changes in item content;
    • adapting instruments to people and organizations, instead of vice versa;
    • estimating the consistency, and the leniency or harshness, of ratings assigned by judges evaluating performance quality, with the ability to remove those effects from the performance measures made;
    • adjusting measurement precision to the needs of the task at hand, so that time and resources are not wasted in gathering too much or too little data; and
    • providing the high quality and uniform information needed for networked collective thinking able to keep pace with the demand for innovation.
  • Second, measurement leadership sensitive to the four keys across organizations, both within and across industries, will find value in:
    • establishing industry-wide metrological standards defining common metrics for the expression of the primary human, social, and natural capital constructs of interest;
    • lubricating the flow of human, social, and natural capital in efficient markets broadly defined so as to inform competitive pricing of intangible assets, products, and services; and
    • new opportunities for determining returns on investments in human, community, and environmental resource management.
  • Third, living companies need to be able to mature in a manner akin to human development over the lifespan. Theories of hierarchical complexity and developmental stage transitions that inform the rigorous measurement of cognitive and moral transformations (Dawson & Gabrielian, 2003) will increasingly find highly practical applications in organizational contexts.
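The additivity and invariance claims in the first list above can be made concrete with the dichotomous Rasch model, in which the probability of a correct or affirmative response is a logistic function of the difference between a person’s ability and an item’s difficulty, both expressed in logits. Here is a minimal sketch in Python; the function and variable names, and the two hypothetical item difficulties, are mine, purely for illustration:

```python
import math

def rasch_p(ability, difficulty):
    """Dichotomous Rasch model: the probability of success is a
    logistic function of ability minus difficulty (in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p):
    """Natural log of the odds p/(1-p)."""
    return math.log(p / (1.0 - p))

# Two hypothetical items one logit apart in difficulty.
easy, hard = -0.5, 0.5

# For every person, whatever their ability, the difference in
# log-odds between the two items equals the difference in the item
# difficulties -- the sense in which calibrated measures "really
# add up" on a common scale.
for ability in (-2.0, 0.0, 1.5):
    gap = log_odds(rasch_p(ability, easy)) - log_odds(rasch_p(ability, hard))
    print(round(gap, 6))  # 1.0 at every ability level
```

Because the log-odds of success equal ability minus difficulty, the gap between any two items is the same for every person measured; this separability is what underlies the claims above about measures retaining their meaning across changes in item content.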

Leadership of the kind described by Spitzer is needed not just to make measurement contextualized, focused, integrated, and interactive—and so productive at new levels of effectiveness—but to apply systematically the technical, financial, and social resources needed to realize the rich potentials he describes for the transformation of organizations and empowerment of individuals. Spitzer’s program surpasses the usual focus on centralized statistical analyses and reports to demand the organization-wide dissemination of calibrated instruments that measure in common metrics. The flexibility, convenience, and scientific rigor of instruments calibrated to measure in units that really add up fit the bill exactly. Here’s to putting tools that work in the hands of those who know what to do with them!

References

Andrich, D. (2005). Georg Rasch: Mathematician and statistician. In K. Kempf-Leonard (Ed.), Encyclopedia of Social Measurement (Vol. 3, pp. 299-306). Amsterdam: Academic Press, Inc.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Dawson, T. L., & Gabrielian, S. (2003, June). Developing conceptions of authority and contract across the life-span: Two perspectives. Developmental Review, 23(2), 162-218.

Fisher, W. P., Jr. (2009, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Fisher, W. P., Jr. (2010). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 11, in press [Pre-press version available at http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf].

Masters, G. N. (2007). Special issue: Programme for International Student Assessment (PISA). Journal of Applied Measurement, 8(3), 235-335.

Spitzer, D. (2007). Transforming performance measurement: Rethinking the way we measure and drive organizational success. New York: AMACOM.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.


How to Trade “Global Mush” for Beauty, Meaning, and Value: Reflections on Lanier’s New Book

January 15, 2010

Implicit in many of my recent posts here is the idea that we must learn how to follow through on the appropriation of meaning to proper ownership of the properties characteristic of our own proprietary capital resources: the creativities, abilities, skills, talents, health, motivations, trust, etc. that make us each reliable citizens and neighbors, and economically viable in being hireable, promotable, productive, and retainable. Individual control of investment in, income from, and returns on our own shares of human, social, and natural capital ought to be a fundamental constitutional human right.

But, just as property rights are constitutionally guaranteed by nations around the world that don’t take the trouble to enforce them or even to provide their necessary infrastructural capacities, so, too, are human rights to equal opportunities widely guaranteed without being properly provided for or enforced. And now in the Internet age, we have succeeded in innovating ever more fluid media for the expression of our individual capacities for making original cultural, technical, and human contributions, but we have yet to figure out how to exert effective control over the returns and income generated by these contributions.

Jaron Lanier’s new book, “You Are Not a Gadget,” takes up this theme in interesting ways. In his recent Wall Street Journal article, Lanier says:

“There’s a dominant dogma in the online culture of the moment that collectives make the best stuff, but it hasn’t proven to be true. The most sophisticated, influential and lucrative examples of computer code—like the page-rank algorithms in the top search engines or Adobe’s Flash—always turn out to be the results of proprietary development. Indeed, the adored iPhone came out of what many regard as the most closed, tyrannically managed software-development shop on Earth.

Actually, Silicon Valley is remarkably good at not making collectivization mistakes when our own fortunes are at stake. On the one hand we want to avoid physical work and instead benefit from intellectual property. On the other hand, we’re undermining intellectual property so that information can roam around for nothing, or more precisely as bait for advertisements. That’s a formula that leaves no way for our nation to earn a living in the long term.

The “open” paradigm rests on the assumption that the way to get ahead is to give away your brain’s work—your music, writing, computer code and so on—and earn kudos instead of money. You are then supposedly compensated because your occasional dollop of online recognition will help you get some kind of less cerebral work that can earn money. For instance, maybe you can sell custom branded T-shirts.

We’re well over a decade into this utopia of demonetized sharing and almost everyone who does the kind of work that has been collectivized online is getting poorer. There are only a tiny handful of writers or musicians who actually make a living in the new utopia, for instance. Almost everyone else is becoming more like a peasant every day.”
Lanier’s suggestions of revised software structures and micropayment systems extending intellectual property rights correctly recognize the scope of the challenges we face. He also describes the motivations driving the ongoing collectivization process, saying that “youthful fascination with collectivism is in part simply a way to address perceived ‘unfairness’.” This radical way of enforcing a very low lowest common denominator points straight at the essential problem, and that problem is apparent in the repeated use of the key word, collective.

It was not so long ago that it was impossible to use that word without immediately evoking images of Soviet central planning and committees. The “global mush” of mediocrity Lanier complains about as a direct result of collective thinking is a very good way of describing the failures of socialism that brought down the Soviet Union by undercutting its economic viability. Lanier speaks of growing up and enthusiastically participating in various forms of collective life, like food co-ops and shared housing. I, too, have shared those experiences. I saw, as Lanier sees and as the members of communes in the U.S. during the 1960s saw, that nothing gets done when no one owns the process and stands to reap the rewards: when housekeeping is everyone’s responsibility, no one does it.

Further and more to the point, nothing goes right when supply and demand are dictated by a central committee driven by ideological assumptions concerning exactly what does and does not constitute the greater good. On the contrary, innovation is stifled, inefficiencies are rampant, and no one takes the initiative to do better because there are no incentives for doing so. Though considerable pain is experienced in allowing the invisible hand to coordinate the flux and flows of markets, no better path to prosperity has yet been found. The current struggle is less one of figuring out how to do without markets than it is one of figuring out how to organize them for greater long-term stability. As previous posts in this blog endeavor to show, we ought to be looking more toward bringing all forms of capital into the market instead of toward regulating some to death while others ravage the economy, scot-free.

Friedrich von Hayek (1988, 1994) was an economist and philosopher often noted for his on-target evaluations of the errors of socialism. He tellingly focused on the difference between the laborious micromanagement of socialism’s thought police and the wealth-creating liberation of capital’s capacity for self-organization. It is interesting that Lanier describes the effects of demonetized online sharing as driving most of us toward peasant status, just as Hayek (1994) describes socialism as a “road to serfdom.” Of course, capitalism itself is far from perfect, since private property, and manufactured and liquid capital, have enjoyed a freedom of movement that too often recklessly tramples human rights, community life, and the natural environment. But as is described in a previous post in this blog on re-inventing capitalism, we can go a long way toward rectifying the errors of capitalism by setting up the rules of law that will lubricate and improve the efficiency of human, social, and natural capital markets.

Now, I’ve always been fascinated with the Latin root shared in words like property, propriety, proprietary, appropriation, proper, and the French propre (which means both clean and one’s own, or belonging to oneself, depending on whether it comes before or after the noun; une maison propre = a clean house and sa propre maison = his/her own house). I was then happy to encounter in graduate school Ricoeur’s (1981) theory of text interpretation, which focuses on the way we create meaning by appropriating it. Real understanding requires that we must make a text our own if we are to be able to give proper evidence of understanding it by restating or summarizing it in our own words.

Such restating is, of course, also the criterion for demonstrating that a scientific theory of the properties of a phenomenon is adequate to the task of reproducing its effects on demand. As Ricoeur (1981, p. 210) says, situating science in a sphere of signs puts the human and natural sciences together on the same footing in the context of linguistically-mediated social relations. This unification of the sciences has profound consequences, not just for philosophy, the social sciences, or economics, but for the practical task of transforming the current “global mush” into a beautiful, meaningful, and effective living creativity system. So, there is real practical significance in realizing what appropriation is and how its processes feed into our conceptualizations of property, propriety, and ownership.

When we can devise a new instrument or measuring method that gives the same results as an existing instrument or method, we have demonstrated theoretical control over the properties of the phenomenon (Heelan, 1983, 2001; Ihde, 1991; Ihde & Selinger, 2003; Fisher, 2004, 2006, 2010b). The more precisely the effects are reproduced, the purer they become, the clearer their representation, and the greater their independence from the local contingencies of sample, instrument, observer, method, etc. When we can package a technique for reproducing the desired effects (radio or tv broadcast/reception, vibrating toothbrushes, or what have you), we can export the phenomenon from the laboratory via networks of distribution, supply, sales, marketing, manufacture, repair, etc. (Latour, 1987). Proprietary methods, instruments, and effects can then be patented and ownership secured.

What we have in the current “global mush” of collective aggregations is nothing of this kind. There are specific criteria for information quality and network configuration (Akkerman, et al., 2007; Latour, 1987, pp. 247-257; Latour, 1995; Magnus, 2007; Mandel, 1978; Wise, 1995) that have to be met for collective cognition to realize its potential in the manner described by Surowiecki (2004) or Brafman and Beckstrom (2006), for instance. The difference is the difference between living and dead capital, between capitalism and socialism, and between scientific measurement and funny numbers that don’t stand for the repetitive additivity of a constant unit (Fisher, 2002, 2009, 2010a). As Lanier notes, Silicon Valley understands very well the nature of this difference, and protects its own interests by vigilantly ensuring that its collective cognitions are based in properly constructed information and networks.

And here we find the crux of the lesson to be learned. We need to focus very carefully on the details of how we create meaningful relationships, of how things come into words, of how instruments are calibrated and linked together in shared systems of signification, and of how economies thrive on the productive efficiencies of well-lubricated markets. Everything we need to turn things around is available, though seeing things for what they are is one of the most daunting and difficult tasks we can undertake.

The postmodern implications of the way appropriation is more a letting-go than a possessing (Ricoeur, 1981, p. 191) will be taken up another time, in the context of the playful flow of signification we are always already caught up within. For now, it is enough to point the way toward the issues raised and examined in other posts in this blog as to how capital is brought to life. We are well on the way toward a convergence of efforts that may well result in exactly the kind of fierce individuals and competing teams able to reap their just due, as Lanier envisions.

References

Akkerman, S., Van den Bossche, P., Admiraal, W., Gijselaers, W., Segers, M., Simons, R.-J., & Kirschner, P. (2007, February). Reconsidering group cognition: From conceptual confusion to a boundary area between cognitive and socio-cultural perspectives? Educational Research Review, 2, 39-63.

Brafman, O., & Beckstrom, R. A. (2006). The starfish and the spider: The unstoppable power of leaderless organizations. New York: Portfolio (Penguin Group).

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2006). Meaningfulness, sufficiency, invariance, and conjoint additivity. Rasch Measurement Transactions, 20(1), 1053 [http://www.rasch.org/rmt/rmt201.htm].

Fisher, W. P., Jr. (2009, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Fisher, W. P., Jr. (2010a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 11, in press [http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf].

Fisher, W. P., Jr. (2010b). Reducible or irreducible? Mathematical reasoning and the ontological method. Journal of Applied Measurement, 11, in press.

von Hayek, F. A. (1988). The fatal conceit: The errors of socialism (W. W. Bartley, III, Ed.) (Vol. I). The Collected Works of F. A. Hayek. Chicago: University of Chicago Press.

von Hayek, F. A. (1994/1944). The road to serfdom (Fiftieth Anniversary Edition; Introduction by Milton Friedman). Chicago: University of Chicago Press.

Heelan, P. A. (1983, June). Natural science as a hermeneutic of instrumentation. Philosophy of Science, 50, 181-204.

Heelan, P. A. (2001). The lifeworld and scientific interpretation. In S. K. Toombs (Ed.), Handbook of phenomenology and medicine (pp. 47-66). Chicago: University of Chicago Press.

Ihde, D., & Selinger, E. (Eds.). (2003). Chasing technoscience: Matrix for materiality. (Indiana Series in Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York: Cambridge University Press.

Latour, B. (1995). Cogito ergo sumus! Or psychology swept inside out by the fresh air of the upper deck: Review of Hutchins’ Cognition in the Wild, MIT Press, 1995. Mind, Culture, and Activity: An International Journal, 3(192), 54-63.

Magnus, P. D. (2007). Distributed cognition and the task of science. Social Studies of Science, 37(2), 297-310.

Mandel, J. (1978, December). Interlaboratory testing. ASTM Standardization News, 6, 11-12.

Ricoeur, P. (1981). Hermeneutics and the human sciences: Essays on language, action and interpretation (J. B. Thompson, Ed. & Trans). Cambridge, England: Cambridge University Press.

Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies and nations. New York: Doubleday.

Universal Rights and Universal Measures: Advancing Science, Economics, and Democracy Simultaneously

January 14, 2010

Art historians and political theorists often remark on the way the columns in Greek temples symbolize the integration of individuals and society in democracies. The connection of architecture and forms of government is well enough known that at least one theater critic was compelled to include it in a review of a World War II-themed musical (Wonk, 2002). With an eye to illuminating the victory over fascism, he observed that Greek temple pillars

“are unique, curved, each one slightly different. They are harmonized in a united effort. They are a democracy. Whereas, the temples of the older, Eastern empires are supported by columns that are simply straight sticks, interchangeable. The phalanx of individual citizens was stronger than the massed army of slaves [and so 9,000 Greek citizen soldiers could defeat 50,000 Persian mercenaries and slaves at the Battle of Marathon in the fifth century BCE].”

Wonk makes this digression in a review of a musical, The 1940’s Radio Hour, to set the stage for his point that

“while listening to the irrepressible and irresistible outpourings of Tin Pan Alley, I understood that the giant fascist war machine, with its mechanical stamp, stamp, stamp of boots was defeated, in a sense, by American syncopation. ‘Deutscheland Deutscheland Uber Alles’ ran aground and was wrecked on the shoals of ‘The Boogie Woogie Bugle Boy of Company B.'”

Of course, the same thing has been said before (the Beatles’ “Back in the USSR” brought down the Berlin Wall, etc.), but the sentiment is right on target. The creativity and passion of free people will ultimately always win out over oppressive regimes that kill joy and try to control innovation. As Emma Goldman is famously paraphrased, a revolution that bans dancing isn’t worth having. What we see happening here is a way in which different sectors of life are co-produced as common values resonate across the social, political, economic, and scientific spheres (Jasanoff, 2004; Jasanoff and Martello, 2004; Wise, 1995).

So how does science come to bear? Consider Ken Alder’s (2002, pp. 2, 3) perspective on the origins of the metric system:

“Just as the French Revolution had proclaimed universal rights for all people, the savants argued, so too should it proclaim universal measures.”

“…the use a society makes of its measures expresses its sense of fair dealing. That is why the balance scale is a widespread symbol of justice. … Our methods of measurement define who we are and what we value.”

As I’ve been saying in the signature line of my emails for many years, “We are what we measure. It’s time we measured what we want to be.” The modern world’s alienating consumer culture is fundamentally characterized by the way it compromises our ability to relate our experiences as individuals to shared stories that are true of us all, even if they never actually happened in their specific details to any of us. Being able to recognize the pattern of our own lives in the stories that we tell is what makes for science and technology’s universal applicability, as well as for great literature, powerful historical accounts, poetry that resonates across the centuries, and political and religious convictions strong enough to rationalize war and totalitarian repression.

In traditional cultures, ancient myths tell the stories that shape the world and enable everyone to find and value their place in it. Because these stories were transmitted from generation to generation orally, they could change a little with each retelling without anyone noticing. This allowed the myths to remain current and relevant as history unfolded in times with a slower pace of change.

But modern Western culture is blessed and cursed with written records that remain fixed. Instead of the story itself slowly changing with the times in every retelling, now new interpretations of the story emerge more quickly in the context of an overall faster pace of change, opening the door to contentious differences in the way the text is read. We’re now in the untenable and tense situation of some of us (relativists) feeling that all interpretations are legitimate, and others of us (fundamentalists) feeling that our interpretation is the only valid one.

Contrary to the way it often seems, rampant relativism and fundamentalist orthodoxy are not our only alternatives. As Paul Ricoeur (1974, pp. 291-292) put it,

“…for each of the historical societies, the developing as well as those advanced in industrialization, the task is to exercise a kind of permanent arbitration between technical universalism and the personality constituted on the ethico-political plane. All the struggles of decolonization and liberation are marked by the double necessity of entering into the global technical society and being rooted in the cultural past.”

Without going into an extensive analysis of the ways in which the metaphors embedded in each culture’s language, concepts and world view structure meaning in universally shared ways, suffice it to say that what we need is a way of mediating between the historical past and a viable future.

We obtain mediations of this kind when we are able to identify patterns in our collective behaviors consistent enough to be considered behavioral laws. Such patterns are revealed in Rasch measurement instrument calibration studies by the way that every individual’s pattern of responses to the questions asked might be unique but still in probabilistic conformity with the overall pattern in the data as a whole. What we have in Rasch measurement is directly analogous with the pillars of ancient Greek temples: unique individuals harmonized and coordinated in common interpretations, collective effort and shared purpose.
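That coexistence of unique response patterns with overall probabilistic conformity can be simulated directly. The sketch below is my own illustration, not code from any of the cited studies; it generates dichotomous responses from a Rasch model for a set of assumed person abilities and item difficulties, and checks that, while individual response strings vary, the aggregate data still reflect the generating item hierarchy:

```python
import random
from math import exp

def rasch_p(ability, difficulty):
    """Dichotomous Rasch model probability of success (in logits)."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))

random.seed(42)  # arbitrary seed, for reproducibility only
abilities = [random.gauss(0.0, 1.0) for _ in range(500)]
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # easiest to hardest

# Each row is one person's 0/1 response string; the strings vary from
# person to person because the model is probabilistic, not deterministic.
data = [[int(random.random() < rasch_p(b, d)) for d in difficulties]
        for b in abilities]

# In the aggregate, easier items are passed more often: the observed
# proportion passing the easiest item exceeds that for the hardest.
p_pass = [sum(col) / len(data) for col in zip(*data)]
assert p_pass[0] > p_pass[-1]
```

A Rasch calibration proper runs this logic in reverse, estimating the difficulties and abilities from observed data and then testing whether the individual patterns conform, probabilistically, to the estimated structure.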

The difficulty is in balancing respect for individual differences with capitalizing on the aggregate pattern. This is, as Gadamer (1991, pp. 7-8) says, the

“systematic problem of philosophy itself: that the part of lived reality that can enter into the concept is always a flattened version—like every projection of a living bodily existence onto a surface. The gain in unambiguous comprehensibility and repeatable certainty is matched by a loss in stimulating multiplicity of meaning.”

The problem is at least as old as Plato’s recognition of the way that (a) the technology of writing supplants and erases the need for detailed memories, and (b) counting requires us to metaphorically abstract something in common from what are concretely different entities. In social measurement, justice and respect for individual dignity requires that we learn to appreciate uniqueness while taking advantage of shared similarities (Ballard, 1978, p. 189).

Rasch’s models for measurement represent a technology essential to achieving this balance between the individual and society (Fisher, 2004, 2010). In contrast with descriptive statistical models that focus on accounting for as much variation as possible within single data sets, prescriptive measurement models focus on identifying consistent patterns across data sets. Where statistical models are content to conceive of individuals as interchangeable and structurally identical, measurement models conceive of individuals as unique and seek to find harmonious patterns of shared meanings across them. When such patterns are in hand, we are able to deploy instruments embodying shared meanings to the front lines of applications in education, health care, human resource management, organizational performance assessment, risk management, etc.

The consistent data patterns observed over several decades of Rasch applications (for examples, see Bond, 2008; Stenner, Burdick, Sanford, & Burdick, 2006) document and illustrate self-organizing forms of our collective life. They are, moreover, evidence of capital resources of the first order that we are only beginning to learn about and integrate into our institutions and social expectations. Wright (1999, p. 76) recognized that we need to “reach beyond the data in hand to what these data might imply about future data, still unmet, but urgent to foresee.” When repeated observations, tests, experiments, and practices show us unequivocally, as the burgeoning scientific literature increasingly does, that our abilities, attitudes, behaviors, health, social relationships, etc. are structured in ways we can rely on as objective constants across the particulars of who, when, where, and what, we will create a place in which we will once again feel at home in a larger community of shared values.

To take one example, everyone is well aware that “it’s who you know, not what you know” that matters most in finding a job, making sales, or in generally creating a place for oneself in the world. The phenomenon of online social networking has only made the truth of this platitude more evident. Culturally, we have evolved ways of adapting to the unfairness of this, though it still rankles and causes discontent.

But what if we capitalized on the general consensus on the structure of abilities, motivations, productivity, health, and trustworthiness that is emerging in the research literature? What if we actually created an Intangible Assets Metric System (see my 2009 blog on this issue) that would provide a basis of comparison integrating individual perspectives with the collective social perspective? Such an integration is what is implied in every successful Rasch measurement instrument calibration. Following through on these successes to the infrastructure of rights to our own human, social, and natural capital would not only advance economic prosperity and scientific learning on a whole new scale of magnitude, but democratic institutions themselves would also be renewed in fundamental ways.

The convergence of political revolutions, the Industrial Revolution, and the Second Scientific Revolution in the late 18th and early 19th centuries was, after all, not just a coincidence. Just as the metric system simultaneously embodied the French Revolution’s political values of universal rights, equal representation, fairness and justice; scientific values of universal comparability; and capitalist values of efficient, open markets, so, too, will an Intangible Assets Metric System expand and coordinate these values as we once again reinvent who we are and what we want to be.

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Ballard, E. G. (1978). Man and technology: Toward the measurement of a culture. Pittsburgh, Pennsylvania: Duquesne University Press.

Bond, T. (2008). Invariance and item stability. Rasch Measurement Transactions, 22(1), 1159 [http://www.rasch.org/rmt/rmt221h.htm].

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2010). Reducible or irreducible? Mathematical reasoning and the ontological method. Journal of Applied Measurement, 11, in press.

Gadamer, H.-G. (1991). Plato’s dialectical ethics: Phenomenological interpretations relating to the Philebus (R. M. Wallace, Trans.). New Haven, Connecticut: Yale University Press.

Jasanoff, S. (2004). States of knowledge: The co-production of science and social order (International Library of Sociology). New York: Routledge.

Jasanoff, S., & Martello, M. L. (Eds.). (2004). Earthly politics: Local and global in environmental governance (Politics, Science, and the Environment). Cambridge, MA: MIT Press.

Ricoeur, P. (1974). Political and social essays (D. Stewart & J. Bien, Eds.). Athens, Ohio: Ohio University Press.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Wise, M. N. (Ed.). (1995). The values of precision. Princeton, New Jersey: Princeton University Press.

Wonk, D. (2002, June 11). Theater review: Looking back. Gambit Weekly, 32. Retrieved 20 November 2009, from http://bestofneworleans.com/gyrobase/PrintFriendly?oid=oid%3A28341.

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Reinventing Capitalism: Diagramming Living Capital Flows in a Green, Sustainable, and Responsible Economy

January 13, 2010

I often make reference to the difference between the currently predominant three-capitals economic model and the emerging green, sustainable, and socially responsible four-capitals model. I also cite my primary sources for this distinction in the work of Paul Ekins and Paul Hawken (see references below), though the basic ideas go back to older work by John Hicks, Irving Fisher, and many others. A picture is worth a thousand words, so it’s about time I took the trouble to explain this vital difference in greater detail, graphically.

The basic issue is that the current state of capitalism is problematic because it is still at an early stage in its development. Capitalism is incomplete. Its fundamental flaws stem less from inherent contradictions than from resolvable ones. The moral fundamentals of capitalism are identical with those forming the basis of meaningful and productive relationships in natural and social ecologies. Profit, for example, is perhaps best understood as nourishment for life (de Geus, 1997; also see Friedman, 2008; Goldberg, 2009; Greider, 2003; Hawken, 1993; Shermer, 2007). This is in fact the point of the emergent field of ecological economics. Thus, as Hawken, Lovins, and Lovins (1999, p. 5) put it,

“Capitalism, as practiced, is a financially profitable, nonsustainable aberration in human development. What might be called ‘industrial capitalism’ does not fully conform to its own accounting principles. It liquidates its capital and calls it income. It neglects to assign any value to the largest stocks of capital it employs–the natural resources and living systems, as well as the social and cultural systems that are the basis of human capital.”

Ekins, Hillman, and Hutchison (1992, p. 49; also see Ekins, 1992; Hawken, Lovins, & Lovins, 1999) provide a diagram modeling the incomplete three-capitals economic model (see Figure 1). Here we see the narrowly defined land, labor, and manufactured capital being burned for consumption, utility, and profits, some of which are recycled back into new investments. Waste and inefficiency do not figure into the process enough to be of concern. Profit is defined in terms of whatever the market will bear and the destruction of the capital resource base.

Figure 2 (from Ekins, Hillman, & Hutchison, 1992, p. 61) diagrams the comprehensively defined four-capitals model. Now we have human, social, natural, and manufactured capital flowing together in a system that includes all of the resources necessary for life, liberty, and the pursuit of happiness. Profit is defined in terms of waste removal and increasingly lean thinking and acting (Hawken, Lovins, & Lovins, 1999, pp. 125, 133). Since all human suffering, sociopolitical discontent, and environmental degradation can be expressed in terms of waste (Hawken, Lovins, & Lovins, 1999, p. 59), the four-capitals model is a necessary component of any policy platform that focuses on driving sustainable increases in Genuine Progress Indicators or Happiness Indexes (Anielski, 2007; Cameron, 2008).

One of the major themes of this blog is the concept of living capital. Though human, social, and natural capital literally involve living beings, capital itself must be brought to life before a four-capitals economy can be made workable (Fisher, 2002, 2005, 2009a, 2009b, 2010). Capital is brought to life via measures that (1) scientifically stand for something that adds up in a way that can be meaningfully expressed in numbers and (2) are expressed in universally uniform and accessible common languages that function as currencies for the exchange of value.

Scientific and financial instruments of all kinds are calibrated in order to tap the flowing current of value in an economy, giving tangible expression to something inherently abstract and intangible. When we can represent capital via scientific measures, it comes alive in the way that its flow is unimpeded by the inefficiencies of so-called measures that do not add up and that are locally dependent on specific instruments (Fisher, 2002, 2007, 2010). Transaction costs, the most important costs in any economy, are dramatically affected by the quality of measurement (Barzel, 1982; Benham & Benham, 2000). The core problem in re-inventing capitalism is figuring out how to bring social and environmental externalities into the market. Measurement will inevitably be of vital concern in solving this problem. Unfortunately, deep, broad and usually unexamined cultural assumptions about the supposed limits of measures made using tests, surveys, and assessments are needlessly shackling expectations and prolonging dependence on obsolete technologies. See prior posts in this blog and my publications (Fisher, 2000, 2005, 2006, 2007, 2008a, 2008b, 2009a, 2009b, 2010, etc.) for more on the technical opportunities and economics of metrology.

No matter if we’re measuring bushels, barrels, hours, kilowatts, health, motivation, trust, performance, or environmental quality, scientific measurement opens the door to legal and financial means of establishing value and ownership. When our metrics are not scientific, as is the case with the vast majority of measures of human, social, and natural capital, it is nearly impossible to know what we’re getting for our money and to show how much stock of value we hold. It is accordingly also then impossible to manage the economy systematically and responsibly for long term sustainability. We must then build the metrological infrastructure of human, social, and natural capital (Fisher, 2009a, 2009b) in a new extension of the metric system if we are to ever achieve a comprehensive four-capitals economy that effectively re-invents capitalism.

References

Anielski, M. (2007). The economics of happiness: Building genuine wealth. Gabriola, British Columbia: New Society Publishers.

Barzel, Y. (1982). Measurement costs and the organization of markets. Journal of Law and Economics, 25, 27-48.

Benham, A., & Benham, L. (2000). Measuring the costs of exchange. In C. Ménard (Ed.), Institutions, contracts and organizations: Perspectives from new institutional economics (pp. 367-375). Cheltenham, UK: Edward Elgar.

Cameron, G. (2008, Spring/Summer). Oikos and economy: The Greek legacy in economic thought. PhaenEx, 3(1), 112-33.

de Geus, A. (1997). The living company: Habits for survival in a turbulent business environment (Foreword by Peter M. Senge). Boston, MA: Harvard Business School Press.

Ekins, P. (1992). A four-capital model of wealth creation. In P. Ekins & M. Max-Neef (Eds.), Real-life economics: Understanding wealth creation (pp. 147-15). London: Routledge.

Ekins, P. (1999). Economic growth and environmental sustainability: The prospects for green growth. New York: Routledge.

Ekins, P. (2003, March). Identifying critical natural capital: Conclusions about critical natural capital. Ecological Economics, 44(2-3), 277-292.

Ekins, P., Folke, C., & De Groot, R. (2003, March). Identifying critical natural capital. Ecological Economics, 44(2-3), 159-163.

Ekins, P., Hillman, M., & Hutchison, R. (1992). The Gaia atlas of green economics (Foreword by Robert Heilbroner). New York: Anchor Books.

Ekins, P., Simon, S., Deutsch, L., Folke, C., & De Groot, R. (2003, March). A framework for the practical application of the concepts of critical natural capital and strong sustainability. Ecological Economics, 44(2-3), 165-185.

Ekins, P., & Voituriez, T. (2009). Trade, globalization and sustainability impact assessment: A critical look at methods and outcomes. London, England: Earthscan Publications Ltd.

Fisher, W. P., Jr. (2000). Objectivity in psychosocial measurement: What, why, how. Journal of Outcome Measurement, 4(2), 527-563 [http://www.livingcapitalmetrics.com/images/WP_Fisher_Jr_2000.pdf].

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2005). Daredevil barnstorming to the tipping point: New aspirations for the human sciences. Journal of Applied Measurement, 6(3), 173-9 [http://www.livingcapitalmetrics.com/images/FisherJAM05.pdf].

Fisher, W. P., Jr. (2006). Meaningfulness, sufficiency, invariance, and conjoint additivity. Rasch Measurement Transactions, 20(1), 1053 [http://www.rasch.org/rmt/rmt201.htm].

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2008a, March 28). Rasch, Frisch, two Fishers and the prehistory of the Separability Theorem. In  William P. Fisher, Jr. (Chair), Session 67.056. Reading Rasch Closely: The History and Future of Measurement. American Educational Research Association, Rasch Measurement SIG, New York University, New York City [http://www.livingcapitalmetrics.com/images/RaschFisherFrisch.EJHET.pdf].

Fisher, W. P., Jr. (2008b). Vanishing tricks and intellectualist condescension: Measurement, metrology, and the advancement of science. Rasch Measurement Transactions, 21(3), 1118-1121 [http://www.rasch.org/rmt/rmt213c.htm].

Fisher, W. P., Jr. (2009a, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Fisher, W. P., Jr. (2009b). NIST critical national need idea white paper: Metrological infrastructure for human, social, and natural capital (Tech. Rep.). New Orleans: LivingCapitalMetrics.com [http://www.livingcapitalmetrics.com/images/FisherNISTWhitePaper2.pdf].

Fisher, W. P., Jr. (2010). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In M. Wilson, K. Draney, N. Brown & B. Duckor (Eds.), Advances in Rasch measurement, Vol. Two (in press). Maple Grove, MN: JAM Press [http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf].

Friedman, D. (2008). Morals and markets: An evolutionary account of the modern world. Palgrave Macmillan.

Goldberg, S. H. (2009). Billions of drops in millions of buckets: Why philanthropy doesn’t advance social progress. New York: Wiley.

Greider, W. (2003). The soul of capitalism: Opening paths to a moral economy. New York: Simon & Schuster.

Hawken, P. (1993). The ecology of commerce: A declaration of sustainability. New York: HarperCollins Publishers.

Hawken, P., Lovins, A., & Lovins, H. L. (1999). Natural capitalism: Creating the next industrial revolution. New York: Little, Brown, and Co.

Korten, D. (2009). Agenda for a new economy: From phantom wealth to real wealth. San Francisco: Berret-Koehler Publishing.

Shermer, M. (2007). The mind of the market: Compassionate apes, competitive humans, and other tales from evolutionary economics. New York: Times Books.


Mass Customization: Tailoring Tests, Surveys, and Assessments to Individuals without Sacrificing Comparability

January 11, 2010

One of the recurring themes in this blog concerns the technical capacities for more precise and meaningful measurement that remain unrecognized and under-utilized in business, finance, and economics. One of the especially productive capacities I have in mind relates to the techniques of adaptive measurement. These techniques make it possible to tailor measuring tools to the needs of the people measured, the diametric opposite of standard practice, which assumes that people must adapt to the needs of the measuring instrument.

Think about what it means to try to measure every case using the same statements. When you define the limits of your instrument in terms of common content, you are looking for a one-size-fits-all solution. This design requires that you restrict the content of the statements to those that will be relevant in every case. The reason for proceeding in this way hinges on the assumption that you need to administer all of the items to every case in order to make the measures comparable, but this is not true. To conceive measurement in this way is to be shackled to an obsolete technology. Instead of operating within the constraints of an overly limiting set of assumptions, you could be designing a system that takes missing data into account and that supports adaptive item administration, so that the instrument is tailored to the needs of the measurement situation. The benefits from taking this approach are extensive.

Think of the statements comprising the instrument as defining a hierarchy or continuum that extends from the most important, most agreeable, or easiest-to-achieve things at the bottom, and the least important, least agreeable, and hardest to achieve at the top. Imagine that your data are consistent, so that the probability of importance, agreeability, or success steadily decreases for any individual case as you read up the scale.

Obtaining data consistency like this is not always easy, but it is essential to measurement and to calibrating a scientific instrument. Even when data do not provide the needed consistency, much can be learned from them as to what needs to be done to get it.

Now hold that thought: you have a matrix of complete data, with responses to every item for every case. In the typically assumed design scenario, in which all items are applied to every case, no matter how low a measure is, you must still administer the items calibrated at the top of the scale, even though we know from long experience and repeated recalibrations across multiple samples that the response probabilities of importance, agreement, or success for these items are virtually 0.00.

Conversely, no matter how high a measure is, the usual design demands that all items be administered, even if we know from experience that the response probabilities for the items at the bottom of the scale are virtually 1.00.

In this scenario, we are wasting time and resources obtaining data on items for which we already know the answers. We are furthermore not asking other questions that would be particularly relevant to different individual cases because to include them in a complete data design where one size fits all would make the instrument too long. So we are stuck with a situation in which perhaps only a tenth of the overall instrument is actually being used for cases with measures toward the extremes.

One of the consequences of this is that we have much less information about the very low and very high measures, and so we have much less confidence about where the measures are than we do for more centrally located measures.
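This loss of confidence toward the extremes can be quantified. In Rasch measurement the standard error of a measure is the inverse square root of the total information, and each item contributes its most information when the response probability is near 0.50. A minimal sketch, using five hypothetical item calibrations purely for illustration:

```python
import math

def p(ability, difficulty):
    """Rasch model response probability (measures in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def standard_error(ability, item_calibrations):
    """Approximate standard error of a Rasch measure: the inverse
    square root of total information, where each item contributes
    p * (1 - p) and so contributes most when p is near 0.50."""
    info = sum(p(ability, d) * (1.0 - p(ability, d))
               for d in item_calibrations)
    return 1.0 / math.sqrt(info)

items = [-2, -1, 0, 1, 2]  # a hypothetical one-size-fits-all scale
# An on-target measure is estimated far more precisely than an
# extreme one, where most items contribute almost no information.
print(round(standard_error(0.0, items), 2))  # ~1.08
print(round(standard_error(4.0, items), 2))  # ~2.38
```

The doubling of the standard error at the extreme is exactly the loss of confidence described above, and it follows directly from items being off-target.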

If measurement projects are oriented toward the development of an item bank, however, these problems can be overcome. You might develop and calibrate dozens, hundreds, or thousands of items. The bank might be administered in such a way that the same sets of items are applied to different cases only rarely. To the extent that the basic research on the bank shows that the items all measure the same thing, so that different item subsets all give the same result in terms of resolving the location of the measure on the quantitative continuum, comparability is not compromised.

The big plus is that all cases can now be measured with the same degree of meaningfulness, precision and confidence. We can administer the same number of items to every case as in the one-size-fits-all design, but now the items are targeted at each individual, providing maximum information. But the quantitative properties are only half the story. Real measurement integrates qualitative meaningfulness with quantitative precision.

As illustrated in the description of the typically assumed one-size-fits-all scenario, we interpret the measures in terms of the item calibrations. In the one-size-fits-all design, very low and very high measures can be associated with consistent variation on only a few items, as there is no variation on most of the items, since they are too easy or hard for this case. And it might happen that even cases in the middle of the scale are found to have response probabilities of 1.00 and 0.00 for the items at the very bottom and top of the scale, respectively, further impairing the efficiency of the measurement process.

In the adaptive scenario, though, items are selected from the item bank via an algorithm that uses the expected response probabilities to target the respondent. Success on an easy item causes the algorithm to pick a harder item, and vice versa. In this way, the instrument is tailored for the individual case. This kind of mass customization can also be qualitatively based. Items that are irrelevant to the particular characteristics of an individual case can be excluded from consideration.
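The selection logic is simple enough to sketch. What follows is a bare-bones illustration, not any particular published algorithm: the item bank, the step-halving update, and the seed are all invented for the example, and a production system would use maximum likelihood estimation and a proper stopping rule in place of the crude update shown here.

```python
import math
import random

def p_success(ability, difficulty):
    """Rasch model response probability (measures in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def adaptive_test(true_ability, bank, n_items, rng):
    """Administer n_items from the bank adaptively and return the
    final ability estimate."""
    estimate, used = 0.0, set()
    for step in range(1, n_items + 1):
        # Target the respondent: administer the unused item whose
        # calibration is closest to the current estimate.
        idx = min((i for i in range(len(bank)) if i not in used),
                  key=lambda i: abs(bank[i] - estimate))
        used.add(idx)
        success = rng.random() < p_success(true_ability, bank[idx])
        # Success nudges the estimate up, failure down, by ever
        # smaller steps (a stand-in for maximum likelihood updating).
        estimate += (1.0 if success else -1.0) / step
    return estimate

bank = [d / 10.0 for d in range(-30, 31)]  # hypothetical 61-item bank
print(round(adaptive_test(1.5, bank, 20, random.Random(1)), 2))
```

Because each administered item sits near the provisional estimate, nearly every response carries close to maximal information, which is why the adaptive design matches the one-size-fits-all design's precision with far fewer items.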

And adaptive designs do not necessarily have to be computerized, since respondents, examinees, and judges can be instructed to complete a given number of contiguous items in a sequence ordered by calibration values. This effects a kind of self-targeting that effectively reduces the number of overall items administered without the need for expensive investments in programming or hardware.

The literature on adaptive instrument administration is over 40 years old, and is quite technical and extensive. I’ve provided a sample of articles below, including some providing programming guidelines.

The concepts of item banking and adaptive administration are, of course, the technical mechanisms on which metrological networks of instruments linked to reference standards will be built. See previously posted blog entries here for more on metrology and traceability.

References

Association of Test Publishers. (2001, Fall). Benjamin D. Wright, Ph.D. honored with the Career Achievement Award in Computer-Based Testing. Test Publisher, 8(2). Retrieved 20 May 2009, from http://www.testpublishers.org/newsletter7.htm#Wright.

Bergstrom, B. A., Lunz, M. E., & Gershon, R. C. (1992). Altering the level of difficulty in computer adaptive testing. Applied Measurement in Education, 5(2), 137-149.

Choppin, B. (1968). An item bank using sample-free calibration. Nature, 219, 870-872.

Choppin, B. (1976). Recent developments in item banking. In D. N. M. DeGruitjer & L. J. van der Kamp (Eds.), Advances in Psychological and Educational Measurement (pp. 233-245). New York: Wiley.

Cook, K., O’Malley, K. J., & Roddey, T. S. (2005, October). Dynamic assessment of health outcomes: Time to let the CAT out of the bag? Health Services Research, 40(Suppl 1), 1694-1711.

Dijkers, M. P. (2003). A computer adaptive testing simulation applied to the FIM instrument motor component. Archives of Physical Medicine & Rehabilitation, 84(3), 384-93.

Halkitis, P. N. (1993). Computer adaptive testing algorithm. Rasch Measurement Transactions, 6(4), 254-255.

Linacre, J. M. (1999). Individualized testing in the classroom. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 186-94). New York: Pergamon.

Linacre, J. M. (2000). Computer-adaptive testing: A methodology whose time has come. In S. Chae, U. Kang, E. Jeon & J. M. Linacre (Eds.), Development of Computerized Middle School Achievement Tests [in Korean] (MESA Research Memorandum No. 69). Seoul, South Korea: Komesa Press. Available in English at http://www.rasch.org/memo69.htm.

Linacre, J. M. (2006). Computer adaptive tests (CAT), standard errors, and stopping rules. Rasch Measurement Transactions, 20(2), 1062 [http://www.rasch.org/rmt/rmt202f.htm].

Lunz, M. E., & Bergstrom, B. A. (1991). Comparability of decision for computer adaptive and written examinations. Journal of Allied Health, 20(1), 15-23.

Lunz, M. E., & Bergstrom, B. A. (1994). An empirical study of computerized adaptive test administration conditions. Journal of Educational Measurement, 31(3), 251-263.

Lunz, M. E., & Bergstrom, B. A. (1995). Computerized adaptive testing: Tracking candidate response patterns. Journal of Educational Computing Research, 13(2), 151-162.

Lunz, M. E., Bergstrom, B. A., & Gershon, R. C. (1994). Computer adaptive testing. In W. P. Fisher, Jr. & B. D. Wright (Eds.), Special Issue: International Journal of Educational Research, 21(6), 623-634.

Lunz, M. E., Bergstrom, B. A., & Wright, B. D. (1992, Mar). The effect of review on student ability and test efficiency for computerized adaptive tests. Applied Psychological Measurement, 16(1), 33-40.

McHorney, C. A. (1997, Oct 15). Generic health measurement: Past accomplishments and a measurement paradigm for the 21st century. [Review] [102 refs]. Annals of Internal Medicine, 127(8 Pt 2), 743-50.

Meijer, R. R., & Nering, M. L. (1999, Sep). Computerized adaptive testing: Overview and introduction. Applied Psychological Measurement, 23(3), 187-194.

Raîche, G., & Blais, J.-G. (2009). Considerations about expected a posteriori estimation in adaptive testing. Journal of Applied Measurement, 10(2), 138-156.

Raîche, G., Blais, J.-G., & Riopel, M. (2006, Autumn). A SAS solution to simulate a Rasch computerized adaptive test. Rasch Measurement Transactions, 20(2), 1061.

Reckase, M. D. (1989). Adaptive testing: The evolution of a good idea. Educational Measurement: Issues and Practice, 8, 3.

Revicki, D. A., & Cella, D. F. (1997, Aug). Health status assessment for the twenty-first century: Item response theory item banking and computer adaptive testing. Quality of Life Research, 6(6), 595-600.

Riley, B. B., Conrad, K., Bezruczko, N., & Dennis, M. L. (2007). Relative precision, efficiency, and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN substance problem scale. Journal of Applied Measurement, 8(1), 48-64.

van der Linden, W. J. (1999). Computerized educational testing. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 138-50). New York: Pergamon.

Velozo, C. A., Wang, Y., Lehman, L., & Wang, J.-H. (2008). Utilizing Rasch measurement models to develop a computer adaptive self-report of walking, climbing, and running. Disability & Rehabilitation, 30(6), 458-67.

Vispoel, W. P., Rocklin, T. R., & Wang, T. (1994). Individual differences and test administration procedures: A comparison of fixed-item, computerized-adaptive, self-adapted testing. Applied Measurement in Education, 7(1), 53-79.

Wang, T., Hanson, B. A., & Lau, C. M. A. (1999, Sep). Reducing bias in CAT trait estimation: A comparison of approaches. Applied Psychological Measurement, 23(3), 263-278.

Ware, J. E., Bjorner, J., & Kosinski, M. (2000). Practical implications of item response theory and computerized adaptive testing: A brief summary of ongoing studies of widely used headache impact scales. Medical Care, 38(9 Suppl), II73-82.

Weiss, D. J. (1983). New horizons in testing: Latent trait test theory and computerized adaptive testing. New York: Academic Press.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375.

Weiss, D. J., & Schleisman, J. L. (1999). Adaptive testing. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 129-37). New York: Pergamon.

Wouters, H., Zwinderman, A. H., van Gool, W. A., Schmand, B., & Lindeboom, R. (2009). Adaptive cognitive testing in dementia. International Journal of Methods in Psychiatric Research, 18(2), 118-127.

Wright, B. D., & Bell, S. R. (1984, Winter). Item banks: What, why, how. Journal of Educational Measurement, 21(4), 331-345 [http://www.rasch.org/memo43.htm].

Wright, B. D., & Douglas, G. A. (1975). Best test design and self-tailored testing (Research Memorandum No. 19). Chicago, Illinois: MESA Laboratory, Department of Education, University of Chicago [http://www.rasch.org/memo19.pdf].



Tuning our assessment instruments to harmonize our relationships

January 10, 2010

“Music is the art of measuring well.”
Augustine of Hippo

With the application of Rasch’s probabilistic models for measurement, we are tuning the instruments of the human, social, and environmental sciences, with the aim of being able to harmonize relationships of all kinds. This is not an empty metaphor: the new measurement scales are mathematically equivalent to the well-tempered scales, and later the 12-tone equal-temperament scale, that were introduced in response to the technological advances associated with the piano.

The idea that the regular patterns found in music are akin to those found in the world at large and in the human psyche is an ancient one. The Pythagoreans held that

“…music’s concordances [were] the covenants that tones form under heaven’s watchful eye. For the Pythagoreans, though, the importance of these special proportions went well beyond music. They were signs of the natural order, like the laws governing triangles; music’s rules were simply the geometry governing things in motion: not only vibrating strings but also celestial bodies and the human soul” (Isacoff, 2001, p. 38).

I have already elsewhere in this blog elaborated on the progressive expansion of geometrical thinking into natural laws and measurement models; now, let us turn our attention to music as another fertile source of the analogies that have proven so productive over the course of the history of science (also explored elsewhere in this blog).

You see, tuning systems up to the invention of the piano (1709) required instruments to be retuned before performers could play in different keys. Each key had a particular characteristic color to its sound. And not only that, some note pairings in any key (such as one of the twelve fifths in mean-tone tuning) were so dissonant that they were said to howl, and so were referred to as wolves. Composers went out of their way to avoid putting these notes together, or reserved them for rare, especially dramatic effects.

Dozens of tuning systems had been proposed in the 17th century, and the concept of an equal-temperament scale was in general currency at the time of the piano’s invention. Bach is said to have tuned his own keyboards so that he could switch keys fluidly from within a composition. His “Well-Tempered Clavier” (published in 1722) demonstrates how a well temperament allows one to play in all 24 major and minor keys without retuning the instrument. Bach also is said to have deliberately used wolf note pairings to show that they did not howl in the way they did with the mean tone tuning.

Equal temperament is not equal-interval in the Pythagorean sense of same-sized changes in the frequencies of vibrating strings. Rather, it is the logarithms of those frequencies that are divided into equal intervals. This is precisely what is also done in Rasch scaling algorithms applied to test, assessment, and survey data in contemporary measurement models.
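To see the point concretely, here is a minimal sketch (in Python, using standard 12-tone equal temperament from A440) showing that semitone steps grow ever larger on the raw frequency scale, yet are all exactly the same size once the logarithm is taken:

```python
import math

# 12-tone equal temperament: each semitone multiplies frequency by 2**(1/12),
# so the logarithms of the frequencies climb by the same amount at every step.
A4 = 440.0  # concert pitch in Hz (a common convention)
freqs = [A4 * 2 ** (n / 12) for n in range(13)]  # one octave: A4 up to A5

# On the raw frequency scale, the steps between adjacent notes keep growing...
raw_steps = [freqs[i + 1] - freqs[i] for i in range(12)]

# ...but on the log scale, every semitone spans exactly the same interval.
log_steps = [math.log(freqs[i + 1]) - math.log(freqs[i]) for i in range(12)]
```

Running this, the twelfth note lands on 880 Hz (the octave), the raw steps widen from about 26 Hz to about 49 Hz, and every log step equals ln(2)/12.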

Pianos are tuned outward from middle C, with each successive pair of notes to the left and right tuned to lie the same distance from C. As the tuner moves farther and farther from C, the unit distance of the notes from middle C is slightly adjusted, or stretched, so that each sharp and its corresponding flat converge on the same black key.

What is being done, in effect, is that the logarithm of the note frequencies is being taken. In statistics, the analogous logarithmic transformation of the odds is called a two-stretch transformation, because it pulls both ends of the bounded raw-score distribution away from the center, with the extremes being pulled further than the regions closer to the center. This stretching effect is of huge importance to measurement because it makes it possible for different collections of questions addressing the same thing to measure in the same unit.
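The two-stretch behavior is easy to demonstrate. A minimal sketch, using the log-odds (logit) transformation that Rasch models employ: equal steps in proportion correct near the middle of the scale translate into small logit distances, while the same-sized steps near the extremes translate into much larger ones.

```python
import math

def logit(p):
    """Natural log of the odds: the transformation used in Rasch measurement."""
    return math.log(p / (1 - p))

# The same 0.10 step in proportion correct...
center_step = logit(0.60) - logit(0.50)  # taken near the middle of the scale
tail_step = logit(0.95) - logit(0.85)    # taken near the top of the scale

# ...covers far more ground in logits near the extreme: both ends of the
# bounded 0-1 scale are stretched away from the center.
```

Here the middle step comes out near 0.41 logits and the upper step near 1.21 logits, even though both span a tenth of the raw proportion scale.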

That is, the instrument dependency of summed ratings, counts of right answers, or categorical response frequencies is like a key-dependent tuning system. Just as the logarithm modulates transitions across musical notes so that different keys work within the same scaling system, it also modulates transitions across different reading tests so that they all measure in a unit that remains the same size, with the same meaning.

Now, many people fear that the measurement of human abilities, attitudes, health, etc. must inherently involve a meaningless reduction of richly varied and infinite experience to a number. Many people are violently opposed to any suggestion that this could be done in a meaningful and productive way. However, is not music the most emotionally powerful and subtle art form in existence, and simultaneously also incredibly high-tech and mathematical? Even if you ignore the acoustical science and the studio electronics, the instruments themselves embody some of the oldest and most intensively studied mathematical principles in existence.

And, yes, these principles are used in TV, movies, dentists’ offices and retail stores to help create sympathies and environments conducive to the, sometimes painful and sometimes crass, commercial tasks at hand. But music is also by far the most popular art form, and it is accessible everywhere to everyone any time precisely as a result of the very technologies that many consider anathema in the human and social sciences.

But it seems to me that the issue is far more a matter of who controls the technology than one of the technology itself. In the current frameworks of the human and social sciences, and of the economic domains of human, social, and natural capital, whoever owns the instrument owns the measurement system and controls the interpretation of the data, since each instrument measures in its own unit. But in the new Rasch technology’s open architecture, anyone willing to master the skills needed can build instruments tuned to a ubiquitous, universally available reference-standard scale. What is more, the demand that all instruments measuring the same thing must harmonize will transfer control of data interpretation to a public sphere in which experimental reproducibility trumps authoritarian dictates.

This open standards system will open the door to creativity and innovation on a par with what musicians take for granted. Common measurement scales will allow people to jam out in an infinite variety of harmonic combinations, instrumental ensembles, choreographed moves, and melodic and rhythmic patterns. Just as music ranges from jazz to symphonic, rock to punk to hiphop to blues to country to techno, or atonal to R & B, so, too, do our relationships. A whole new world of potential innovations opens up in the context of methods for systematically evaluating naturally occurring and deliberately orchestrated variations in organizations, management, HR training methods, supply lines, social spheres, environmental quality, etc.

The current business world’s near-complete lack of comparable information on human, social, and natural capital is oppressive. It puts us in the situation of never knowing what we get for our money in education and healthcare, even as costs in these areas spiral into absolutely stratospheric levels. Having instruments in every area of education, health care, recreation, employment, and commerce tuned to common scales will be liberating, not oppressive. Having clear, reproducible, meaningful, and publicly negotiated measures of educational and clinical care outcomes, of productivity and innovation, and of trust, loyalty, and environmental quality will be a boon.

In conclusion, consider one more thing. About 100 years ago, a great many musicians and composers revolted against what they felt were the onerous and monotonous constraints of the equal-tempered tuning system. Thus we had an explosion of tonal and rhythmic innovations across the entire range of musical artistry. With the global popularity of world music’s blending of traditional forms with current technology and Western forms, the use of alternatives to equal temperament has never been greater. I read once that Joni Mitchell has used something like 32 different tunings in her recordings. Jimi Hendrix and Neil Young are also famous for using unique tunings to define their trademark sounds. What would the analogy of this kind of creativity be in the tuning of tests and surveys? I don’t know, but I’m looking forward to seeing it, experiencing it, and maybe even contributing to it. Les Paul may not be the only innovator in instrument design who figured out not only how to make it easy for others to express themselves in measured tones, but who also knew how to rock out his own yayas!

References and further reading:

Augustine of Hippo. (1947/2002). On music. In Writings of Saint Augustine Volume 2. Immortality of the soul and other works. (L. Schopp, Trans.) (pp. 169-384). New York: Catholic University of America Press.

Barbour, J. M. (2004/1954). Tuning and temperament: A historical survey. Mineola, NY: Dover Publications.

Heelan, P. A. (1979). Music as basic metaphor and deep structure in Plato and in ancient cultures. Journal of Social and Biological Structures, 2, 279-291.

Isacoff, S. M. (2001). Temperament: The idea that solved music’s greatest riddle. New York: Alfred A. Knopf.

Jorgensen, O. (1991). Tuning: Containing the perfection of eighteenth-century temperament, the lost art of nineteenth-century temperament and the science of equal temperament. East Lansing, Michigan: Michigan State University.

Kivy, P. (2002). Introduction to a philosophy of music. Oxford, England: Oxford University Press.

Mathieu, W. A. (1997). Harmonic experience: Tonal harmony from its natural origins to its modern expression. Rochester, Vermont: Inner Traditions International.

McClain, E. (1984/1976). The myth of invariance: The origin of the gods, mathematics and music from the Rg Veda to Plato (P. A. Heelan, Ed.). York Beach, Maine: Nicolas-Hays, Inc.

Russell, G. (2001/1953). Lydian chromatic concept of tonal organization (4th ed.). Brookline, MA: Concept Publishing.

Stone, M. (2002, Autumn). Musical temperament. Rasch Measurement Transactions, 16(2), 873.

Sullivan, A. T. (1985). The seventh dragon: The riddle of equal temperament. Lake Oswego, OR: Metamorphous Press.
