
2011 IMEKO Conference Papers Published Online

January 13, 2012

Papers from the Joint International IMEKO TC1+TC7+TC13 Symposium, held August 31 to September 2, 2011, in Jena, Germany, are now available online at http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24575/IMEKO2011_TOC.pdf. The following will be of particular interest to readers working on measurement applications in the social sciences, education, health care, and psychology:

Nikolaus Bezruczko
Foundational Imperatives for Measurement with Mathematical Models
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24419/ilm1-2011imeko-030.pdf

Nikolaus Bezruczko, Shu-Pi C. Chen, Connie Hill, Joyce M. Chesniak
A Clinical Scale for Measuring Functional Caregiving of Children Assisted with Medical Technologies
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24507/ilm1-2011imeko-032.pdf

Stefan Cano, Anne F. Klassen, Andrea L. Pusic
From Breast-Q© to Q-Score©: Using Rasch Measurement to Better Capture Breast Surgery Outcomes
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24429/ilm1-2011imeko-039.pdf

Gordon A. Cooper, William P. Fisher, Jr.
Continuous Quantity and Unit; Their Centrality to Measurement
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24494/ilm1-2011imeko-019.pdf

William P. Fisher, Jr.
Measurement, Metrology and the Coordination of Sociotechnical Networks
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24491/ilm1-2011imeko-017.pdf

William P. Fisher, Jr., A. Jackson Stenner
A Technology Roadmap for Intangible Assets Metrology
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf

Carl V. Granger, Nikolaus Bezruczko
Body, Mind, and Spirit are Instrumental to Functional Health: A Case Study
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24494/ilm1-2011imeko-019.pdf

Thomas Salzberger
The Quantification of Latent Variables in the Social Sciences: Requirements for Scientific Measurement and Shortcomings of Current Procedures
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24417/ilm1-2011imeko-029.pdf

A. Jackson Stenner, Mark Stone, Donald Burdick
How to Model and Test for the Mechanisms that Make Measurement Systems Tick
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24416/ilm1-2011imeko-027.pdf

Mark Wilson
The Role of Mathematical Models in Measurement: A Perspective from Psychometrics
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24178/ilm1-2011imeko-005.pdf

Also of interest will be Karl Ruhm’s plenary lecture and papers from the Fundamentals of Measurement Science session and the Special Session on the Role of Mathematical Models in Measurement:

Karl H. Ruhm
From Verbal Models to Mathematical Models – A Didactical Concept not just in Metrology
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24167/ilm1-2011imeko-002.pdf

Alessandro Giordani, Luca Mari
Quantity and Quantity Value
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24414/ilm1-2011imeko-025.pdf

Eric Benoit
Uncertainty in Fuzzy Scales Based Measurements
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24415/ilm1-2011imeko-020.pdf

Susanne C.N. Töpfer
Application of Mathematical Models in Optical Coordinate Metrology
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24445/ilm1-2011imeko-008.pdf

Giovanni Battista Rossi
Measurement Modelling: Foundations and Probabilistic Approach
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24446/ilm1-2011imeko-009.pdf

Sanowar H. Khan, Ludwik Finkelstein
The Role of Mathematical Modelling in the Analysis and Design of Measurement Systems
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24448/ilm1-2011imeko-010.pdf

Roman Z. Morawski
Application-Oriented Approach to Mathematical Modelling of Measurement Processes
http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24449/ilm1-2011imeko-011.pdf

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Rasch Measurement as a Basis for a New Standards Framework

October 26, 2011

The 2011 U.S. celebration of World Standards Day took place on October 13 at the Fairmont Hotel in Washington, D.C., with the theme of “Advancing Safety and Sustainability Standards Worldwide.” The evening began with a reception in a hall of exhibits from the celebration’s sponsors, which included the National Institute for Standards and Technology (NIST), the Society for Standards Professionals (SES), the American National Standards Institute (ANSI), Microsoft, IEEE, Underwriters Laboratories, the Consumer Electronics Association, ASME, ASTM International, Qualcomm, Techstreet, and many others. Several speakers took the podium after dinner to welcome the 400 or so attendees and to present the World Standards Day Paper Competition Awards and the Ronald H. Brown Standards Leadership Award.

Dr. Patrick Gallagher, Under Secretary of Commerce for Standards and Technology, and Director of NIST, was the first speaker after dinner. He directed his remarks at the value of a decentralized, voluntary, and demand-driven system of standards in promoting innovation and economic prosperity. Gallagher emphasized that “standards provide the common language that keeps domestic and international trade flowing,” concluding that “it is difficult to overestimate their critical value to both the U.S. and global economy.”

James Shannon, President of the National Fire Protection Association (NFPA), accepted the R. H. Brown Standards Leadership Award in recognition of his work initiating or improving the National Electrical Code, the Life Safety Code, and the Fire Safe Cigarette and Residential Sprinkler Campaigns.

Ellen Emard, President of SES, introduced the paper competition award winners. As of this writing the titles and authors of the first and second place awards are not yet available on the SES web site (http://www.ses-standards.org/displaycommon.cfm?an=1&subarticlenbr=56). I took third place for my paper, “What the World Needs Now: A Bold Plan for New Standards.” Where the other winning papers took up traditional engineering issues concerning the role of standards in advancing safety and sustainability, my paper spoke to the potential scientific and economic benefits that could be realized by standard metrics and common product definitions for outcomes in education, health care, social services, and environmental resource management. All three of the award-winning papers will appear in a forthcoming issue of Standards Engineering, the journal of SES.

I was coincidentally seated at the dinner alongside Gordon Gillerman, winner of third place in the 2004 paper competition (http://www.ses-standards.org/associations/3698/files/WSD%202004%20-%203%20-%20Gillerman.pdf) and currently Chief of the Standards Services Division at NIST. Gillerman has a broad range of experience in coordinating standards across multiple domains, including environmental protection, homeland security, safety, and health care. Having recently been involved in a workshop focused on measuring, evaluating, and improving the usability of electronic health records (http://www.nist.gov/healthcare/usability/upload/EHR-Usability-Workshop-2011-6-03-2011_final.pdf), Gillerman was quite interested in the potential Rasch measurement techniques hold for reducing data volume with no loss of information, and so for streamlining computer interfaces.

Robert Massof of Johns Hopkins University accompanied me to the dinner, and was seated at a nearby table. Also at Massof’s table were several representatives of the National Institute of Building Sciences, some of whom Massof had recently met at a workshop on adaptations for persons with low vision disabilities. Massof’s work equating the main instruments used for assessing visual function in low vision rehabilitation could lead to a standard metric useful in improving the safety and convenience of buildings.

As is stated in educational materials distributed at the World Standards Day celebration by ANSI, standards are a constant behind-the-scenes presence in nearly all areas of everyday life. Everything from air, water, and food to buildings, clothing, automobiles, roads, and electricity is produced in conformity with voluntary consensus standards of various kinds. In the U.S. alone, more than 100,000 standards specify product and system features and interconnections, making it possible for appliances to tap the electrical grid with the same results no matter where they are plugged in, and for products of all kinds to be purchased with confidence. Life is safer and more convenient, and science and industry are more innovative and profitable, because of standards.

The point of my third-place paper is that life could be even safer and more convenient, and science and industry could be yet more innovative and profitable, if standards and conformity assessment procedures for outcomes in education, health care, social services, and environmental resource management were developed and implemented. Rasch measurement demonstrates the consistent reproducibility of meaningful measures across samples and different collections of construct-relevant items. Within any specific area of interest, then, Rasch measures have the potential of serving as the kind of mediating instruments or objects recognized as essential to the process of linking science with the economy (Fisher & Stenner, 2011b; Hussenot & Missonier, 2010; Miller & O’Leary, 2007). Recent white papers published by NIST and NSF document the challenges and benefits likely to be encountered and produced by initiatives moving in this direction (Fisher, 2009; Fisher & Stenner, 2011a).

A diverse array of Rasch measurement presentations were made at the recent International Measurement Confederation (IMEKO) meeting of metrology engineers in Jena, Germany (see RMT 25 (1), p. 1318). With that start at a new dialogue between the natural and social sciences, the NIST and NSF white papers, and with the award in the World Standards Day paper competition, the U.S. and international standards development communities have shown their interest in exploring possibilities for a new array of standard units of measurement, standardized outcome product definitions, standard conformity assessment procedures, and outcome product quality standards. The increasing acceptance and recognition of the viability of such standards is a logical consequence of observations like these:

  • “Where this law [relating reading ability and text difficulty to comprehension rate] can be applied it provides a principle of measurement on a ratio scale of both stimulus parameters and object parameters, the conceptual status of which is comparable to that of measuring mass and force. Thus…the reading accuracy of a child…can be measured with the same kind of objectivity as we may tell its weight” (Rasch, 1960, p. 115).
  • “Today there is no methodological reason why social science cannot become as stable, as reproducible, and hence as useful as physics” (Wright, 1997, p. 44).
  • “…when the key features of a statistical model relevant to the analysis of social science data are the same as those of the laws of physics, then those features are difficult to ignore” (Andrich, 1988, p. 22).
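The “same kind of objectivity” Rasch claims in the first quotation comes from a model with three variables: person ability, item difficulty, and the probability of a correct response, all expressed on one additive log-odds (logit) scale. A minimal sketch of the dichotomous Rasch model follows; the function name is illustrative, not from any of the cited papers:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Ability and difficulty sit on the same logit scale, so only their
    difference matters -- the additive, invariant structure discussed above.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When ability equals difficulty, success is exactly 50/50.
print(rasch_probability(1.0, 1.0))            # 0.5
# A person one logit above an item's difficulty succeeds about 73% of the time.
print(round(rasch_probability(2.0, 1.0), 2))  # 0.73
```

Because the model depends only on the difference between the two parameters, calibrations of item difficulty can, in principle, be maintained as stable units across samples, which is what gives the analogy to physical measurement its force.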

Rasch’s work has been wrongly assimilated in social science research practice as just another example of the “standard model” of statistical analysis. Rasch measurement rightly ought instead to be treated as a general articulation of the three-variable structure of natural law useful in framing the context of scientific practice. That is, Rasch’s models ought to be employed primarily in calibrating instruments quantitatively interpretable at the point of use in a mathematical language shared by a community of research and practice. To be shared in this way as a universally uniform coin of the realm, that language must be embodied in a consensus standard defining universally uniform units of comparison.

Rasch measurement offers the potential of shifting the focus of quantitative psychosocial research away from data analysis to integrated qualitative and quantitative methods enabling the definition of standard units and the calibration of instruments measuring in that unit. An intangible assets metric system will, in turn, support the emergence of new product- and performance-based standards, management system standards, and personnel certification standards. Reiterating Rasch’s (1960, p. xx) insight, we can acknowledge with him that “this is a huge challenge, but once the problem has been formulated it does seem possible to meet it.”
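The claim that Rasch-calibrated instruments can support a universally uniform unit rests on the model’s separability property: the comparison of any two persons is the same no matter which item mediates it. A small sketch illustrates this (the variable names and example values are mine, chosen only for illustration):

```python
import math

def p(theta: float, b: float) -> float:
    """Rasch success probability for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def odds(theta: float, b: float) -> float:
    """Odds of success; under the Rasch model this equals exp(theta - b)."""
    return p(theta, b) / (1.0 - p(theta, b))

theta_a, theta_b = 1.5, -0.5
for difficulty in (-2.0, 0.0, 3.0):
    # The odds ratio comparing two persons is exp(theta_a - theta_b),
    # regardless of which item is used to make the comparison.
    ratio = odds(theta_a, difficulty) / odds(theta_b, difficulty)
    print(round(ratio, 6))  # 7.389056 every time (= exp(2.0))
```

Item difficulties drop out of person comparisons, and person abilities drop out of item comparisons; it is this mutual independence that makes sample-free instrument calibration, and hence a consensus unit, conceivable.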

References

Andrich, D. (1988). Rasch models for measurement. (Vols. series no. 07-068). Sage University Paper Series on Quantitative Applications in the Social Sciences. Beverly Hills, California: Sage Publications.

Fisher, W. P., Jr. (2009). Metrological infrastructure for human, social, and natural capital (NIST Critical National Need Idea White Paper Series, Retrieved 25 October 2011 from http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute for Standards and Technology.

Fisher, W. P., Jr., & Stenner, A. J. (2011a, January). Metrology for the social, behavioral, and economic sciences (Social, Behavioral, and Economic Sciences White Paper Series). Retrieved 25 October 2011 from http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36. Washington, DC: National Science Foundation.

Fisher, W. P., Jr., & Stenner, A. J. (2011b). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO), Jena, Germany, August 31 to September 2.

Hussenot, A., & Missonier, S. (2010). A deeper understanding of evolution of the role of the object in organizational process. The concept of ‘mediation object.’ Journal of Organizational Change Management, 23(3), 269-286.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-734.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].


Reimagining Capitalism Again, Part III: Reflections on Greider’s “Bold Ideas” in The Nation

September 10, 2011

And so, The Nation’s “Bold Ideas for a New Economy” is disappointing for not doing more to start from the beginning identified by its own writer, William Greider. The soul of capitalism needs to be celebrated and nourished, if we are to make our economy “less destructive and domineering,” and “more focused on what people really need for fulfilling lives.” The only real alternative to celebrating and nourishing the soul of capitalism is to kill it, in the manner of the Soviet Union’s failed experiments in socialism and communism.

The article speaks the truth, though, when it says there is no point in trying to persuade the powers that be to make the needed changes. Republicans see the market as it exists as a one-size-fits-all economic panacea, when all it can accomplish in its current incomplete state is the continuing externalization of anything and everything important about human, social, and environmental decency. For their part, Democrats do indeed “insist that regulation will somehow fix whatever is broken,” in an ever-expanding socialistic micromanagement of every possible exception to the rules that emerges.

To date, the president’s efforts at a nonpartisan third way amount only to vacillations between these opposing poles. The leadership that is needed, however, is something else altogether. Yes, as The Nation article says, capitalism needs to be made to serve the interests of society, and this will require deep structural change, not just new policies. But none of the contributors of the “bold ideas” presented propose deep structural changes of a kind that actually gets at the soul of capitalism. All of the suggestions are ultimately just new policies tweaking superficial aspects of the economy in mechanical, static, and very limited ways.

The article calls for “Democratizing reforms that will compel business and finance to share decision-making and distribute rewards more fairly.” It says the vision has different names but “the essence is a fundamental redistribution of power and money.” But corporate distortions of liability law, the introduction of boardroom watchdogs, and a tax on financial speculation do not by any stretch of the imagination address the root causes of social and environmental irresponsibility in business. They “sound like obscure technical fixes” because that’s what they are. The same goes for low-cost lending from public banks, the double or triple bottom lines of Benefit Corporations, new anti-trust laws, calls for “open information” policies, added personal stakes for big-time CEOs, employee ownership plans, the elimination of tax subsidies, new standards for sound investing, new measures of GDP, and government guarantees of full employment.

All of these proposals sound like what ought to be the effects and outcomes of efforts addressing the root causes of capitalism’s shortcomings. Instead, they are band-aids applied to scratched fingers and arms when multiple-bypass surgery is called for. That is, what we need is to understand how to bring the spirit of capitalism to life in the new domains of human, social, and environmental interests, but what we’re getting are nothing but more of the same piecemeal ways of moving around the deck chairs on the Titanic.

There is some truth in the assertion that what really needs reinventing is our moral and spiritual imagination. As someone (Einstein or Edison?) is supposed to have put it, originality is simply a matter of having a source for an analogy no one else has considered. Ironically, the best model is often the one most taken for granted and nearest to hand. Such is the case with the two-sided scientific and economic effects of standardized units of measurement. The fundamental moral aspect here is nothing other than the Golden Rule, independently derived and offered in cultures throughout history, globally. Individualized social measurement is nothing if not a matter of determining whether others are being treated in the way you yourself would want to be treated.

And so, yes, to stress the major point of agreement with The Nation, “the new politics does not start in Washington.” Historically, at their best, governments work to keep pace with the social and technical innovations introduced by their peoples. Margaret Mead said it well a long time ago when she asserted that small groups of committed citizens are the only sources of real social change.

Not to be just one of many “advocates with bold imaginations” who wind up marginalized by the constraints of status quo politics, I claim my personal role in imagining a new economic future by tapping as deeply as I can into the positive, pre-existing structures needed for a transition into a new democratic capitalism. We learn through what we already know. Standards are well established as essential to commerce and innovation, but 90% of the capital under management in our economy—the human, social, and natural capital—lacks the standards needed for optimal market efficiency and effectiveness. An intangible assets metric system will be a vitally important way in which we extend what is right and good in the world today into new domains.

To conclude, what sets this proposal apart from those offered by The Nation and its readers hinges on our common agreement that “the most threatening challenge to capitalism is arguably the finite carrying capacity of the natural world.” The bold ideas proposed by The Nation’s readers respond to this challenge in ways that share an important feature: people have to understand the message and act on it. That fact dooms all of these ideas from the start. If we have to articulate and communicate a message that people then have to act on, we remain a part of the problem and not part of the solution.

As I argue in my “The Problem is the Problem” blog post of some months ago, this way of defining problems is itself the problem. That is, we can no longer think of ourselves as separate from the challenges we face. If we think we are not all implicated through and through as participants in the construction and maintenance of the problem, then we have not understood it. The bold ideas offered to date are all responses to the state of a broken system that seek to reform one or another element in the system when what we need is a whole new system.

What we need is a system that so fully embodies nature’s own ecological wisdom that the medium becomes the message. When the ground rules for economic success are put in place such that it is impossible to earn a profit without increasing stocks of human, social, and natural capital, there will be no need to spell out the details of a microregulatory structure of controlling new anti-trust laws, “open information” policies, personal stakes for big-time CEOs, employee ownership plans, the elimination of tax subsidies, etc. What we need is precisely what Greider reported from Innovest in his book: reliable, high quality information that makes human, social, and environmental issues matter financially. Situated in a context like that described by Bernstein in his 2004 The Birth of Plenty, with the relevant property rights, rule of law, scientific rationality, capital markets, and communications networks in place, it will be impossible to stop a new economic expansion of historic proportions.


Reimagining Capitalism Again, Part I: Reflections on Greider’s Soul of Capitalism

September 10, 2011

In his 2003 book, The Soul of Capitalism, William Greider wrote, “If capitalism were someday found to have a soul, it would probably be located in the mystic qualities of capital itself” (p. 94). The recurring theme in the book is that the resolution of capitalism’s deep conflicts must grow out as organic changes from the roots of capitalism itself.

In the book, Greider quotes Innovest’s Michael Kiernan as suggesting that the goal has to be re-engineering the DNA of Wall Street (p. 119). He says the key to doing this is good reliable information that has heretofore been unavailable but which will make social and environmental issues matter financially. The underlying problems of exactly what solid, high quality information looks like, where it comes from, and how it is created are not stated or examined, but the point, as Kiernan says, is that “the markets are pretty good at punishing and rewarding.” The objective is to use “the financial markets as an engine of reform and positive change rather than destruction.”

This objective is, of course, the focus of multiple postings in this blog (see especially this one and this one). From my point of view, capitalism indeed does have a soul, and it is actually located in the qualities of capital itself. Think about it: if a soul is a spirit of something that exists independent of its physical manifestation, then the soul of capitalism is the fungibility of capital. Now, this fungibility is complex and ambiguous. It takes its strength and practical value from the way market exchanges are represented in terms of currencies, monetary units that, within some limits, provide an objective basis of comparison useful for rewarding those capable of matching supply with demand.

But the fungibility of capital can also be dangerously misconceived when the rich complexity and diversity of human capital is unjustifiably reduced to labor, when the irreplaceable value of natural capital is unjustifiably reduced to land, and when the trust, loyalty, and commitment of social capital is completely ignored in financial accounting and economic models. As I’ve previously said in this blog, the concept of human capital is inherently immoral so far as it reduces real human beings to interchangeable parts in an economic machine.

So how could it ever be possible to justify any reduction of human, social, and natural value to a mere number? Isn’t this the ultimate in the despicable inhumanity of economic logic, corporate decision making, and, ultimately, the justification of greed? Many among us who profess liberal and progressive perspectives seem to have an automatic and reactionary prejudice of this kind. This makes these well-intentioned souls as much a part of the problem as those among us with sometimes just as well-intentioned perspectives that accept such reductionism as the price of entry into the game.

There is another way. Human, social, and natural value can be measured and made manageable in ways that do not necessitate totalizing reduction to a mere number. The problem is not reduction itself, but unjustified, totalizing reduction. Referring to all people as “man” or “men” is an unjustified reduction dangerous in the way it focuses attention only on males. The tendency to think and act in ways privileging males over females that is fostered by this sense of “man” shortchanges us all, and has happily been largely eliminated from discourse.

Making language more inclusive does not, however, mean that words lose the singular specificity they need to be able to refer to things in the world. Any given word represents an infinite population of possible members of a class of things, actions, and forms of life. Any simple sentence combining words into a coherent utterance then multiplies infinities upon infinities. Discourse inherently reduces multiplicities into texts of limited lengths.

Like any tool, reduction has its uses. Also like any tool, problems arise when the tool is allowed to occupy some hidden and unexamined blind spot from which it can dominate and control the way we think about everything. Critical thinking is most difficult in those instances in which the tools of thinking themselves need to be critically evaluated. To reject reduction uncritically as inherently unjustified is to throw the baby out with the bathwater. Indeed, it is impossible to formulate a statement of the rejection without simultaneously enacting exactly what is supposed to be rejected.

We have numerous ready-to-hand examples of how all reduction has been unjustifiably reduced to one homogenized evil. But one of the results of experiments in communal living in the 1960s and 1970s, as well as of the fall of the Soviet Union, was the realization that the centralized command and control of collectively owned community property cannot compete with the creativity engendered when individuals hold legal title to the fruits of their labors. If individuals cannot own the results of the investments they make, no one makes any investments.

In other words, if everything is owned collectively and is never reduced to individually possessed shares that can be creatively invested for profitable returns, then the system is structured so as to punish innovation and reward doing as little as possible. But there’s another way of thinking about the relation of the collective to the individual. The living soul of capitalism shows itself in the way high quality information makes it possible for markets to efficiently coordinate and align individual producers’ and consumers’ collective behaviors and decisions. What would happen if we could do that for human, social, and natural capital markets? What if “social capitalism” is more than an empty metaphor? What if capital institutions can be configured so that individual profit really does become the driver of socially responsible, sustainable economics?

And here we arrive at the crux of the problem. How do we create the high quality, solid information markets need to punish and reward relative to ethical and sustainable human, social, and environmental values? Well, what can we learn from the way we created that kind of information for property and manufactured capital? These are the questions taken up and explored in the postings in this blog, and in my scientific research publications and meeting presentations. In the near future, I’ll push my reflection on these questions further, and will explore some other possible answers to the questions offered by Greider and his readers in a recent issue of The Nation.


Consequences of Standardized Technical Effects for Scientific Advancement

January 24, 2011

Note. This is modified from:

Fisher, W. P., Jr. (2004, Wednesday, January 21). Consequences of standardized technical effects for scientific advancement. In A. Leplège (Chair), Session 2.5A. Rasch Models: History and Philosophy. Second International Conference on Measurement in Health, Education, Psychology, and Marketing: Developments with Rasch Models, The International Laboratory for Measurement in the Social Sciences, School of Education, Murdoch University, Perth, Western Australia.

—————————

Over the last several decades, historians of science have repeatedly produced evidence contradicting the widespread assumption that technology is a product of experimentation and/or theory (Kuhn 1961; Latour 1987; Rabkin 1992; Schaffer 1992; Hankins & Silverman 1999; Baird 2002). Theory and experiment typically advance only within the constraints set by a key technology that is widely available to end users in applied and/or research contexts. Thus, “it is not just a clever historical aphorism, but a general truth, that ‘thermodynamics owes much more to the steam engine than ever the steam engine owed to thermodynamics’” (Price 1986, p. 240).

The prior existence of the relevant technology comes to bear on theory and experiment again in the common, but mistaken, assumption that measures are made and experimentally compared in order to discover scientific laws. History and the logic of measurement show that measures are rarely made until the relevant law is effectively embodied in an instrument (Kuhn 1961; Michell 1999). This points to the difficulty experienced in metrologically fusing (Schaffer 1992, p. 27; Lapré & van Wassenhove 2002) instrumentalists’ often inarticulate, but materially effective, knowledge (know-how) with theoreticians’ often immaterial, but well articulated, knowledge (know-why) (Galison 1999; Baird 2002).

Because technology often dictates what, if any, phenomena can be consistently produced, it constrains experimentation and theorizing by focusing attention selectively on reproducible, potentially interpretable effects, even when those effects are not well understood (Ackermann 1985; Daston & Galison 1992; Ihde 1998; Hankins & Silverman 1999; Maasen & Weingart 2001). Criteria for theory choice in this context stem from competing explanatory frameworks’ experimental capacities to facilitate instrument improvements, prediction of experimental results, and gains in the efficiency with which a phenomenon is produced.

In this context, the relatively recent introduction of measurement models requiring additive, invariant parameterizations (Rasch 1960) provokes speculation as to the effect on the human sciences that might be wrought by the widespread availability of consistently reproducible effects expressed in common quantitative languages. Paraphrasing Price’s comment on steam engines and thermodynamics, might it one day be said that as yet unforeseeable advances in reading theory will owe far more to the Lexile analyzer (Burdick & Stenner 1996) than ever the Lexile analyzer owed to reading theory?

Kuhn (1961) speculated that the second scientific revolution of the mid-nineteenth century followed in large part from the full mathematization of physics, i.e., the emergence of metrology as a professional discipline focused on providing universally accessible uniform units of measurement (Roche 1998). Might a similar revolution and new advances in the human sciences follow from the introduction of rigorously mathematical uniform measures?

Measurement technologies capable of supporting the calibration of additive units that remain invariant over instruments and samples (Rasch 1960) have been introduced relatively recently in the human sciences. The invariances produced appear 1) very similar to those produced in the natural sciences (Fisher 1997) and 2) based in the same mathematical metaphysics as that informing the natural sciences (Fisher 2003). Might it then be possible that the human sciences are on the cusp of a revolution analogous to that of nineteenth century physics? Other factors involved in answering this question, such as the professional status of the field, the enculturation of students, and the scale of the relevant enterprises, define the structure of circumstances that might be capable of supporting the kind of theoretical consensus and research productivity that came to characterize, for instance, work in electrical resistance through the early 1880s (Schaffer 1992).

Much could be learned from Rasch’s use of Maxwell’s method of analogy (Nersessian, 2002; Turner, 1955), not just in the modeling of scientific laws but from the social and economic factors that made the regularities of natural phenomena function as scientific capital (Latour, 1987). Quantification must be understood in the fully mathematical sense of commanding a comprehensive grasp of the real root of mathematical thinking. Far from being simply a means of producing numbers, to be useful, quantification has to result in qualitatively transparent figure-meaning relations at any point of use for any one of every different kind of user. Connections between numbers and unit amounts of the variable must remain constant across samples, instruments, time, space, and measurers. Quantification that does not support invariant linear comparisons expressed in a uniform metric available universally to all end users at the point of need is inadequate and incomplete. Such standardization is widely respected in the natural sciences but is virtually unknown in the human sciences, largely due to untested hypotheses and unexamined prejudices concerning the viability of universal uniform measures for the variables measured via tests, surveys, and performance assessments.

Quantity is an effective medium for science to the extent that it comprises an instance of the kind of common language necessary for distributed, collective thinking; for widespread agreement on what makes research results compelling; and for the formation of social capital’s group-level effects. It may be that the primary relevant difference between the case of 19th century physics and today’s human sciences concerns the awareness, widespread among scientists in the 1800s and virtually nonexistent in today’s human sciences, that universal uniform metrics for the variables of interest are both feasible and of great human, scientific, and economic value.

In the creative dynamics of scientific instrument making, as in the making of art, the combination of inspiration and perspiration can sometimes result in cultural gifts of the first order. It nonetheless often happens that some of these superlative gifts, no matter how well executed, are unable to negotiate the conflict between commodity and gift economics characteristic of the marketplace (Baird, 1997; Hagstrom, 1965; Hyde, 1979), and so remain unknown, lost to the audiences they deserve, and unable to render their potential effects historically. Value is not an intrinsic characteristic of the gift; rather, value is ascribed as a function of interests. If interests are not cultivated via the clear definition of positive opportunities for self-advancement, common languages, socio-economic relations, and recruitment, gifts of even the greatest potential value may die with their creators. On the other hand, who has not seen mediocrity disproportionately rewarded merely as a result of intensive marketing?

A central problem is then how to strike a balance between individual or group interests and the public good. Society and individuals are interdependent in that children are enculturated into the specific forms of linguistic and behavioral competence that are valued in communities at the same time that those communities are created, maintained, and reproduced through communicative actions (Habermas, 1995, pp. 199-200). The identities of individuals and societies then co-evolve, as each defines itself through the other via the medium of language. Language is understood broadly in this context to include all perceptual reading of the environment, bodily gestures, social action, etc., as well as the use of spoken or written symbols and signs (Harman, 2005; Heelan, 1983; Ihde, 1998; Nicholson, 1984; Ricoeur, 1981).

Technologies extend language by providing media for the inscription of new kinds of signs (Heelan, 1983a, 1998; Ihde, 1991, 1998; Ihde & Selinger, 2003). Thus, mobility desires and practices are inscribed and projected into the world using the automobile; shelter and life style, via housing and clothing; and communications, via alphabets, scripts, phonemes, pens and paper, telephones, and computers. Similarly, technologies in the form of test, survey, and assessment instruments provide the devices on which we inscribe desires for social mobility, career advancement, health maintenance and improvement, etc.

References

Ackermann, J. R. (1985). Data, instruments, and theory: A dialectical approach to understanding science. Princeton, New Jersey: Princeton University Press.

Baird, D. (1997, Spring-Summer). Scientific instrument making, epistemology, and the conflict between gift and commodity economics. Techné: Journal of the Society for Philosophy and Technology, 2(3-4), 25-46. Retrieved 08/28/2009, from http://scholar.lib.vt.edu/ejournals/SPT/v2n3n4/baird.html.

Baird, D. (2002, Winter). Thing knowledge – function and truth. Techné: Journal of the Society for Philosophy and Technology, 6(2). Retrieved 08/19/2003, from http://scholar.lib.vt.edu/ejournals/SPT/v6n2/baird.html.

Burdick, H., & Stenner, A. J. (1996). Theoretical prediction of test items. Rasch Measurement Transactions, 10(1), 475 [http://www.rasch.org/rmt/rmt101b.htm].

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Habermas, J. (1995). Moral consciousness and communicative action. Cambridge, Massachusetts: MIT Press.

Hagstrom, W. O. (1965). Gift-giving as an organizing principle in science. In The scientific community (pp. 12-22). New York: Basic Books. (Rpt. in B. Barnes (Ed.). (1972). Sociology of science: Selected readings (pp. 105-120). Baltimore, Maryland: Penguin Books.)

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Harman, G. (2005). Guerrilla metaphysics: Phenomenology and the carpentry of things. Chicago: Open Court.

Hyde, L. (1979). The gift: Imagination and the erotic life of property. New York: Vintage Books.

Ihde, D. (1998). Expanding hermeneutics: Visualism in science. (Northwestern University Studies in Phenomenology and Existential Philosophy). Evanston, Illinois: Northwestern University Press.

Kuhn, T. S. (1961). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago, Illinois: University of Chicago Press, 1977.)

Lapré, M. A., & Van Wassenhove, L. N. (2002, October). Learning across lines: The secret to more efficient factories. Harvard Business Review, 80(10), 107-11.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York, New York: Cambridge University Press.

Maasen, S., & Weingart, P. (2001). Metaphors and the dynamics of knowledge. (Vol. 26. Routledge Studies in Social and Political Thought). London: Routledge.

Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge: Cambridge University Press.

Nersessian, N. J. (2002). Maxwell and “the Method of Physical Analogy”: Model-based reasoning, generic abstraction, and conceptual change. In D. Malament (Ed.), Essays in the history and philosophy of science and mathematics (pp. 129-166). Lasalle, Illinois: Open Court.

Price, D. J. d. S. (1986). Of sealing wax and string. In Little science, big science–and beyond (pp. 237-253). New York, New York: Columbia University Press.

Rabkin, Y. M. (1992). Rediscovering the instrument: Research, industry, and education. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 57-82). Bellingham, Washington: SPIE Optical Engineering Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Turner, J. (1955, November). Maxwell on the method of physical analogy. British Journal for the Philosophy of Science, 6, 226-238.


Newton, Metaphysics, and Measurement

January 20, 2011

Though Newton claimed to deduce quantitative propositions from phenomena, the record shows that he brought a whole cartload of presuppositions to bear on his observations (White, 1997), such as his belief that Pythagoras was the discoverer of the inverse square law, his knowledge of Galileo’s freefall experiments, and his theological and astrological beliefs in occult actions at a distance. Without his immersion in this intellectual environment, he likely would not have been able to then contrive the appearance of deducing quantity from phenomena.

The second edition of the Principia, in which appears the phrase “hypotheses non fingo,” was brought out in part to respond to the charge that Newton had not offered any explanation of what gravity is. De Morgan, in particular, felt that Newton seemed to know more than he could prove (Keynes, 1946). But in his response to the critics, and in asserting that he feigns no hypotheses, Newton was making an important distinction between explaining the causes or composition of gravity and describing how it works. Newton was saying he did not rely on or make or test any hypotheses as to what gravity is; his only concern was with how it behaves. In due course, gravity came to be accepted as a fundamental feature of the universe in no need of explanation.

Heidegger (1977, p. 121) contends that Newton was, as is implied in the translation “I do not feign hypotheses,” saying in effect that the ground plan he was offering as a basis for experiment and practical application was not something he just made up. Despite Newton’s rejection of metaphysical explanations, the charge of not explaining gravity for what it is was being answered with a metaphysics of how, first, to derive the foundation for a science of precise predictive control from nature, and then resituate that foundation back within nature as an experimental method incorporating a mathematical plan or model. This was, of course, quite astute of Newton, as far as he went, but he stopped far short of articulating the background assumptions informing his methods.

Newton’s desire for a logic of experimental science led him to reject anything “metaphysical or physical, or based on occult qualities, or mechanical” as a foundation for proceeding. Following in Descartes’ wake, Newton then was satisfied to solidify the subject-object duality and to move forward on the basis of objective results that seemed to make metaphysics a thing of the past. Unfortunately, as Burtt (1954/1932, pp. 225-230) observes in this context, the only thing that can possibly happen when you presume discourse to be devoid of metaphysical assumptions is that your metaphysics is more subtly insinuated and communicated to others because it is not overtly presented and defended. Thus we have the history of logical positivism as the dominant philosophy of science.

It is relevant to recall here that Newton was known for strong and accurate intuitions, and strong and unorthodox religious views (he held the Lucasian Chair at Cambridge only by royal dispensation, as he was not Anglican). It must be kept in mind that Newton’s combination of personal characteristics was situated in the social context of the emerging scientific culture’s increasing tendency to prioritize results that could be objectively detached from the particular people, equipment, samples, etc. involved in their production (Shapin, 1989). Newton then had insights that, while remarkably accurate, could not be entirely derived from the evidence he offered and that, moreover, could not acceptably be explained informally, psychologically, or theologically.

What is absolutely fascinating about this constellation of factors is that it became a model for the conduct of science. Of course, Newton’s laws of motion were adopted as the hallmark of successful scientific modeling in the form of the Standard Model applied throughout physics in the nineteenth century (Heilbron, 1993). But so was the metaphysical positivist logic of a pure objectivism detached from everything personal, intuitive, metaphorical, social, economic, or religious (Burtt, 1954/1932).

Kuhn (1970) made a major contribution to dismantling this logic when he contrasted textbook presentations of the methodical production of scientific effects with the actual processes of cobbled-together fits and starts that are lived out in the work of practicing scientists. But much earlier, James Clerk Maxwell (1879, pp. 162-163) had made exactly the same observation in a contrast of the work of Ampere with that of Faraday:

“The experimental investigation by which Ampere established the laws of the mechanical action between electric currents is one of the most brilliant achievements in science. The whole, theory and experiment, seems as if it had leaped, full grown and full armed, from the brain of the ‘Newton of electricity.’ It is perfect in form, and unassailable in accuracy, and it is summed up in a formula from which all the phenomena may be deduced, and which must always remain the cardinal formula of electro-dynamics.

“The method of Ampere, however, though cast into an inductive form, does not allow us to trace the formation of the ideas which guided it. We can scarcely believe that Ampere really discovered the law of action by means of the experiments which he describes. We are led to suspect, what, indeed, he tells us himself* [Ampere’s Theorie…, p. 9], that he discovered the law by some process which he has not shewn us, and that when he had afterwards built up a perfect demonstration he removed all traces of the scaffolding by which he had raised it.

“Faraday, on the other hand, shews us his unsuccessful as well as his successful experiments, and his crude ideas as well as his developed ones, and the reader, however inferior to him in inductive power, feels sympathy even more than admiration, and is tempted to believe that, if he had the opportunity, he too would be a discoverer. Every student therefore should read Ampere’s research as a splendid example of scientific style in the statement of a discovery, but he should also study Faraday for the cultivation of a scientific spirit, by means of the action and reaction which will take place between newly discovered facts and nascent ideas in his own mind.”

Where does this leave us? In sum, Rasch emulated Ampere in two ways. He did so first in wanting to become the “Newton of reading,” or even the “Newton of psychosocial constructs,” when he sought to show that data from reading test items and readers are structured with an invariance analogous to that of data from instruments applying a force to an object with mass (Rasch, 1960, pp. 110-115). Rasch emulated Ampere again when, like Ampere, after building up a perfect demonstration of a reading law structured in the form of Newton’s second law, he did not report the means by which he had constructed test items capable of producing the data fitting the model, effectively removing all traces of the scaffolding.

The scaffolding has been reconstructed for reading (Stenner, et al., 2006) and has also been left in plain view by others doing analogous work involving other constructs (cognitive and moral development, mathematics ability, short-term memory, etc.). Dawson (2002), for instance, compares developmental scoring systems of varying sophistication and predictive control. And the plethora of uncritically applied Rasch analyses may turn out to be a capital resource for researchers interested in focusing on possible universal laws, predictive theories, and uniform metrics.

That is, published reports of calibration, error, and fit estimates open up opportunities for “pseudo-equating” (Beltyukova, Stone, & Fox, 2004; Fisher 1997, 1999) in their documentation of the invariance, or lack thereof, of constructs over samples and instruments. The evidence will point to a need for theoretical and metric unification directly analogous to what happened in the study and use of electricity in the nineteenth century:

“…’the existence of quantitative correlations between the various forms of energy, imposes upon men of science the duty of bringing all kinds of physical quantity to one common scale of comparison.’” [Schaffer, 1992, p. 26; quoting Everett 1881; see Smith & Wise 1989, pp. 684-4]

Qualitative and quantitative correlations in scaling results converged on a common construct in the domain of reading measurement through the 1960s and 1970s, culminating in the Anchor Test Study and the calibration of the National Reference Scale for Reading (Jaeger, 1973; Rentz & Bashaw, 1977). The lack of a predictive theory and the entirely empirical nature of the scale estimates, however, prevented wide application of the scale, as the items in the tests that were equated were soon replaced with new items.

But the broad scale of the invariance observed across tests and readers suggests that some mechanism must be at work (Stenner, Stone, & Burdick, 2009), or that some form of life must be at play (Fisher, 2003a, 2003b, 2004, 2010a), structuring the data. Eventually, some explanation accounting for the structure ought to become apparent, as it did for reading (Stenner, Smith, & Burdick, 1983; Stenner, et al., 2006). This emergence of self-organizing structures repeatedly asserting themselves as independently existing real things is the medium of the message we need to hear. That message is that instruments play a very large and widely unrecognized role in science. By facilitating the routine production of mutually consistent, regularly observable, and comparable results they set the stage for theorizing, the emergence of consensus on what’s what, and uniform metrics (Daston & Galison, 2007; Hankins & Silverman, 1999; Latour, 1987, 2005; Wise, 1988, 1995). The form of Rasch’s models as extensions of Maxwell’s method of analogy (Fisher, 2010b) makes them particularly productive as a means of providing self-organizing invariances with a medium for their self-inscription. But that’s a story for another day.

References

Beltyukova, S. A., Stone, G. E., & Fox, C. M. (2004). Equating student satisfaction measures. Journal of Applied Measurement, 5(1), 62-9.

Burtt, E. A. (1954/1932). The metaphysical foundations of modern physical science (Rev. ed.) [First edition published in 1924]. Garden City, New York: Doubleday Anchor.

Daston, L., & Galison, P. (2007). Objectivity. Cambridge, MA: MIT Press.

Dawson, T. L. (2002, Summer). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3(2), 146-89.

Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.

Fisher, W. P., Jr. (2003a, December). Mathematics, measurement, metaphor, metaphysics: Part I. Implications for method in postmodern science. Theory & Psychology, 13(6), 753-90.

Fisher, W. P., Jr. (2003b, December). Mathematics, measurement, metaphor, metaphysics: Part II. Accounting for Galileo’s “fateful omission.” Theory & Psychology, 13(6), 791-828.

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2010a). Reducible or irreducible? Mathematical reasoning and the ontological method. Journal of Applied Measurement, 11(1), 38-59.

Fisher, W. P., Jr. (2010b). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics: Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Jaeger, R. M. (1973). The national test equating study in reading (The Anchor Test Study). Measurement in Education, 4, 1-8.

Keynes, J. M. (1946, July). Newton, the man. (Speech given at the celebration of the tercentenary of Newton’s birth in 1642.) In The collected writings of John Maynard Keynes (Vol. X, pp. 363-364). London, England: Macmillan/St. Martin’s Press.

Kuhn, T. S. (1970). The structure of scientific revolutions. Chicago, Illinois: University of Chicago Press.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York: Cambridge University Press.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Maxwell, J. C. (1879). Treatise on electricity and magnetism, Volumes I and II. London, England: Macmillan.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Rentz, R. R., & Bashaw, W. L. (1977, Summer). The National Reference Scale for Reading: An application of the Rasch model. Journal of Educational Measurement, 14(2), 161-179.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Shapin, S. (1989, November-December). The invisible technician. American Scientist, 77, 554-563.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Stenner, A. J., Smith, M., III, & Burdick, D. S. (1983, Winter). Toward a theory of construct definition. Journal of Educational Measurement, 20(4), 305-316.

Stenner, A. J., Stone, M., & Burdick, D. (2009, Autumn). The concept of a measurement mechanism. Rasch Measurement Transactions, 23(2), 1204-1206.

White, M. (1997). Isaac Newton: The last sorcerer. New York: Basic Books.

Wise, M. N. (1988). Mediating machines. Science in Context, 2(1), 77-113.

Wise, M. N. (Ed.). (1995). The values of precision. Princeton, New Jersey: Princeton University Press.


Geometrical and algebraic expressions of scientific laws

April 12, 2010

Geometry provides a model of scientific understanding that has repeatedly proven itself over the course of history. Einstein (1922) considered geometry to be “the most ancient branch of physics” (p. 14). He accorded “special importance” to his view that “all linear measurement in physics is practical geometry,” “because without it I should have been unable to formulate the theory of relativity” (p. 14).

Burtt (1954) concurs, pointing out that the essential question for Copernicus was not “Does the earth move?” but, rather, “…what motions should we attribute to the earth in order to obtain the simplest and most harmonious geometry of the heavens that will accord with the facts?” (p. 39). Maxwell similarly employed a geometrical analogy in working out his electromagnetic theory, saying

“By referring everything to the purely geometrical idea of the motion of an imaginary fluid, I hope to attain generality and precision, and to avoid the dangers arising from a premature theory professing to explain the cause of the phenomena. If the results of mere speculation which I have collected are found to be of any use to experimental philosophers, in arranging and interpreting their results, they will have served their purpose, and a mature theory, in which physical facts will be physically explained, will be formed by those who by interrogating Nature herself can obtain the only true solution of the questions which the mathematical theory suggests.” (Maxwell, 1965/1890, p. 159).

Maxwell was known for thinking visually, once as a student offering a concise geometrical solution to a problem that resisted a lecturer’s lengthy algebraic efforts (Forfar, 2002, p. 8). His approach seemed to be one of playing with images with the aim of arriving at simple mathematical representations, instead of thinking linearly through a train of analysis. A similar method is said to have been used by Einstein (Holton, 1988, pp. 385-388).

Gadamer (1980) speaks of the mathematical transparency of geometric figures to convey Plato’s reasons for requiring mathematical training of the students in his Academy, saying:

“Geometry requires figures which we draw, but its object is the circle itself…. Even he who has not yet seen all the metaphysical implications of the concept of pure thinking but only grasps something of mathematics—and as we know, Plato assumed that such was the case with his listeners—even he knows that in a manner of speaking one looks right through the drawn circle and keeps the pure thought of the circle in mind.” (p. 101)

But exactly how do geometrical visualizations lend themselves to algebraic formulae? More specifically, is it possible to see the algebraic structure of scientific laws in geometry?

Yes, it is. Here’s how. Starting from the Pythagorean theorem, we know that the square of a right triangle’s hypotenuse is equal to the sum of the squares of the other two sides. For convenience, imagine that the lengths of the sides of the triangle, as shown in Figure 1, are 3, 4, and 5, for sides a, b, and c, respectively. We can count the unit squares within each side’s square and see that the 25 in the square of the hypotenuse equals the sum of the 9 in the square of side a and the 16 in the square of side b.

That mathematical relationship can, of course, be written as

a² + b² = c²

which, for Figure 1, is

3² + 4² = 5² = 9 + 16 = 25

Now, most scientific laws are written in a multiplicative form, like this:

m = f / a

or

f = m * a

which, of course, is how Maxwell presented Newton’s Second Law. So how would the Pythagorean Theorem be written like a physical law?

Since the advent of small, cheap electronic calculators, slide rules have fallen out of fashion. But these eminently useful tools are built to take advantage of the way the natural logarithm and the number e (2.71828…) make division interchangeable with subtraction, and multiplication interchangeable with addition.

That means the Pythagorean Theorem could be written like Newton’s Second Law of Motion, or the Combined Gas Law. Here’s how it works. The Pythagorean Theorem is normally written as

a² + b² = c²

but does it make sense to write it as follows?

a² * b² = c²

Using the convenient values for a, b, and c from above

3² + 4² = 5²

and

9 + 16 = 25

so, plainly, simply changing the plus sign to a multiplication sign will not work, since 9 * 16 is 144. This is where the number e comes in. What happens if e is taken as a base raised to the power of each of the parameters in the equation? Does this equation work?

e⁹ * e¹⁶ = e²⁵

which, substituting a for e⁹, b for e¹⁶, and c for e²⁵, could be represented by

a * b = c

and which could be solved as

8,103.08 * 8,886,110.52 ≈ 72,004,899,337

Yes, it works, and so it is possible to divide through by e16 and arrive at the form of the law used by Maxwell and Rasch:

8,103.08 ≈ 72,004,899,337 / 8,886,110.52

or

e⁹ = e²⁵ / e¹⁶

or, again substituting a for e⁹, b for e¹⁶, and c for e²⁵, could be represented by

a = c / b

which, when converted back to the additive form, looks like this:

a = c – b

and this

9 = 25 – 16 .
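The additive-multiplicative equivalence just derived can be checked numerically. The following Python sketch is ours, not part of the original derivation; it verifies that exponentiating the additive Pythagorean relation yields the multiplicative form, and that dividing through recovers a = c – b:

```python
import math

# Squared side lengths of the 3-4-5 triangle: the additive form a + b = c.
a, b, c = 9, 16, 25
assert a + b == c

# Exponentiating each term turns addition into multiplication:
assert math.isclose(math.exp(a) * math.exp(b), math.exp(c))  # e^9 * e^16 = e^25

# ...and turns subtraction into division, giving the form of a physical law:
assert math.isclose(math.exp(c) / math.exp(b), math.exp(a))  # e^25 / e^16 = e^9

# Taking logs brings us back to the additive form: a = c - b.
assert math.isclose(math.log(math.exp(c) / math.exp(b)), c - b)  # 9 = 25 - 16
```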

Rasch wrote his model in the multiplicative form of

εvi = θvσi

and it is often written in the form of

Pr {Xni = 1} = e^(βn – δi) / [1 + e^(βn – δi)]

or

Pni = exp(Bn – Di) / [1 + exp(Bn – Di)]

which is to say that the probability of a correct response from person n on item i is equal to e taken to the power of the difference between the estimate β (or B) of person n‘s ability and the estimate δ (or D) of item i‘s difficulty, divided by one plus e to that same power.

Logit estimates of Rasch model parameters taken straight from software output usually range between -3.0 or so and 3.0. So what happens if a couple of arbitrary values are plugged into these equations? If someone has a measure of 2 logits, what is their probability of a correct answer on an item that calibrates at 0.5 logits? The answer should be

e^(2 – 0.5) / (1 + e^(2 – 0.5)).

Now,

e^1.5 = 2.71828…^1.5 ≈ 4.481689

and

4.481689 / (1 + 4.481689) ≈ 0.8176

For a table of the relationships between logit differences, odds, and probabilities, see Table 1.4.1 in Wright & Stone (1979, p. 16), or Table 1 in Wright (1977).

This form of the model

Pni = exp(Bn – Di) / [1 + exp(Bn – Di)]

can be rewritten in an equivalent form as

[Pni / (1 – Pni)] = exp(Bn – Di) .

Taking the natural logarithm of the response probabilities expresses the model in perhaps its most intuitive form, often written as

ln[Pni / (1 – Pni)] = Bn – Di .

Substituting a for ln[Pni / (1 – Pni)], b for Di, and c for Bn, we have the same equation as we had for the Pythagorean Theorem, above

a = c – b .

Plugging in the same values of 2.0 and 0.5 logits for Bn and Di,

ln[Pni / (1 – Pni)] = 2.0 – 0.5 = 1.5.

The logit value of 1.5 is obtained from response odds [Pni / (1 – Pni)] of about 4.5, making, again, Pni equal to about 0.82.
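
The worked example above can be reproduced in a few lines of Python; the function name rasch_probability is mine, chosen for illustration:

```python
import math

def rasch_probability(b: float, d: float) -> float:
    """Probability of a correct response from a person of ability b
    (in logits) on an item of difficulty d (in logits)."""
    return math.exp(b - d) / (1 + math.exp(b - d))

p = rasch_probability(2.0, 0.5)
print(round(p, 4))                # -> 0.8176

# The equivalent odds form: the log of the response odds
# recovers the logit difference B - D
odds = p / (1 - p)
print(round(math.log(odds), 1))   # -> 1.5
```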

Working from Maxwell, Rasch wrote the model like this:

Avj = Fj / Mv .

So when catapult j’s force F of 50 Newtons (361.65 poundals) is applied to object v’s mass M of 10 kilograms (22.046 pounds), the acceleration of this interaction is 5 meters (16.404 feet) per second, per second. Increases in force relative to the same mass result in proportionate increases in acceleration, etc.
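
The proportionalities described here are easy to confirm; a minimal sketch in Python, using the catapult values from the text:

```python
def acceleration(force_newtons: float, mass_kg: float) -> float:
    """Newton's Second Law in the ratio form A = F / M."""
    return force_newtons / mass_kg

# The catapult example: 50 N applied to 10 kg yields 5 m/s^2
print(acceleration(50, 10))   # -> 5.0

# Invariance: doubling the force relative to the same mass
# doubles the acceleration...
assert acceleration(100, 10) == 2 * acceleration(50, 10)

# ...and doubling the mass relative to the same force halves it
assert acceleration(50, 20) == acceleration(50, 10) / 2
```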

The same consistent and invariant structural relationship is posited and often found in Rasch model applications, such that reasonable matches are found between the expected and observed response probabilities for various differences between the ability, attitude, or performance measures Bn and the difficulty calibrations Di of the items on the scale, between different measures relative to any given item, and between different calibrations relative to any given person. Of course, any number of parameters may be added, as long as they are included in an initial calibration design in which they are linked together in a common frame of reference.

Model fit statistics, principal components analysis of the standardized residuals, statistical studies of differential item/person functioning, and graphical methods are all applied to the study of departures from the modeled expectations.

I’ve shown here how the additive expression of the Pythagorean theorem, the multiplicative expression of natural laws, and the additive and multiplicative forms of Rasch models all participate in the same simultaneous, conjoint relation of two parameters mediated by a third. For those who think geometrically, perhaps the connections drawn here will be helpful in visualizing the design of experiments testing hypotheses of converging yet separable parameters. For those who think algebraically, perhaps the structure of lawful regularity in question and answer processes will be helpful in focusing attention on how to proceed step by step from one definite idea to another, in the manner so well demonstrated by Maxwell (Forfar, 2002, p. 8). Either way, the geometrical and/or algebraic figures and symbols ought to work together to provide a transparent view on the abstract mathematical relationships that stand independent from whatever local particulars are used as the medium of their representation.

Just as Einstein held that it would have been impossible to formulate the theory of relativity without the concepts, relationships, and images of practical geometry, so, too, may it one day turn out that key advances in the social and human sciences depend on the invariance of measures related to one another in the simple and lawful regularities of geometry.

Figure 1. A geometrical proof of the Pythagorean Theorem

References

Burtt, E. A. (1954). The metaphysical foundations of modern physical science (Rev. ed.) [First edition published in 1924]. Garden City, New York: Doubleday Anchor.

Einstein, A. (1922). Geometry and experience (G. B. Jeffery, W. Perrett, Trans.). In Sidelights on relativity (pp. 12-23). London, England: Methuen & Co. LTD.

Forfar, J. (2002, June). James Clerk Maxwell: His qualities of mind and personality as judged by his contemporaries. Mathematics Today, 38(3), 83.

Gadamer, H.-G. (1980). Dialogue and dialectic: Eight hermeneutical studies on Plato (P. C. Smith, Trans.). New Haven: Yale University Press.

Holton, G. (1988). Thematic origins of scientific thought (Revised ed.). Cambridge, Massachusetts: Harvard University Press.

Maxwell, J. C. (1965/1890). The scientific papers of James Clerk Maxwell (W. D. Niven, Ed.). New York: Dover Publications.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago, Illinois: MESA Press.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

How Evidence-Based Decision Making Suffers in the Absence of Theory and Instrument: The Power of a More Balanced Approach

January 28, 2010

The Basis of Evidence in Theory and Instrument

The ostensible point of basing decisions in evidence is to have reasons for proceeding in one direction versus any other. We want to be able to say why we are proceeding as we are. When we give evidence-based reasons for our decisions, we typically couch them in terms of what worked in past experience. That experience might have been accrued over time in practical applications, or it might have been deliberately arranged in one or more experimental comparisons and tests of concisely stated hypotheses.

At its best, generalizing from past experience to as yet unmet future experiences enables us to navigate life and succeed in ways that would not be possible if we could not learn and had no memories. The application of a lesson learned from particular past events to particular future events involves a very specific inferential process. To be able to recognize repeated iterations of the same things requires the accumulation of patterns of evidence. Experience in observing such patterns allows us to develop confidence in our understanding of what that pattern represents in terms of pleasant or painful consequences. When we are able to conceptualize and articulate a pattern, and then to recognize a new occurrence of it, we have an idea of it.

Evidence-based decision making is then a matter of formulating expectations from repeatedly demonstrated and routinely reproducible patterns of observations that lend themselves to conceptual representations, as ideas expressed in words. Linguistic and cultural frameworks selectively focus attention by projecting expectations and filtering observations into meaningful patterns represented by words, numbers, and other symbols. The point of efforts aimed at basing decisions in evidence is to try to go with the flow of this inferential process more deliberately and effectively than might otherwise be the case.

None of this is new or controversial. However, the inferential step from evidence to decision always involves unexamined and unjustified assumptions. That is, there is always an element of metaphysical faith behind the expectation that any given symbol or word is going to work as a representation of something in the same way that it has in the past. We can never completely eliminate this leap of faith, since we cannot predict the future with 100% confidence. We can, however, do a lot to reduce the size of the leap, and the risks that go with it, by questioning our assumptions in experimental research that tests hypotheses as to the invariant stability and predictive utility of the representations we make.

Theoretical and Instrumental Assumptions Hidden Behind the Evidence

For instance, evidence as to the effectiveness of an intervention or treatment is often expressed in terms of measures commonly described as quantitative. But it is unusual for any evidence to be produced justifying that description in terms of something that really adds up in the way numbers do. So we often find ourselves in situations in which our evidence is much less meaningful, reliable, and valid than we suppose it to be.

Quantitative measures are often valued as the hallmark of rational science. But their capacity to live up to this billing depends on the quality of the inferences that can be supported. Very few researchers thoroughly investigate the quality of their measures and justify the inferences they make relative to that quality.

Measurement presumes a reproducible pattern of evidence that can serve as the basis for a decision concerning how much of something has been observed. It naturally follows that we often base measurement in counts of some kind—successes, failures, ratings, frequencies, etc. The counts, scores, or sums are then often transformed into percentages by dividing them by the maximum possible score that could be obtained. Sometimes the scores are averaged for each person measured, and/or for each item or question on the test, assessment, or survey. These scores and percentages are then almost universally fed directly into decision processes or statistical analyses with no further consideration.

The reproducible pattern of evidence on which decisions are based is presumed to exist between the measures, not within them. In other words, the focus is on the group or population statistics, not on the individual measures. Attention is typically focused on the tip of the iceberg, the score or percentage, not on the much larger, but hidden, mass of information beneath it. Evidence is presumed to be sufficient to the task when the differences between groups of scores are of a consistent size or magnitude, but is this sufficient?

Going Past Assumptions to Testable Hypotheses

In other words, does not science require that evidence be explained by theory, and embodied in instrumentation that provides a shared medium of observation? As shown in the blue lines in the Figure below,

  • theory, whether or not it is explicitly articulated, inevitably influences both what counts as valid data and the configuration of the medium of its representation, the instrument;
  • data, whether or not it is systematically gathered and evaluated, inevitably influences both the medium of its representation, the instrument, and the implicit or explicit theory that explains its properties and justifies its applications; and
  • instruments, whether or not they are actually calibrated from a mapping of symbols and substantive amounts, inevitably influence data gathering and the image of the object explained by theory.

The rhetoric of evidence-based decision making skips over the roles of theory and instrumentation, drawing a direct line from data to decision. In leaving theory laxly formulated, we allow any story that makes a bit of sense and is communicated by someone with a bit of charm or power to carry the day. In not requiring calibrated instrumentation, we allow any data that cross the threshold into our awareness to serve as an acceptable basis for decisions.

What we want, however, is to require meaningful measures that really provide the evidence needed for instruments that exhibit invariant calibrations and for theories that provide predictive explanatory control over the variable. As shown in the Figure, we want data that push theory away from the instrument, theory that separates the data and instrument, and instruments that get in between the theory and data.

We all know to distrust too close a correspondence between theory and data, but we too rarely understand or capitalize on the role of the instrument in mediating the theory-data relation. Similarly, when the questions used as a medium for making observations are obviously biased to produce responses conforming overly closely with a predetermined result, we see that the theory and the instrument are too close for the data to serve as an effective mediator.

Finally, the situation predominating in the social sciences is one in which both construct and measurement theories are nearly nonexistent, which leaves data completely dependent on the instrument it came from. In other words, because counts of correct answers or sums of ratings are mistakenly treated as measures, instruments fully determine and restrict the range of measurement to that defined by the numbers of items and rating categories. Once the instrument is put in play, changes to it would make new data incommensurable with old, so, to retain at least the appearance of comparability, the data structure then fully determines and restricts the instrument.

What we want, though, is a situation in which construct and measurement theories work together to make the data autonomous of the particular instrument it came from. We want a theory that explains what is measured well enough for us to be able to modify existing instruments, or create entirely new ones, that give the same measures for the same amounts as the old instruments. We want to be able to predict item calibrations from the properties of the items, we want to obtain the same item calibrations across data sets, and we want to be able to predict measures on the basis of the observed responses (data) no matter which items or instrument was used to produce them.

Most importantly, we want a theory and practice of measurement that allows us to take missing data into account by providing us with the structural invariances we need as media for predicting the future from the past. As Ben Wright (1997, p. 34) said, any data analysis method that requires complete data to produce results disqualifies itself automatically as a viable basis for inference because we never have complete data—any practical system of measurement has to be positioned so as to be ready to receive, process, and incorporate all of the data we have yet to gather. This goal is accomplished to varying degrees in Rasch measurement (Rasch, 1960; Burdick, Stone, & Stenner, 2006; Dawson, 2004). Stenner and colleagues (Stenner, Burdick, Sanford, & Burdick, 2006) provide a trajectory of increasing degrees to which predictive theory is employed in contemporary measurement practice.

The explanatory and predictive power of theory is embodied in instruments that focus attention on recording observations of salient phenomena. These observations become data that inform the calibration of instruments, which then are used to gather further data that can be used in practical applications and in checks on the calibrations and the theory.

“Nothing is so practical as a good theory” (Lewin, 1951, p. 169). Good theory makes it possible to create symbolic representations of things that are easy to think with. To facilitate clear thinking, our words, numbers, and instruments must be transparent. We have to be able to look right through them at the thing itself, with no concern as to distortions introduced by the instrument, the sample, the observer, the time, the place, etc. This happens only when the structure of the instrument corresponds with invariant features of the world. And where words effect this transparency to an extent, it is realized most completely when we can measure in ways that repeatedly give the same results for the same amounts in the same conditions no matter which instrument, sample, operator, etc. is involved.

Where Might Full Mathematization Lead?

The attainment of mathematical transparency in measurement is remarkable for the way it focuses attention and constrains the imagination. It is essential to appreciate the context in which this focusing occurs, as popular opinion is at odds with historical research in this regard. Over the last 60 years, historians of science have come to vigorously challenge the widespread assumption that technology is a product of experimentation and/or theory (Kuhn, 1961/1977; Latour, 1987, 2005; Maas, 2001; Mendelsohn, 1992; Rabkin, 1992; Schaffer, 1992; Heilbron, 1993; Hankins & Silverman, 1999; Baird, 2002). Neither theory nor experiment typically advances until a key technology is widely available to end users in applied and/or research contexts. Rabkin (1992) documents multiple roles played by instruments in the professionalization of scientific fields. Thus, “it is not just a clever historical aphorism, but a general truth, that ‘thermodynamics owes much more to the steam engine than ever the steam engine owed to thermodynamics’” (Price, 1986, p. 240).

The prior existence of the relevant technology comes to bear on theory and experiment again in the common, but mistaken, assumption that measures are made and experimentally compared in order to discover scientific laws. History shows that measures are rarely made until the relevant law is effectively embodied in an instrument (Kuhn, 1961/1977, pp. 218-9): “…historically the arrow of causality is largely from the technology to the science” (Price, 1986, p. 240). Instruments do not provide just measures; rather they produce the phenomenon itself in a way that can be controlled, varied, played with, and learned from (Heilbron, 1993, p. 3; Hankins & Silverman, 1999; Rabkin, 1992). The term “technoscience” has emerged as an expression denoting recognition of this priority of the instrument (Baird, 1997; Ihde & Selinger, 2003; Latour, 1987).

Because technology often dictates what, if any, phenomena can be consistently produced, it constrains experimentation and theorizing by focusing attention selectively on reproducible, potentially interpretable effects, even when those effects are not well understood (Ackermann, 1985; Daston & Galison, 1992; Ihde, 1998; Hankins & Silverman, 1999; Maasen & Weingart, 2001). Criteria for theory choice in this context stem from competing explanatory frameworks’ experimental capacities to facilitate instrument improvements, prediction of experimental results, and gains in the efficiency with which a phenomenon is produced.

In this context, the relatively recent introduction of measurement models requiring additive, invariant parameterizations (Rasch, 1960) provokes speculation as to the effect on the human sciences that might be wrought by the widespread availability of consistently reproducible effects expressed in common quantitative languages. Paraphrasing Price’s comment on steam engines and thermodynamics, might it one day be said that as yet unforeseeable advances in reading theory will owe far more to the Lexile analyzer (Stenner, et al., 2006) than ever the Lexile analyzer owed reading theory?

Kuhn (1961/1977) speculated that the second scientific revolution of the early- to mid-nineteenth century followed in large part from the full mathematization of physics, i.e., the emergence of metrology as a professional discipline focused on providing universally accessible, theoretically predictable, and evidence-supported uniform units of measurement (Roche, 1998). Kuhn (1961/1977, p. 220) specifically suggests that a number of vitally important developments converged about 1840 (also see Hacking, 1983, p. 234). This was the year in which the metric system was formally instituted in France after 50 years of development (it had already been obligatory in other nations for 20 years at that point), and metrology emerged as a professional discipline (Alder, 2002, p. 328, 330; Heilbron, 1993, p. 274; Kula, 1986, p. 263). Daston (1992) independently suggests that the concept of objectivity came of age in the period from 1821 to 1856, and gives examples illustrating the way in which the emergence of strong theory, shared metric standards, and experimental data converged in a context of particular social mores to winnow out unsubstantiated and unsupportable ideas and contentions.

Might a similar revolution and new advances in the human sciences follow from the introduction of evidence-based, theoretically predictive, instrumentally mediated, and mathematical uniform measures? We won’t know until we try.

Figure. The Dialectical Interactions and Mutual Mediations of Theory, Data, and Instruments


Acknowledgment. These ideas have been drawn in part from long consideration of many works in the history and philosophy of science, primarily Ackermann (1985), Ihde (1991), and various works of Martin Heidegger, as well as key works in measurement theory and practice. A few obvious points of departure are listed in the references.

References

Ackermann, J. R. (1985). Data, instruments, and theory: A dialectical approach to understanding science. Princeton, New Jersey: Princeton University Press.

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Aldrich, J. (1989). Autonomy. Oxford Economic Papers, 41, 15-34.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Baird, D. (1997, Spring-Summer). Scientific instrument making, epistemology, and the conflict between gift and commodity economics. Techné: Journal of the Society for Philosophy and Technology, 3-4, 25-46. Retrieved 08/28/2009, from http://scholar.lib.vt.edu/ejournals/SPT/v2n3n4/baird.html.

Baird, D. (2002, Winter). Thing knowledge – function and truth. Techné: Journal of the Society for Philosophy and Technology, 6(2). Retrieved 19/08/2003, from http://scholar.lib.vt.edu/ejournals/SPT/v6n2/baird.html.

Burdick, D. S., Stone, M. H., & Stenner, A. J. (2006). The Combined Gas Law and a Rasch Reading Law. Rasch Measurement Transactions, 20(2), 1059-60 [http://www.rasch.org/rmt/rmt202.pdf].

Carroll-Burke, P. (2001). Tools, instruments and engines: Getting a handle on the specificity of engine science. Social Studies of Science, 31(4), 593-625.

Daston, L. (1992). Baconian facts, academic civility, and the prehistory of objectivity. Annals of Scholarship, 8, 337-363. (Rpt. in L. Daston, (Ed.). (1994). Rethinking objectivity (pp. 37-64). Durham, North Carolina: Duke University Press.)

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Hacking, I. (1983). Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge: Cambridge University Press.

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Heelan, P. A. (1983, June). Natural science as a hermeneutic of instrumentation. Philosophy of Science, 50, 181-204.

Heelan, P. A. (1998, June). The scope of hermeneutics in natural science. Studies in History and Philosophy of Science Part A, 29(2), 273-98.

Heidegger, M. (1977). Modern science, metaphysics, and mathematics. In D. F. Krell (Ed.), Basic writings [reprinted from M. Heidegger, What is a thing? South Bend, Regnery, 1967, pp. 66-108] (pp. 243-282). New York: Harper & Row.

Heidegger, M. (1977). The question concerning technology. In D. F. Krell (Ed.), Basic writings (pp. 283-317). New York: Harper & Row.

Heilbron, J. L. (1993). Weighing imponderables and other quantitative science around 1800. Historical Studies in the Physical and Biological Sciences, 24(Supplement), Part I, pp. 1-337.

Hessenbruch, A. (2000). Calibration and work in the X-ray economy, 1896-1928. Social Studies of Science, 30(3), 397-420.

Ihde, D. (1983). The historical and ontological priority of technology over science. In D. Ihde, Existential technics (pp. 25-46). Albany, New York: State University of New York Press.

Ihde, D. (1991). Instrumental realism: The interface between philosophy of science and philosophy of technology. (The Indiana Series in the Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Ihde, D. (1998). Expanding hermeneutics: Visualism in science. (Northwestern University Studies in Phenomenology and Existential Philosophy). Evanston, Illinois: Northwestern University Press.

Ihde, D., & Selinger, E. (Eds.). (2003). Chasing technoscience: Matrix for materiality. (Indiana Series in Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Kuhn, T. S. (1961/1977). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. In T. S. Kuhn, The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press, 1977).

Kula, W. (1986). Measures and men (R. Screter, Trans.). Princeton, New Jersey: Princeton University Press (Original work published 1970).

Lapre, M. A., & Van Wassenhove, L. N. (2002, October). Learning across lines: The secret to more efficient factories. Harvard Business Review, 80(10), 107-11.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York, New York: Cambridge University Press.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York: Harper & Row.

Maas, H. (2001). An instrument can make a science: Jevons’s balancing acts in economics. In M. S. Morgan & J. Klein (Eds.), The age of economic measurement (pp. 277-302). Durham, North Carolina: Duke University Press.

Maasen, S., & Weingart, P. (2001). Metaphors and the dynamics of knowledge. (Vol. 26. Routledge Studies in Social and Political Thought). London: Routledge.

Mendelsohn, E. (1992). The social locus of scientific instruments. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 5-22). Bellingham, WA: SPIE Optical Engineering Press.

Polanyi, M. (1964/1946). Science, faith and society. Chicago: University of Chicago Press.

Price, D. J. d. S. (1986). Of sealing wax and string. In Little Science, Big Science–and Beyond (pp. 237-253). New York, New York: Columbia University Press.

Rabkin, Y. M. (1992). Rediscovering the instrument: Research, industry, and education. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 57-82). Bellingham, Washington: SPIE Optical Engineering Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Thurstone, L. L. (1959). The measurement of values. Chicago: University of Chicago Press, Midway Reprint Series.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].


On the alleged difficulty of quantifying this or that

October 5, 2009

That this effect or that phenomenon is “difficult to quantify” is one of those phrases that people use from time to time. But, you know, building a computer is difficult, too. I couldn’t do it, and you probably couldn’t, either. Computers are, however, readily available for purchase and it doesn’t matter if you or I can make our own.

Same thing with measurement. Of course, instrument design and calibration are highly technical endeavors, and despite 80+ years of success, most people seem to think it is impossible to really quantify abstract things like abilities, attitudes, motivations, trust, outcomes and impacts, or maturational development. But real quantification, the kind that is commonly thought possible only for physical things, has been underway in psychology and the social sciences for a long time. More people need to know this.

As anyone who has read much of this blog knows, I’m not talking about some kind of simplistic survey or assessment process that takes measurement to be a mere assignment of numbers to observations. Instrument calibration takes a lot more thought and effort than is usually invested in it. But it isn’t impossible, not by a long shot.

Just as you would not despair of ever having your own computer just because you cannot make one yourself, those who throw up their hands at the supposed difficulty of quantifying something need to think again. Where there’s a will, there’s a way, and scientifically rigorous methods of determining whether something is measurable are a lot more ready to hand than most people realize.

For more information, see my survey design recommendations on pages 1,072-4 at http://www.rasch.org/rmt/rmt203.pdf and Ben Wright’s 15 steps to measurement at http://www.rasch.org/rmt/rmt141g.htm.


Reliability Revisited: Distinguishing Consistency from Error

August 28, 2009

When something is meaningful to us, and we understand it, then we can successfully restate it in our own words and predictably reproduce approximately the same representation across situations as was obtained in the original formulation. When data fit a Rasch model, the implications are (1) that different subsets of items (that is, different ways of composing a series of observations summarized in a sufficient statistic) will all converge on the same pattern of person measures, and (2) that different samples of respondents or examinees will all converge on the same pattern of item calibrations. The meaningfulness of propositions based in these patterns will then not depend on which collection of items (instrument) or sample of persons is obtained, and all instruments might be equated relative to a single universal, uniform metric so that the same symbols reliably represent the same amount of the same thing.

Statistics and research methods textbooks in psychology and the social sciences commonly make statements like the following about reliability: “Reliability is consistency in measurement. The reliability of individual scale items increases with the number of points in the item. The reliability of the complete scale increases with the number of items.” (These sentences are found at the top of p. 371 in Experimental Methods in Psychology, by Gustav Levine and Stanley Parkinson (Lawrence Erlbaum Associates, 1994).) The unproven, perhaps unintended, and likely unfounded implication of these statements is that consistency increases as items are added.

Despite the popularity of doing so, Green, Lissitz, and Mulaik (1977) argue that reliability coefficients are misused when they are interpreted as indicating the extent to which data are internally consistent. “Green et al. (1977) observed that though high ‘internal consistency’ as indexed by a high alpha results when a general factor runs through the items, this does not rule out obtaining high alpha when there is no general factor running through the test items…. They concluded that the chief defect of alpha as an index of dimensionality is its tendency to increase as the number of items increase” (Hattie, 1985, p. 144).
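The arithmetic behind Green et al.'s point is easy to verify. The following sketch (illustrative, not from the original study) computes Cronbach's alpha directly from a persons-by-items matrix and applies the Spearman-Brown prophecy formula, which shows alpha climbing as test length grows even when nothing about what the items measure has changed:

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha for a persons-by-items score matrix."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the sum scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def spearman_brown(alpha, factor):
    """Projected alpha when test length is multiplied by `factor`,
    holding the average inter-item correlation constant."""
    return factor * alpha / (1 + (factor - 1) * alpha)

# Doubling a test with alpha = .70 raises alpha to about .82
# with no change whatsoever in what the items measure.
projected = spearman_brown(0.70, 2)
```

Duplicating the columns of any score matrix and re-running `cronbach_alpha` produces the same effect empirically: alpha rises purely because the item count rose, which is exactly the "tendency to increase as the number of items increase" that Green et al. identify as the chief defect of alpha as an index of dimensionality.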

In addressing the internal consistency of data, the implicit but incompletely realized purpose of estimating scale reliability is to evaluate the extent to which sum scores function as sufficient statistics. How limited is reliability as a tool for this purpose? To answer this question, five dichotomous data sets of 23 items and 22 persons were simulated. The first one was constructed so as to be highly likely to fit a Rasch model, with a deliberately orchestrated probabilistic Guttman pattern. The second one was made nearly completely random. The third, fourth, and fifth data sets were modifications of the first one in which increasing numbers of increasingly inconsistent responses were introduced. (The inconsistencies were not introduced in any systematic way apart from inserting contrary responses in the ordered matrix.) The data sets are shown in the Appendix. Tables 1 and 2 summarize the results.
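The construction of the first two data sets can be sketched as follows. The generating parameter values here are hypothetical, since the post does not report them; the point is the mechanism: Rasch-consistent data arise from sampling each response from its modeled probability, producing a probabilistic Guttman pattern, while the random data set ignores the parameters altogether.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical generating parameters, in logits.
persons = np.linspace(-2.5, 2.5, 22)   # 22 person measures
items = np.linspace(-2.0, 2.0, 23)     # 23 item calibrations

# Rasch model: P(x = 1) = exp(b - d) / (1 + exp(b - d))
p = 1.0 / (1.0 + np.exp(-(persons[:, None] - items[None, :])))

# Data set 1: probabilistic Guttman pattern (responses follow the model).
guttman_like = (rng.random(p.shape) < p).astype(int)

# Data set 2: near-complete randomness (in effect, coin tosses).
coin_tosses = (rng.random(p.shape) < 0.5).astype(int)
```

Sorting the rows and columns of `guttman_like` by their marginal sums reproduces the staircase structure visible in Data Set 1 of the Appendix; `coin_tosses` has no such structure to recover.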

Table 1 shows that the reliability coefficients do in fact decrease, along with the global model fit log-likelihood chi-squares, as the amount of randomness and inconsistency is increased. Contrary to what is implied in Levine and Parkinson’s statements, however, reliability can vary within a given number of items, as it might across different data sets produced from the same test, survey, or assessment, depending on how much structural invariance is present within them.

Two other points about the tables are worthy of note. First, the Rasch-based person separation reliability coefficients drop at a faster rate than Cronbach’s alpha does. This is probably an effect of the individualized error estimates in the Rasch context, which make its reliability coefficients more conservative than those derived from correlation-based, group-level error estimates. (It is worth noting, as well, that the Winsteps and SPSS estimates of Cronbach’s alpha match. They are reported to one fewer decimal place by Winsteps, but the third decimal place is shown for the SPSS values for contrast.)
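The contrast between the two coefficients can be made concrete. Rasch person separation reliability is the proportion of observed measure variance remaining after error variance is removed, where the error variance is the mean of the squared individual standard errors rather than a single pooled figure. A minimal sketch, with hypothetical values (not taken from the tables):

```python
import numpy as np

def separation_reliability(measures, std_errors):
    """Rasch person separation reliability:
    ('true' variance) / (observed variance), with error variance
    taken as the mean of the squared individual standard errors."""
    measures = np.asarray(measures, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    obs_var = measures.var(ddof=1)
    err_var = np.mean(std_errors ** 2)
    return (obs_var - err_var) / obs_var

# Off-target persons carry larger individual standard errors, which
# drag the coefficient down faster than one group-level error term would.
measures = [-2.0, -1.0, 0.0, 1.0, 2.0]
even_errors = [0.5] * 5
uneven_errors = [1.0, 0.5, 0.4, 0.5, 1.0]   # extremes measured less precisely
```

With evenly precise measures the coefficient is .90; inflating the error on the two extreme persons alone pulls it below .80, while a correlation-based coefficient computed from the raw scores would be unaffected by where the imprecision sits.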

Second, the fit statistics are most affected by the initial and most glaring introduction of inconsistencies, in data set three. As the randomness in the data increases, the reliabilities continue to drop, but the fit statistics improve, culminating in the case of data set two, where complete randomness results in near-perfect model fit. This is, of course, the situation in which both the instrument and the sample are as well targeted as they can be, since all respondents have about the same measure and all the items about the same calibration; see Wood (1978) for a commentary on this situation, where coin tosses fit a Rasch model.

Table 2 shows the results of the Winsteps Principal Components Analysis of the standardized residuals for all five data sets. Again, the results conform with and support the pattern shown in the reliability coefficients. It is, however, interesting to note that, for data sets 4 and 5, with their Cronbach’s alphas of about .89 and .80, respectively, which are typically deemed quite good, the PCA shows more variance left unexplained than is explained by the Rasch dimension. The PCA is suggesting that two or more constructs might be represented in the data, but this would never be known from Cronbach’s alpha alone.

Alpha alone would indicate the presence of a unidimensional construct for data sets 3, 4 and 5, despite large standard deviations in the fit statistics and even though more than half the variance cannot be explained by the primary dimension. Worse, for the fifth data set, more variance is captured in the first three contrasts than is explained by the Rasch dimension. But with Cronbach’s alpha at .80, most researchers would consider this scale quite satisfactorily unidimensional and internally consistent.
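The PCA of standardized residuals works on what is left over after the Rasch dimension has been removed. A bare-bones sketch of the computation (assuming person measures and item calibrations have already been estimated; this is the general technique, not Winsteps’ exact implementation):

```python
import numpy as np

def residual_contrast_eigenvalues(data, person_measures, item_calibrations):
    """Eigenvalues of the correlation matrix of standardized residuals
    (observed minus Rasch-expected, divided by the binomial SD).
    Large leading eigenvalues flag structure that the Rasch dimension
    did not absorb, i.e., possible secondary constructs."""
    b = np.asarray(person_measures, dtype=float)[:, None]
    d = np.asarray(item_calibrations, dtype=float)[None, :]
    p = 1.0 / (1.0 + np.exp(-(b - d)))          # model expectations
    z = (data - p) / np.sqrt(p * (1.0 - p))     # standardized residuals
    corr = np.corrcoef(z, rowvar=False)         # item-by-item correlations
    return np.sort(np.linalg.eigvalsh(corr))[::-1]
```

When the data fit, the residual correlations are near zero and the eigenvalues stay close to one; a secondary construct shows up as a first contrast well above the others, which is the kind of signal that never registers in Cronbach’s alpha.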

These results suggest that, first, in seeking high reliability, what is sought more fundamentally is fit to a Rasch model (Andrich & Douglas, 1977; Andrich, 1982; Wright, 1977). That is, in addressing the internal consistency of data, the popular conception of reliability is taking on the concerns of construct validity. A conceptually clearer sense of reliability focuses on the extent to which an instrument works as expected every time it is used, in the sense of the way a car can be reliable. For instance, with an alpha of .70, a screening tool would be able to reliably distinguish measures into two statistically distinct groups (Fisher, 1992; Wright, 1996), problematic and typical. Within the limits of this purpose, the tool would meet the need for the repeated production of information capable of meeting the needs of the situation. Applications in research, accountability, licensure/certification, or diagnosis, however, might demand alphas of .95 and the kind of precision that allows for statistically distinct divisions into six or more groups. In these kinds of applications, where experimental designs or practical demands require more statistical power, measurement precision articulates finer degrees of differences. Finely calibrated instruments provide sensitivity over the entire length of the measurement continuum, which is needed for repeated reproductions of the small amounts of change that might accrue from hard to detect treatment effects.
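The link between an alpha of .70 and two statistically distinct groups, or .95 and six, runs through the separation index. Following the usual Rasch conventions cited above (Fisher, 1992; Wright, 1996), separation G is the ratio of the "true" measure SD to the root-mean-square error, and the number of statistically distinct strata is (4G + 1)/3:

```python
import math

def separation(reliability):
    """Separation index G: the ratio of the 'true' measure SD
    to the root-mean-square measurement error."""
    return math.sqrt(reliability / (1.0 - reliability))

def strata(reliability):
    """Number of statistically distinct measure groups: (4G + 1) / 3."""
    return (4.0 * separation(reliability) + 1.0) / 3.0

# Reliability .70 supports roughly two distinct groups; .95, about six.
screening = strata(0.70)
high_stakes = strata(0.95)
```

Working the formulas forward, .70 yields G of about 1.5 and roughly 2.4 strata, while .95 yields G of about 4.4 and just over 6 strata, which is why a screening tool can live with .70 but research, accountability, licensure/certification, and diagnostic uses demand far more.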

Separating the construct, internal consistency, and unidimensionality issues from the repeatability and reproducibility of a given degree of measurement precision provides a much-needed conceptual and methodological clarification of reliability. This clarification is routinely made in Rasch measurement applications (Andrich, 1982; Andrich & Douglas, 1977; Fisher, 1992; Linacre, 1993, 1996, 1997). It is reasonable to want to account for inconsistencies in the data in the error estimates and in the reliability coefficients, and so errors and reliabilities are routinely reported in terms of both the modeled expectations and a fit-inflated form (Wright, 1995). The fundamental value of proceeding from a basis in individual error and fit statistics (Wright, 1996) is that local imprecisions and failures of invariance can be isolated for further study and selective attention.

The results of the simulated data analyses suggest, second, that, used in isolation, reliability coefficients can be misleading. As Green et al. observe, reliability estimates tend to increase systematically as the number of items increases (Fisher, 2008). The simulated data show that reliability coefficients also systematically decrease as inconsistency increases.

The primary problem with relying on reliability coefficients alone as indications of data consistency hinges on their inability to reveal the location of departures from modeled expectations. Most uses of reliability coefficients take place in contexts in which the model remains unstated and expectations are not formulated or compared with observations. The best that can be done in the absence of a model statement and test of data fit to it is to compare the reliability obtained against that expected on the basis of the number of items and response categories, relative to the observed standard deviation in the scores, expressed in logits (Linacre, 1993). One might then raise questions as to targeting, data consistency, etc. in order to explain larger than expected differences.

A more methodical way, however, would be to employ multiple avenues of approach to the evaluation of the data, including the use of model fit statistics and Principal Components Analysis in the evaluation of differential item and person functioning. Being able to see which individual observations depart the furthest from modeled expectation can provide illuminating qualitative information on the meaningfulness of the data, the measures, and the calibrations, or the lack thereof.  This information is crucial to correcting data entry errors, identifying sources of differential item or person functioning, separating constructs and populations, and improving the instrument. The power of the reliability-coefficient-only approach to data quality evaluation is multiplied many times over when the researcher sets up a nested series of iterative dialectics in which repeated data analyses explore various hypotheses as to what the construct is, and in which these analyses feed into revisions to the instrument, its administration, and/or the population sampled.

For instance, following the point made by Smith (1996), it may be expected that the PCA results will illuminate the presence of multiple constructs in the data with greater clarity than the fit statistics, when there are nearly equal numbers of items representing each different measured dimension. But the PCA does not work as well as the fit statistics when there are only a few items and/or people exhibiting inconsistencies.

This work should result in a full-circle return to the drawing board (Wright, 1994; Wright & Stone, 2003), such that a theory of the measured construct ultimately provides rigorously precise predictive control over item calibrations, in the manner of the Lexile Framework (Stenner et al., 2006) or developmental theories of hierarchical complexity (Dawson, 2004). Given that the five data sets employed here were simulations with no associated item content, the invariant stability and meaningfulness of the construct cannot be illustrated or annotated. But such illustration is also implicit in the quest for reliable instrumentation: the evidentiary basis for a delineation of meaningful expressions of amounts of the thing measured. The hope to be gleaned from the successes in theoretical prediction achieved to date is that we might arrive at practical applications of psychosocial measures that are as meaningful, useful, and economically productive as the theoretical applications of electromagnetism, thermodynamics, etc. that we take for granted in the technologies of everyday life.

Table 1

Reliability and Consistency Statistics

22 Persons, 23 Items, 506 Data Points

Data set | Intended reliability | Person separation reliability (Real/Model) | Cronbach’s alpha (Winsteps/SPSS) | Person Infit/Outfit mean MnSq | Person Infit/Outfit SD | Item separation reliability (Real/Model) | Item Infit/Outfit mean MnSq | Item Infit/Outfit SD | Log-likelihood chi-square/d.f./p
First | Best | .96/.97 | .96/.957 | 1.04/.35 | .49/.25 | .95/.96 | 1.08/.35 | .36/.19 | 185/462/1.00
Second | Worst | .00/.00 | .00/-1.668 | 1.00/1.00 | .05/.06 | .00/.00 | 1.00/1.00 | .05/.06 | 679/462/.0000
Third | Good | .90/.91 | .93/.927 | .92/2.21 | .30/2.83 | .85/.88 | .90/2.13 | .64/3.43 | 337/462/.9996
Fourth | Fair | .86/.87 | .89/.891 | .96/1.91 | .25/2.18 | .79/.83 | .94/1.68 | .53/2.27 | 444/462/.7226
Fifth | Poor | .76/.77 | .80/.797 | .98/1.15 | .24/.67 | .59/.65 | .99/1.15 | .41/.84 | 550/462/.0029

Note: All statistics are from Winsteps except the second Cronbach’s alpha value in each row, which is from SPSS.
Table 2

Principal Components Analysis

Data set | Intended reliability | % raw variance explained (Measures/Persons/Items) | % raw variance in first three contrasts | Loadings > |.40| in first contrast
First | Best | 76/41/35 | 12 | 8
Second | Worst | 4.3/1.7/2.6 | 56 | 15
Third | Good | 59/34/25 | 20 | 14
Fourth | Fair | 47/27/20 | 26 | 13
Fifth | Poor | 29/17/11 | 41 | 15

References

Andrich, D. (1982, June). An index of person separation in Latent Trait Theory, the traditional KR-20 index, and the Guttman scale response pattern. Education Research and Perspectives, 9(1), http://www.rasch.org/erp7.htm.

Andrich, D. & G. A. Douglas. (1977). Reliability: Distinctions between item consistency and subject separation with the simple logistic model. Paper presented at the Annual Meeting of the American Educational Research Association, New York.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238  [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (2008, Summer). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-3.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977, Winter). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827-833.

Hattie, J. (1985, June). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-64.

Levine, G., & Parkinson, S. (1994). Experimental methods in psychology. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284; [http://www.rasch.org/rmt/rmt71h.htm].

Linacre, J. M. (1996). True-score reliability or Rasch statistical validity? Rasch Measurement Transactions, 9(4), 455 [http://www.rasch.org/rmt/rmt94a.htm].

Linacre, J. M. (1997). KR-20 or Rasch reliability: Which tells the “Truth?”. Rasch Measurement Transactions, 11(3), 580-1 [http://www.rasch.org/rmt/rmt113l.htm].

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Wood, R. (1978). Fitting the Rasch model: A heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27-32.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1980). Foreword, Afterword. In G. Rasch, Probabilistic models for some intelligence and attainment tests (pp. ix-xix, 185-199) [http://www.rasch.org/memo63.htm] (Reprint; original work published 1960 by the Danish Institute for Educational Research). Chicago, Illinois: University of Chicago Press.

Wright, B. D. (1994, Summer). Theory construction from empirical observations. Rasch Measurement Transactions, 8(2), 362 [http://www.rasch.org/rmt/rmt82h.htm].

Wright, B. D. (1995, Summer). Which standard error? Rasch Measurement Transactions, 9(2), 436-437 [http://www.rasch.org/rmt/rmt92n.htm].

Wright, B. D. (1996, Winter). Reliability and separation. Rasch Measurement Transactions, 9(4), 472 [http://www.rasch.org/rmt/rmt94n.htm].

Wright, B. D., & Stone, M. H. (2003). Five steps to science: Observing, scoring, measuring, analyzing, and applying. Rasch Measurement Transactions, 17(1), 912-913 [http://www.rasch.org/rmt/rmt171j.htm].

Appendix

Data Set 1

01100000000000000000000

10100000000000000000000

11000000000000000000000

11100000000000000000000

11101000000000000000000

11011000000000000000000

11100100000000000000000

11110100000000000000000

11111010100000000000000

11111101000000000000000

11111111010101000000000

11111111101010100000000

11111111111010101000000

11111111101101010010000

11111111111010101100000

11111111111111010101000

11111111111111101010100

11111111111111110101011

11111111111111111010110

11111111111111111111001

11111111111111111111101

11111111111111111111100

Data Set 2

01101010101010101001001

10100101010101010010010

11010010101010100100101

10101001010101001001000

01101010101010110010011

11011010010101100100101

01100101001001001001010

10110101000110010010100

01011010100100100101001

11101101001001001010010

11011010010101010100100

10110101101010101001001

01101011010000101010010

11010110101001010010100

10101101010000101101010

11011010101010010101010

10110101010101001010101

11101010101010110101011

11010101010101011010110

10101010101010110111001

01010101010101101111101

10101010101011011111100

Data Set 3

01100000000000100000010

10100000000000000010001

11000000000000100000010

11100000000000100000000

11101000000000100010000

11011000000000000000000

11100100000000100000000

11110100000000000000000

11111010100000100000000

11111101000000000000000

11111111010101000000000

11111111101010100000000

11111111111010001000000

11011111111111010010000

11011111111111101100000

11111111111111010101000

11011111111111101010100

11111111111111010101011

11011111111111111010110

11111111111111111111001

11011111111111111111101

10111111111111111111110

Data Set 4

01100000000000100010010

10100000000000000010001

11000000000000100000010

11100000000000100000001

11101000000000100010000

11011000000000000010000

11100100000000100010000

11110100000000000000000

11111010100000100010000

11111101000000000000000

11111011010101000010000

11011110111010100000000

11111111011010001000000

11011111101011110010000

11011111101101101100000

11111111110101010101000

11011111111011101010100

11111111111101110101011

01011111111111011010110

10111111111111111111001

11011111111111011111101

10111111111111011111110

Data Set 5

11100000010000100010011

10100000000000000011001

11000000010000100001010

11100000010000100000011

11101000000000100010010

11011000000000000010011

11100100000000100010000

11110100000000000000011

11111010100000100010000

00000000000011111111111

11111011010101000010000

11011110111010100000000

11111111011010001000000

11011111101011110010000

11011111101101101100000

11111111110101010101000

11011111101011101010100

11111111111101110101011

01011111111111011010110

10111111101111111111001

11011111101111011111101

00111111101111011111110
