Archive for April, 2010

How bad will the financial crises have to get before…?

April 30, 2010

More and more states and nations around the world face the possibility of defaulting on their financial obligations. The financial crises are of epic historical proportions. This is a disaster of the first order. And yet, it is so odd–we have the solutions and preventative measures we need at our fingertips, but no one knows about them or is looking for them.

So, I am persuaded to once again wonder if there might now be some real interest in the possibilities of capitalizing on

  • measurement’s well-known capacity for reducing transaction costs by improving information quality and reducing information volume;
  • instruments calibrated to measure in constant units (not ordinal ones) within known error ranges (not as though the measures are perfectly precise) with known data quality;
  • measures made meaningful by their association with invariant scales defined in terms of the questions asked;
  • adaptive instrument administration methods that make all measures equally precise by targeting the questions asked;
  • judge calibration methods that remove the person rating performances as a factor influencing the measures;
  • the metaphor of transparency by calibrating instruments that we really look right through at the thing measured (risk, governance, abilities, health, performance, etc.);
  • efficient markets for human, social, and natural capital by means of the common currencies of uniform metrics, calibrated instrumentation, and metrological networks;
  • the means available for tuning the instruments of the human, social, and environmental sciences to well-tempered scales that enable us to more easily harmonize, orchestrate, arrange, and choreograph relationships;
  • our understandings that universal human rights require universal uniform measures, that fair dealing requires fair measures, and that our measures define who we are and what we value; and, last but very far from least,
  • the power of love–the back and forth of probing questions and honest answers in caring social intercourse plants seminal ideas in fertile minds that can be nurtured to maturity and Socratically midwifed as living meaning born into supportive ecologies of caring relations.

How bad do things have to get before we systematically and collectively implement the long-established and proven methods we have at our disposal? It is the most surreal kind of schizophrenia or passive-aggressive avoidance pathology to keep on tormenting ourselves with problems for which we have solutions.

For more information on these issues, see prior blogs posted here, the extensive documentation provided, and http://www.livingcapitalmetrics.com.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Guttman on sufficiency, statistics, and cumulative science

April 29, 2010

“R. A. Fisher employed maximum likelihood…as a way of finding sufficient statistics if they exist. Now, sufficient statistics rarely exist, and even when they do, their use need not be optimal for estimation problems. As enlarged on in Reference 7 [an unpublished 1984 paper of Guttman’s], for each best unbiased sufficient statistic there generally is a better–and not necessarily sufficient–biased one. To use maximum likelihood requires knowledge of the complete sampling distribution, but biased estimation is proved to be better in a distribution-free fashion.” (Guttman, 1985, pp. 7-8; 1994, pp. 345-6)

In this passage, Guttman may be addressing issues related to the kind of biases that can affect extreme scores in Joint Maximum Likelihood Estimation (JMLE, formerly UCON) (Jansen, Van den Wollenberg, & Wierda, 1988; Wright, 1988; Wright & Panchapakesan, 1969). But what’s more interesting is the combination of the awareness of sufficiency and estimation issues revealed in this remark with the context in which it is made.

Guttman targets and rightly skewers a good number of inconsistencies and internal contradictions in statistical inference. But he shares in many of them himself. That is, Guttman’s valuable insights as to measurement are limited by his failure to consider at all the importance of the instrument in science, and by his limited appreciation of the value of theory. This is so despite his realization that “There can be no solution [to the problem of sampling items from one or more indefinitely large universes of content] without a structural theory” (1994, p. 329, in his Psychometrika review of Gulliksen’s Theory of Mental Tests), which is fully in tune with his emphasis on the central role of substantive replication in science (1994, p. 343).

But in the way he articulates his concern with replication, we see that, for Guttman, as for so many others, measurement is a matter of data analysis and not one of calibrating instruments. Measurement is not primarily a statistical process performed on computers, but is an individual event performed with an instrument. Calibrated instruments remove the necessity for data analysis (though other kinds of analysis may, of course, be continued or commenced).

In reading Guttman, it is difficult to follow through on his pithy and rich observations on the inconsistencies and illogic of statistical inference because he does not offer a clear alternative path, a measurement path structured by instrumentation. In his review of Lord and Novick (1968), for instance, Guttman remarks on the authors’ failure to provide their promised synthetic theory of tests and measurement, but does not offer or point toward one himself, even after noting the inclusion of Rasch’s Poisson models in the Lord and Novick classification system. Though much has been done to connect Guttman with Rasch (Andrich, 1982, 1985; Douglas & Wright, 1989; Engelhard, 2008; Linacre, 1991, 2000; Linacre & Wright, 1996; Tenenbaum, 1999; Wilson, 1989), and to advance in the direction of point-of-use measurement (Bode, 1999; Bode, Heinemann, & Semik, 2000; Connolly, Nachtman, & Pritchett, 1971; Davis, Perruccio, Canizares, Tennant, Hawker, et al., 2008; Linacre, 1997; many others), much more remains to be done.

Andrich, D. (1982, June). An index of person separation in Latent Trait Theory, the traditional KR-20 index, and the Guttman scale response pattern. Education Research and Perspectives, 9(1), 95-104 [http://www.rasch.org/erp7.htm].

Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 33-80). San Francisco, California: Jossey-Bass.

Bode, R. K. (1999). Self-scoring key for Galveston Orientation and Amnesia Test. Rasch Measurement Transactions, 13(1), 680 [http://www.rasch.org/rmt/rmt131c.htm].

Bode, R. K., Heinemann, A. W., & Semik, P. (2000, Feb). Measurement properties of the Galveston Orientation and Amnesia Test (GOAT) and improvement patterns during inpatient rehabilitation. Journal of Head Trauma Rehabilitation, 15(1), 637-55.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service.

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G., et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis Cartilage, 16(5), 551-9.

Douglas, G. A., & Wright, B. D. (1989). Response patterns and their probabilities. Rasch Measurement Transactions, 3(4), 75-77 [http://www.rasch.org/rmt/rmt34.htm].

Engelhard, G. (2008, July). Historical perspectives on invariant measurement: Guttman, Rasch, and Mokken. Measurement: Interdisciplinary Research & Perspectives, 6(3), 155-189.

Guttman, L. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3-10. (Reprinted in Guttman 1994, pp. 341-348)

Guttman, L. (1994). Louis Guttman on theory and methodology: Selected writings (S. Levy, Ed.). Dartmouth Benchmark Series. Brookfield, VT: Dartmouth Publishing Company.

Jansen, P., Van den Wollenberg, A., & Wierda, F. (1988). Correcting unconditional parameter estimates in the Rasch model for inconsistency. Applied Psychological Measurement, 12(3), 297-306.

Linacre, J. M. (1991, Spring). Stochastic Guttman order. Rasch Measurement Transactions, 5(4), 189 [http://www.rasch.org/rmt/rmt54p.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (2000, Autumn). Guttman coefficients and Rasch data. Rasch Measurement Transactions, 14(2), 746-7 [http://www.rasch.org/rmt/rmt142e.htm].

Linacre, J. M., & Wright, B. D. (1996, Autumn). Guttman-style item location maps. Rasch Measurement Transactions, 10(2), 492-3 [http://www.rasch.org/rmt/rmt102h.htm].

Lord, F. M., & Novick, M. R. (Eds.). (1968). Statistical theories of mental test scores. Reading, Massachusetts: Addison-Wesley.

Tenenbaum, G. (1999, Jan-Mar). The implementation of Thurstone’s and Guttman’s measurement ideas in Rasch analysis. International Journal of Sport Psychology, 30(1), 3-16.

Wilson, M. (1989). A comparison of deterministic and probabilistic approaches to learning structures. Australian Journal of Education, 33(2), 127-140.

Wright, B. D. (1988, Sep). The efficacy of unconditional maximum likelihood bias correction: Comment on Jansen, Van den Wollenberg, and Wierda. Applied Psychological Measurement, 12(3), 315-318.

Wright, B. D., & Panchapakesan, N. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement, 29(1), 23-48.


Geometrical and algebraic expressions of scientific laws

April 12, 2010

Geometry provides a model of scientific understanding that has repeatedly proven itself over the course of history. Einstein (1922) considered geometry to be “the most ancient branch of physics” (p. 14). He accorded “special importance” to his view that “all linear measurement in physics is practical geometry,” “because without it I should have been unable to formulate the theory of relativity” (p. 14).

Burtt (1954) concurs, pointing out that the essential question for Copernicus was not “Does the earth move?” but, rather, “…what motions should we attribute to the earth in order to obtain the simplest and most harmonious geometry of the heavens that will accord with the facts?” (p. 39). Maxwell similarly employed a geometrical analogy in working out his electromagnetic theory, saying

“By referring everything to the purely geometrical idea of the motion of an imaginary fluid, I hope to attain generality and precision, and to avoid the dangers arising from a premature theory professing to explain the cause of the phenomena. If the results of mere speculation which I have collected are found to be of any use to experimental philosophers, in arranging and interpreting their results, they will have served their purpose, and a mature theory, in which physical facts will be physically explained, will be formed by those who by interrogating Nature herself can obtain the only true solution of the questions which the mathematical theory suggests.” (Maxwell, 1965/1890, p. 159).

Maxwell was known for thinking visually, once as a student offering a concise geometrical solution to a problem that resisted a lecturer’s lengthy algebraic efforts (Forfar, 2002, p. 8). His approach seemed to be one of playing with images with the aim of arriving at simple mathematical representations, instead of thinking linearly through a train of analysis. A similar method is said to have been used by Einstein (Holton, 1988, pp. 385-388).

Gadamer (1980) speaks of the mathematical transparency of geometric figures to convey Plato’s reasons for requiring mathematical training of the students in his Academy, saying:

“Geometry requires figures which we draw, but its object is the circle itself…. Even he who has not yet seen all the metaphysical implications of the concept of pure thinking but only grasps something of mathematics—and as we know, Plato assumed that such was the case with his listeners—even he knows that in a manner of speaking one looks right through the drawn circle and keeps the pure thought of the circle in mind.” (p. 101)

But exactly how do geometrical visualizations lend themselves to algebraic formulae? More specifically, is it possible to see the algebraic structure of scientific laws in geometry?

Yes, it is. Here’s how. Starting from the Pythagorean theorem, we know that the square of a right triangle’s hypotenuse is equal to the sum of the squares of the other two sides. For convenience, imagine that the lengths of the sides of the triangle, as shown in Figure 1, are 3, 4, and 5, for sides a, b, and c, respectively. We can count the unit squares within each side’s square and see that the 25 in the square of the hypotenuse equal the sum of the 9 in the square of side a and the 16 in the square of side b.

That mathematical relationship can, of course, be written as

a^2 + b^2 = c^2

which, for Figure 1, is

3^2 + 4^2 = 5^2, that is, 9 + 16 = 25

Now, most scientific laws are written in a multiplicative form, like this:

m = f / a

or

f = m * a

which, of course, is how Maxwell presented Newton’s Second Law. So how would the Pythagorean Theorem be written like a physical law?

Since the advent of small, cheap electronic calculators, slide rules have fallen out of fashion. But these eminently useful tools are built to take advantage of the way logarithms, such as the natural logarithm with its base e (2.71828…), make multiplication interchangeable with addition, and division interchangeable with subtraction.
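As a minimal sketch of the slide-rule principle, the following lines multiply and divide by adding and subtracting logarithms. The function names slide_rule_multiply and slide_rule_divide are hypothetical, chosen only for illustration, and the inputs are assumed to be positive numbers.

```python
import math


def slide_rule_multiply(x: float, y: float) -> float:
    """Multiply two positive numbers by adding their natural logarithms."""
    return math.exp(math.log(x) + math.log(y))


def slide_rule_divide(x: float, y: float) -> float:
    """Divide two positive numbers by subtracting their natural logarithms."""
    return math.exp(math.log(x) - math.log(y))


# Results agree with ordinary arithmetic up to floating-point rounding.
print(slide_rule_multiply(9.0, 16.0))   # close to 144
print(slide_rule_divide(144.0, 16.0))   # close to 9
```

Any logarithm base would serve equally well here; the natural logarithm is used only because it is the one the rest of this post relies on.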

That means the Pythagorean Theorem could be written like Newton’s Second Law of Motion, or the Combined Gas Law. Here’s how it works. The Pythagorean Theorem is normally written as

a^2 + b^2 = c^2

but does it make sense to write it as follows?

a^2 * b^2 = c^2

Using the convenient values for a, b, and c from above

3^2 + 4^2 = 5^2

and

9 + 16 = 25

so, plainly, simply changing the plus sign to a multiplication sign will not work, since 9 * 16 is 144. This is where the number e comes in. What happens if e is taken as a base raised to the power of each of the parameters in the equation? Does this equation work?

e^{9} * e^{16} = e^{25}

which, substituting a for e^{9}, b for e^{16}, and c for e^{25}, could be represented by

a * b = c

and which could be solved as

8,103.08 * 8,886,110.52 ≈ 72,004,899,337

Yes, it works, and so it is possible to divide through by e^{16} and arrive at the form of the law used by Maxwell and Rasch:

8,103.08 ≈ 72,004,899,337 / 8,886,110.52

or

e^{9} = e^{25} / e^{16}

or, again substituting a for e^{9}, b for e^{16}, and c for e^{25}, could be represented by

a = c / b

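which, when converted back to the additive form by taking the natural logarithm of each term (so that a, b, and c now stand for the exponents 9, 16, and 25), looks like this: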

a = c – b

and this

9 = 25 – 16 .
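The arithmetic above can be checked numerically. This is only a sketch under stated assumptions: the helper name multiplicative_form is hypothetical, and the side lengths 3, 4, and 5 are the convenient values used in this post.

```python
import math


def multiplicative_form(a: float, b: float) -> tuple:
    """Return (e^(a^2), e^(b^2), their product) for two side lengths."""
    left = math.exp(a ** 2)
    right = math.exp(b ** 2)
    return left, right, left * right


exp_a, exp_b, product = multiplicative_form(3, 4)

# The product of the exponentiated terms matches e^(5^2) up to rounding.
print(math.isclose(product, math.exp(5 ** 2)))

# Dividing through and taking logarithms recovers the additive form,
# a value close to 9 = 25 - 16.
print(math.log(math.exp(5 ** 2) / exp_b))
```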

Rasch wrote his model in the multiplicative form of

ε_{vi} = θ_v σ_i

and it is often written in the form of

Pr{X_{ni} = 1} = e^{β_n – δ_i} / (1 + e^{β_n – δ_i})

or

Pni = exp(Bn – Di) / [1 + exp(Bn – Di)]

which is to say that the probability of a correct response from person n on item i is equal to e taken to the power of the difference between the estimate β (or B) of person n’s ability and the estimate δ (or D) of item i’s difficulty, divided by one plus e to that same power.

Logit estimates of Rasch model parameters taken straight from software output usually range between about -3.0 and 3.0. So what happens if a couple of arbitrary values are plugged into these equations? If someone has a measure of 2 logits, what is their probability of a correct answer on an item that calibrates at 0.5 logits? The answer should be

e^{2 – 0.5} / (1 + e^{2 – 0.5}).

Now,

e^{1.5} ≈ 2.71828^{1.5} ≈ 4.481685….

and

4.481685 / (1 + 4.481685) ≈ 0.8176

For a table of the relationships between logit differences, odds, and probabilities, see Table 1.4.1 in Wright & Stone (1979, p. 16), or Table 1 in Wright (1977).
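The worked example can be reproduced in a few lines. This is a minimal sketch, assuming the dichotomous form of the model given above; the function name rasch_probability is hypothetical, and the small table it prints is computed here and is not a copy of the published tables.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response: exp(B - D) / [1 + exp(B - D)]."""
    odds = math.exp(ability - difficulty)
    return odds / (1.0 + odds)


# The example from the text: a 2.0 logit person meets a 0.5 logit item.
print(round(rasch_probability(2.0, 0.5), 4))

# Logit difference, odds, and probability for a few differences.
for difference in (-3.0, -1.5, 0.0, 1.5, 3.0):
    odds = math.exp(difference)
    print(f"{difference:5.1f}  {odds:8.4f}  {odds / (1.0 + odds):.4f}")
```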

This form of the model

Pni = exp(Bn – Di) / [1 + exp(Bn – Di)]

can be rewritten in an equivalent form as

[Pni / (1 – Pni)] = exp(Bn – Di) .

Taking the natural logarithm of the response probabilities expresses the model in perhaps its most intuitive form, often written as

ln[Pni / (1 – Pni)] = Bn – Di .

Substituting a for ln[Pni / (1 – Pni)], c for Bn, and b for Di, we have the same equation as we had for the Pythagorean Theorem, above

a = c – b .

Plugging in the same values of 2.0 and 0.5 logits for Bn and Di,

ln[Pni / (1 – Pni)] = 2.0 – 0.5 = 1.5.

The logit value of 1.5 is obtained from response odds [Pni / (1 – Pni)] of about 4.5, making, again, Pni equal to about 0.82.
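The round trip between logits, odds, and probabilities can be sketched as follows. The function names are hypothetical, and the values 2.0 and 0.5 are the same arbitrary logits used in the text.

```python
import math


def odds_from_logit(logit: float) -> float:
    """Convert a log-odds value to odds by exponentiating."""
    return math.exp(logit)


def probability_from_odds(odds: float) -> float:
    """Convert odds P / (1 - P) back to the probability P."""
    return odds / (1.0 + odds)


def logit_from_probability(p: float) -> float:
    """Natural logarithm of the odds, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


logit = 2.0 - 0.5                      # Bn - Di
odds = odds_from_logit(logit)          # about 4.5
p = probability_from_odds(odds)        # about 0.82
print(odds, p, logit_from_probability(p))
```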

Rasch wrote the model in working from Maxwell like this:

A_{vj} = F_j / M_v .

So when catapult j’s force F of 50 Newtons (361.65 poundals) is applied to object v’s mass M of 10 kilograms (22.046 pounds), the acceleration of this interaction is 5 meters (16.404 feet) per second, per second. Increases in force relative to the same mass result in proportionate increases in acceleration, etc.
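A short sketch makes the proportionality concrete, along with the additive (logarithmic) form of the same law. The function name acceleration is hypothetical; the 50 newton and 10 kilogram values are those of the example above.

```python
import math


def acceleration(force_newtons: float, mass_kg: float) -> float:
    """Acceleration in meters per second squared, from A = F / M."""
    return force_newtons / mass_kg


print(acceleration(50.0, 10.0))    # 5.0
print(acceleration(100.0, 10.0))   # doubling the force doubles the result

# In logarithms the same law is additive: ln A = ln F - ln M.
log_form = math.log(50.0) - math.log(10.0)
print(math.isclose(log_form, math.log(acceleration(50.0, 10.0))))
```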

The same consistent and invariant structural relationship is posited and often found in Rasch model applications, such that reasonable matches between the expected and observed response probabilities are found for various differences between ability, attitude, or performance measures Bn and the difficulty calibrations Di of the items on the scale, between different measures relative to any given item, and between different calibrations relative to any given person. Of course, any number of parameters may be added, as long as they are included in an initial calibration design in which they are linked together in a common frame of reference.

Model fit statistics, principal components analysis of the standardized residuals, statistical studies of differential item/person functioning, and graphical methods are all applied to the study of departures from the modeled expectations.
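One common way of summarizing departures from expectation is with mean-square fit statistics built from the residuals. The sketch below is an illustration only, not the procedure of any particular software package: the function names, the ability of 0 logits, the five item difficulties, and the two response strings are all hypothetical. The unweighted (outfit) mean square averages the squared standardized residuals, and the information-weighted (infit) mean square divides the summed squared residuals by the summed model variances.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Modeled probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def fit_mean_squares(responses, ability, difficulties):
    """Return (outfit, infit) mean squares for one person's 0/1 responses."""
    squared_residuals = []
    variances = []
    for x, d in zip(responses, difficulties):
        p = rasch_probability(ability, d)
        squared_residuals.append((x - p) ** 2)   # observed minus expected
        variances.append(p * (1.0 - p))          # binomial model variance
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    infit = sum(squared_residuals) / sum(variances)
    return outfit, infit


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]       # hypothetical calibrations
orderly = fit_mean_squares([1, 1, 1, 0, 0], 0.0, difficulties)
reversed_pattern = fit_mean_squares([0, 0, 1, 1, 1], 0.0, difficulties)
print(orderly, reversed_pattern)
```

Under the model both statistics have an expected value of 1, so a response string that succeeds on the hard items while failing the easy ones, as in the second pattern, yields values well above those of the orderly string.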

I’ve shown here how the additive expression of the Pythagorean theorem, the multiplicative expression of natural laws, and the additive and multiplicative forms of Rasch models all participate in the same simultaneous, conjoint relation of two parameters mediated by a third. For those who think geometrically, perhaps the connections drawn here will be helpful in visualizing the design of experiments testing hypotheses of converging yet separable parameters. For those who think algebraically, perhaps the structure of lawful regularity in question and answer processes will be helpful in focusing attention on how to proceed step by step from one definite idea to another, in the manner so well demonstrated by Maxwell (Forfar, 2002, p. 8). Either way, the geometrical and/or algebraic figures and symbols ought to work together to provide a transparent view on the abstract mathematical relationships that stand independent from whatever local particulars are used as the medium of their representation.

Just as Einstein held that it would have been impossible to formulate the theory of relativity without the concepts, relationships, and images of practical geometry, so, too, may it one day turn out that key advances in the social and human sciences depend on the invariance of measures related to one another in the simple and lawful regularities of geometry.

Figure 1. A geometrical proof of the Pythagorean Theorem

References

Burtt, E. A. (1954). The metaphysical foundations of modern physical science (Rev. ed.) [First edition published in 1924]. Garden City, New York: Doubleday Anchor.

Einstein, A. (1922). Geometry and experience (G. B. Jeffery, W. Perrett, Trans.). In Sidelights on relativity (pp. 12-23). London, England: Methuen & Co. LTD.

Forfar, J. (2002, June). James Clerk Maxwell: His qualities of mind and personality as judged by his contemporaries. Mathematics Today, 38(3), 83.

Gadamer, H.-G. (1980). Dialogue and dialectic: Eight hermeneutical studies on Plato (P. C. Smith, Trans.). New Haven: Yale University Press.

Holton, G. (1988). Thematic origins of scientific thought (Revised ed.). Cambridge, Massachusetts: Harvard University Press.

Maxwell, J. C. (1965/1890). The scientific papers of James Clerk Maxwell (W. D. Niven, Ed.). New York: Dover Publications.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago, Illinois: MESA Press.


Reasoning by analogy in social science education: On the need for a new curriculum

April 12, 2010

I’d like to revisit the distinction between measurement models and statistical models. Rasch was well known for joking about burning all books containing the words “normal distribution” (Andersen, 1995, p. 385). Rasch’s book and 1961 article both start on their first pages with a distinction between statistical models describing intervariable relations at the group level and measurement models prescribing intravariable relations at the individual level. I think confusion between these kinds of models has caused huge problems.

We typically assume all statistical analyses are quantitative. We refer to any research that uses numbers as quantitative even when nothing is done to map a substantive and invariant unit on a number line. We distinguish between qualitative and quantitative data and methods as though quantification has ever been achieved in the history of science without substantive qualitative understandings of the constructs.

Quantification in fact predates the emergence of statistics by millennia. It seems to me that there is a great deal to be gained from maintaining a careful distinction between statistics and measurement. Measurement is not primarily performed by someone sitting at a computer analyzing data. Measurement is done by individuals using calibrated instruments to obtain immediately useful quantitative information expressed in a universally uniform unit.

Rasch was correct in his assertion that we can measure the reading ability of a child with the same kind of objectivity with which we measure his or her weight or height. But we don’t commonly express individual height and weight measures in statistical terms. 

Information overload is one of the big topics of the day. Which will contribute more to reducing that overload in efficient and meaningful ways: calibrated instruments measuring in common units giving individual users immediate feedback that summarizes responses to dozens of questions, or ordinal group-level item-by-item statistics reported six months too late to do anything about them?

Instrument calibration certainly makes use of statistics, and statistical models usually assume measurement has taken place, but much stands to be gained from a clear distinction between inter- and intra-variable models. And so I respectfully disagree with those who assert that “the Rasch model is first of all a statistical model.” Maxwell’s method of making analogies from well known physical laws (Nersessian, 2002; Turner, 1955) was adopted by Rasch (1960, pp. 110-115) so that his model would have the same structure as the laws of physics.

Statistical models are a different class of models from the laws of physics (Meehl, 1967), since they allow cross-variable interactions in ways that compromise and defeat the possibility of testing the hypotheses of constant unit size, parameter separation, sufficiency, etc.

I’d like to suggest a paraphrase of the first sentence of the abstract from a recent paper (Silva, 2007) on using analogies in science education: Despite its great importance, many students and even their teachers still cannot recognize the relevance of measurement models to build up psychosocial knowledge and are unable to develop qualitative explanations for mathematical expressions of the lawful structural invariances that exist within the social sciences.

And so, here’s a challenge: we need to make an analogy from Silva’s (2007) work in physical science education and develop a curriculum for social science education that follows a parallel track. We could trace the development of reading measurement from Rasch (1960) through the Anchor Test Study (Jaeger, 1973; Rentz & Bashaw, 1977) to the introduction of the Lexile Framework for Reading (Stenner, 2001) and its explicit continuity with Rasch’s use of Maxwell’s method of analogy (Burdick, Stone, & Stenner, 2006) and full-blown predictive theory (Stenner & Stone, 2003).

With the example of the Rasch Reading Law in hand, we could then train students and teachers to think about structural invariance in the context of psychosocial constructs. It may be that, without the development and dissemination of at least a college-level curriculum of this kind, we will never overcome the confusion between statistical and measurement models.

References

Andersen, E. B. (1995). What Georg Rasch would have thought about this book. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 383-390). New York: Springer-Verlag.

Burdick, D. S., Stone, M. H., & Stenner, A. J. (2006). The Combined Gas Law and a Rasch Reading Law. Rasch Measurement Transactions, 20(2), 1059-60 [http://www.rasch.org/rmt/rmt202.pdf].

Jaeger, R. M. (1973). The national test equating study in reading (The Anchor Test Study). Measurement in Education, 4, 1-8.

Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103-115.

Nersessian, N. J. (2002). Maxwell and “the Method of Physical Analogy”: Model-based reasoning, generic abstraction, and conceptual change. In D. Malament (Ed.), Essays in the history and philosophy of science and mathematics (pp. 129-166). Lasalle, Illinois: Open Court.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (pp. 321-333 [http://www.rasch.org/memo1960.pdf]). Berkeley, California: University of California Press.

Rentz, R. R., & Bashaw, W. L. (1977, Summer). The National Reference Scale for Reading: An application of the Rasch model. Journal of Educational Measurement, 14(2), 161-179.

Silva, C. C. (2007, August). The role of models and analogies in the electromagnetic theory: A historical case study. Science & Education, 16(7-8), 835-848.

Stenner, A. J. (2001). The Lexile Framework: A common metric for matching readers and texts. California School Library Journal, 25(1), 41-2.

Stenner, A. J., & Stone, M. (2003). Item specification vs. item banking. Rasch Measurement Transactions, 17(3), 929-30 [http://www.rasch.org/rmt/rmt173a.htm].

Turner, J. (1955, November). Maxwell on the method of physical analogy. British Journal for the Philosophy of Science, 6, 226-238.


False Modesty and the Progress of Science (or Lack Thereof)

April 5, 2010

In a talk given in 1999, Freeman Dyson, Professor Emeritus at the Institute for Advanced Study in Princeton, New Jersey, proclaimed the stature of James Clerk Maxwell in the history of science, positioning him at the rank of Newton and Einstein. Maxwell’s 1865 theory explaining and unifying the phenomena of electricity and magnetism turned out to be, according to Dyson (1999),

“the prototype for all the great triumphs of twentieth-century physics…the prototype for Einstein’s theories of relativity, for quantum mechanics, for the Yang-Mills theory of generalised gauge invariance, and for the unified theory of fields and particles that is known as the Standard Model of particle physics.”

Maxwell was a leading figure in British science in the period from 1856 until his death at 48 in 1879. He was an academic department head at 25, elected to the Royal Society at 30, was president of the section on mathematical and physical sciences of the British Association for the Advancement of Science at 35, and at 40 became the first Cavendish Professor of Physics at Cambridge, personally overseeing the building of the Cavendish Laboratory.

In addition to his intelligence and imagination, Maxwell had a wry sense of humor, and a rich spiritual life. But in 1870, giving an overview of recent advances in his presidential address to the British Association, he downplayed the importance of what we now know as his landmark 1865 paper on electromagnetism. He instead spoke enthusiastically about William Thomson’s work in electrical theory. Perhaps he did not want to take on the double challenge of trying to explain the new and complex mathematics of his own theory to the physicists, and the physical application of the equations, to the mathematicians. Maybe he thought it would be unfair to take advantage of his position to showcase his own work. But Dyson thinks Maxwell’s colleagues could have been motivated to overcome the difficulties experienced in interpreting the published work if only Maxwell had encouraged them to.

Dyson contends that, in being so “absurdly and infuriatingly modest,” Maxwell set back progress in physics by 20 years, just as Mendel’s monkish isolation held back biology by 50. Referring to his own work toward the end of his address, Maxwell began by saying, “Another theory of electricity which I prefer…”. He then briefly described his work without taking credit for it.

But what if, as Dyson asks, Maxwell had instead had the confidence of Newton, who, at the start of the third volume of his Principia Mathematica, announced, “I now demonstrate the frame of the system of the world.” What if Maxwell had directly stated the truth with some panache, saying something to the effect of, “I now demonstrate the structure of the models integrating mathematics and physical phenomena that will dominate physics for the foreseeable future, and that will lead to revolutionary advances”? Even if he had not been so grandiose, if someone of his stature in the scientific community, known for his humility and personable nature, had spoken straightforwardly about what he believed to be true, people would have listened, and Freeman Dyson would not have been talking about 20-year delays in the advancement of science brought about by one of its most illustrious contributors.

It would seem that Maxwell’s legacy of self-deprecating modesty might have been inherited by one of his intellectual heirs, Georg Rasch, and the vast majority of those who have adopted Rasch’s measurement models in their research. Rasch explicitly based the mathematics of his approach to psychological measurement on Maxwell’s mathematics (see my previous postings here for more). Rasch accomplished for psychology the same integration of mathematics with substance that Maxwell accomplished for physics. Rasch’s students, Wright, Andrich, Andersen, and Fischer among them, poured passion and insight into developments in models, theory, estimation, software, fit statistics, applications, students, publications, and professional associations for decades. But you would never know that from reading most of the research using his models over the last 30 years, or from taking courses with most of the university professors who purport to apply Rasch’s ideas.

So, all that just to say that there are reasons and purposes motivating these blog postings that may not be readily apparent, but which have their historical precedents and future potentials. There is no more worthy challenge for me, personally, than following Rasch’s lead in figuring out how to demonstrate the frame of the system of the world of social relationships and intangible assets. After all, if no one does this, how many additional decades might be lost before researchers gain the thorough understandings of Rasch’s models that will lead the way to whole new classes of human, scientific, and economic triumphs?

Dyson, F. (1999, July). Why is Maxwell’s theory so hard to understand? In Fourth International Congress on Industrial and Applied Mathematics (http://www.clerkmaxwellfoundation.org/DysonFreemanArticle.pdf). Edinburgh, Scotland.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Parameterizing Perfection: Practical Applications of a Mathematical Model of the Lean Ideal

April 2, 2010

To properly pursue perfection, we need to parameterize it. That is, taking perfection as the ideal, unattainable standard against which we judge our performance is equivalent to thinking of it as a mathematical model. Organizations are intended to realize their missions independent of the particular employees, customers, suppliers, challenges, products, etc. they happen to engage with at any particular time. Organizational performance measurement (Spitzer, 2007) ought then to be designed in terms of a model that posits, tests for, and capitalizes on the always imperfectly realized independence of those parameters.

Lean thinking (Womack & Jones, 1996) focuses on minimizing waste and maximizing value. At every point at which resources are invested in processes, services, or products, the question is asked, “What value is added here?” Resources are wasted when no value is added, when they can be removed with no detrimental effect on the value of the end product. In their book, Natural Capitalism: Creating the Next Industrial Revolution, Hawken, Lovins, and Lovins (1999, p. 133) say

“Lean thinking … changes the standard for measuring corporate success. … As they [Womack and Jones] express it: ‘Our earnest advice to lean firms today is simple. To hell with your competitors; compete against perfection by identifying all activities that are muda [the Japanese term for waste used in Toyota’s landmark quality programs] and eliminating them. This is an absolute rather than a relative standard which can provide the essential North Star for any organization.’”

Further, every input should “be presumed waste until shown otherwise.” A constant, ongoing, persistent pressure for removing waste is the basic characteristic of lean thinking. Perfection is never achieved, but it aptly serves as the ideal against which progress is measured.

Lean thinking sounds a lot like a mathematical model, though it does not seem to have been written out in a mathematical form, or used as the basis for calibrating instruments, estimating measures, evaluating data quality, or for practical assessments of lean organizational performance. The closest anyone seems to have come to parameterizing perfection is in the work of Genichi Taguchi (Ealey, 1988), which has several close parallels with Rasch measurement (Linacre, 1993). But meaningful and objective quantification, as required and achieved in the theory and practice of fundamental measurement (Andrich, 2004; Bezruczko, 2005; Bond & Fox, 2007; Smith & Smith, 2004; Wilson, 2005; Wright, 1999), in fact asserts abstract ideals of perfection as models of organizational, social, and psychological processes in education, health care, marketing, etc. These models test the extent to which outcomes remain invariant across examination or survey questions, across teachers, students, schools, and curricula, or across treatment methods, business processes, or policies.
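
Such an ideal can be written down. As a minimal sketch in the conventional notation of the Rasch literature (the symbols are standard there, and are not drawn from the lean thinking sources cited here), let θ_n be the measure of person, process, or organization n, and let δ_i be the difficulty of item or criterion i. The dichotomous Rasch model and the comparison it implies are then:

```latex
\begin{align}
\Pr(X_{ni} = 1 \mid \theta_n, \delta_i)
  &= \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}, \\
\ln \frac{\Pr(X_{ni} = 1)}{\Pr(X_{ni} = 0)} - \ln \frac{\Pr(X_{mi} = 1)}{\Pr(X_{mi} = 0)}
  &= (\theta_n - \delta_i) - (\theta_m - \delta_i) = \theta_n - \theta_m .
\end{align}
```

The second line expresses the invariance at issue: under the model, the comparison of n with m does not depend on which item i happens to be used. Real data never satisfy this exactly, and the pattern of departures is what shows where the ideal has not yet been realized.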

Though as yet implemented only to a limited extent in business (Drehmer, Belohlav, & Coye, 2000; Drehmer & Deklava, 2001; Lunz & Linacre, 1998; Salzberger, 2009), advanced measurement’s potential rewards are great. Fundamental measurement theory has been successfully applied in research and practice thousands of times over the last 40 years and more, including in very large scale assessments and licensure/certification applications (Adams, Wu, & Macaskill, 1997; Masters, 2007; Smith, Julian, Lunz, et al., 1994). These successes speak to an opportunity for making broad improvements in outcome measurement that could provide more coherent product definition, and significant associated opportunities for improving product quality and the efficiency with which it is produced, in the manner that has followed from the use of fundamental measures in other industries.

Of course, processes and outcomes are never implemented or obtained with perfect consistency. Perfect consistency would be possible only in a perfect world. But to pursue perfection, we need to parameterize it. In other words, to raise the bar in any area of performance assessment, we have to know not only what direction is up, but we also need to know when we have raised the bar far enough. But we cannot tell up from down, we do not know how much to raise the bar, and we cannot properly evaluate the effects of lean experiments when we have no way of locating measures on a number line that embodies the lean ideal.

To think together collectively in ways that lead to significant new innovations, to rise above what Jaron Lanier calls the “global mush” of confused and self-confirming hive thinking, we need the common languages of widely accepted fundamental measures of the relevant processes and outcomes, measures that remain constant across samples of customers, patients, employees, students, etc., and across products, sales techniques, curricula, treatment processes, assessment methods, and brands of instrument.

We are all well aware that the consequences of not knowing where the bar is, of not having product definitions, can be disastrous. In many respects, as I’ve said previously in this blog, the success or failure of health care reform hinges on getting measurement right. The Institute of Medicine report, To Err is Human, of several years ago stresses that system failures pose the greatest threat to safety in health care because they lead to human errors. When a system as complex as health care lacks a standard product definition, and product delivery is fragmented across multiple providers with different amounts and kinds of information in different settings, the system becomes dangerously cumbersome and over-complicated, with unacceptably wide variations and errors in its processes and outcomes, to say nothing of its economic inefficiency.

In contrast with the widespread use of fundamental measures in the product definitions of other industries, health care researchers typically implement neither the longstanding, repeatedly proven, and mathematically rigorous models of fundamental measurement theory nor the metrological networks through which reference standard metrics are engineered. Most industries carefully define, isolate, and estimate the parameters of their products, doing so in ways 1) that ensure industry-wide comparability and standardization, and 2) that facilitate continuous product improvement by revealing multiple opportunities for enhancement. Where organizations in other industries manage by metrics and thereby keep their eyes on the ball of product quality, health care organizations often manage only their own internal processes and cannot in fact bring the product quality ball into view.

In his message concerning the Institute for Healthcare Improvement’s Pursuing Perfection project a few years ago, Don Berwick, like others (Coye, 2001; Coye & Detmer, 1998), observed that health care does not yet have an organization setting new standards in the way that Toyota did for the auto industry in the 1970s. It still doesn’t, of course. Given the differences between the auto and health care industries’ uses of fundamental measures of product quality and associated abilities to keep their eyes on the quality ball, is it any wonder, then, that no one in health care has yet hit a home run? It may well be that no one will hit a home run in health care until reference standard measures of product quality are devised.

The need for reference standard measures in uniform data systems is crucial, and the methods for obtaining them are widely available and well-known. So what is preventing the health care industry from adopting and deploying them? Part of the answer is the cost of the initial investment required. In 1980, metrology comprised about six percent of the U.S. gross national product (Hunter, 1980). In the period from 1981 to 1994, annual expenditures on research and development in the U.S. were less than three percent of the GNP, and non-defense R&D was about two percent (NIST Subcommittee on Research, National Science and Technology Council, 1996). These costs, however, must be viewed as investments from which high rates of return can be obtained (Barber, 1987; Gallaher, Rowe, Rogozhin, et al., 2007; Swann, 2005).

For instance, the U.S. National Institute of Standards and Technology estimated the economic impact of 12 areas of research in metrology, in four broad areas including semiconductors, electrical calibration and testing, optical industries, and computer systems (NIST, 1996, Appendix C; also see NIST, 2003). The median rate of return in these 12 areas was 147 percent, and returns ranged from 41 to 428 percent. The report notes that these results compare favorably with those obtained in similar studies of return rates from other public and private research and development efforts. Even if health care metrology produces only a small fraction of the return rate produced in physical metrology, its economic impact could still amount to billions of dollars annually. The proposed pilot projects therefore focus on determining what an effective health care outcomes metrology system should look like. What should its primary functions be? What should it cost? What rates of return could be expected from it?

Metrology, the science of measurement (Pennella, 1997), requires 1) that instruments be calibrated within individual laboratories so as to isolate and estimate the values of the required parameters (Wernimont, 1978); and 2) that individual instruments’ capacities to provide the same measure for the same amount, and so be traceable to a reference standard, be established and monitored via interlaboratory round-robin trials (Mandel, 1978).

Fundamental measurement has already succeeded in demonstrating the viability of reference standard measures of health outcomes, measures whose meaningfulness does not depend on the particular samples of items employed or patients measured. Though this work succeeds as far as it goes, it is being done in a context that lacks any sense of the need for metrological infrastructure. Health care needs networks of scientists and technicians collaborating not only in the first, intralaboratory phase of metrological work, but also in the interlaboratory trials through which different brands or configurations of instruments intended to measure the same variable would be tuned to harmoniously produce the same measure for the same amount.

Implementation of the two phases of metrological innovation in health care would then begin with the intralaboratory calibration of existing and new instruments for measuring overall organizational performance, quality of care, and patients’ health status, quality of life, functionality, etc.  The second phase takes up the interlaboratory equating of these instruments, and the concomitant deployment of reference standard units of measurement throughout a health care system and the industry as a whole. To answer questions concerning health care metrology’s potential returns on investment, the costs for, and the savings accrued from, accomplishing each phase of each pilot will be tracked or estimated.
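
The second phase can be illustrated with a small sketch. It assumes, purely for illustration, that a locally calibrated instrument shares some items with a reference instrument, that both are scaled in logits, and that the two scales differ only in their origin. The item names, the difficulty values, and the function names are all hypothetical, and simple mean-shift linking stands in for whatever equating design an actual round-robin trial would use.

```python
from statistics import mean


def linking_constant(reference, local):
    """Mean shift that moves local item calibrations onto the reference scale.

    Both arguments map item names to difficulties in logits. Only items
    present in both instruments are used (an assumption of this sketch).
    """
    shared = sorted(set(reference) & set(local))
    if not shared:
        raise ValueError("the two instruments share no items to link on")
    return mean(reference[name] - local[name] for name in shared)


def to_reference(local_value, shift):
    """Express a local measure or calibration in reference-scale units."""
    return local_value + shift


def link_residuals(reference, local, shift):
    """Per-item disagreement after linking; large values flag items that
    do not behave the same way in both settings."""
    shared = sorted(set(reference) & set(local))
    return {name: reference[name] - to_reference(local[name], shift)
            for name in shared}


if __name__ == "__main__":
    # Hypothetical calibrations from a reference lab and a local lab.
    reference = {"walk": -1.0, "stairs": 0.5, "run": 2.0}
    local = {"walk": -1.5, "stairs": 0.0, "run": 1.5, "lift": 0.7}

    shift = linking_constant(reference, local)
    print("linking constant:", shift)
    print("residuals:", link_residuals(reference, local, shift))
    print("'lift' on the reference scale:", to_reference(local["lift"], shift))
```

The residuals are the point of the exercise: they play the role of the interlaboratory check on whether two instruments really do give the same measure for the same amount.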

When instruments measuring in universally uniform, meaningful units are put in the hands of clinicians, a new scientific revolution will occur in medicine. It will be analogous to previous ones associated with the introduction of the thermometer and the instruments of optometry and the clinical laboratory. Such tools will multiply many times over the quality improvement methods used by Brent James, touted as holding the key to health care reform in a recent New York Times profile. Instead of implicitly hypothesizing models of perfection and assessing performance relative to them informally, what we need is a new science that systematically implements the lean ideal on industry-wide scales. The future belongs to those who master these techniques.

References

Adams, R. J., Wu, M. L., & Macaskill, G. (1997). Scaling methodology and procedures for the mathematics and science scales. In M. O. Martin & D. L. Kelly (Eds.), Third International Mathematics and Science Study Technical Report: Vol. 2: Implementation and Analysis – Primary and Middle School Years (pp. 111-145). Chestnut Hill, MA: Boston College.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Barber, J. M. (1987). Economic rationale for government funding of work on measurement standards. In R. Dobbie, J. Darrell, K. Poulter & R. Hobbs (Eds.), Review of DTI work on measurement standards (p. Annex 5). London: Department of Trade and Industry.

Berwick, D. M., James, B., & Coye, M. J. (2003, January). Connections between quality measurement and improvement. Medical Care, 41(1 (Suppl)), I30-38.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Coye, M. J. (2001, November/December). No Toyotas in health care: Why medical care has not evolved to meet patients’ needs. Health Affairs, 20(6), 44-56.

Coye, M. J., & Detmer, D. E. (1998). Quality at a crossroads. The Milbank Quarterly, 76(4), 759-68.

Drehmer, D. E., Belohlav, J. A., & Coye, R. W. (2000, Dec). An exploration of employee participation using a scaling approach. Group & Organization Management, 25(4), 397-418.

Drehmer, D. E., & Deklava, S. M. (2001, April). A note on the evolution of software engineering practices. Journal of Systems and Software, 57(1), 1-7.

Ealey, L. A. (1988). Quality by design: Taguchi methods and U.S. industry. Dearborn MI: ASI Press.

Gallaher, M. P., Rowe, B. R., Rogozhin, A. V., Houghton, S. A., Davis, J. L., Lamvik, M. K., et al. (2007). Economic impact of measurement in the semiconductor industry (Tech. Rep. No. 07-2). Gaithersburg, MD: National Institute of Standards and Technology.

Hawken, P., Lovins, A., & Lovins, H. L. (1999). Natural capitalism: Creating the next industrial revolution. New York: Little, Brown, and Co.

Hunter, J. S. (1980, November). The national system of scientific measurement. Science, 210(21), 869-874.

Linacre, J. M. (1993). Quality by design: Taguchi and Rasch. Rasch Measurement Transactions, 7(2), 292.

Lunz, M. E., & Linacre, J. M. (1998). Measurement designs using multifacet Rasch modeling. In G. A. Marcoulides (Ed.), Modern methods for business research. Methodology for business and management (pp. 47-77). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Mandel, J. (1978, December). Interlaboratory testing. ASTM Standardization News, 6, 11-12.

Masters, G. N. (2007). Special issue: Programme for International Student Assessment (PISA). Journal of Applied Measurement, 8(3), 235-335.

National Institute of Standards and Technology (NIST). (1996). Appendix C: Assessment examples. Economic impacts of research in metrology. In Subcommittee on Research, Committee on Fundamental Science (Ed.), Assessing fundamental science: A report from the Subcommittee on Research, Committee on Fundamental Science. Washington, DC: National Science and Technology Council [http://www.nsf.gov/statistics/ostp/assess/nstcafsk.htm#Topic%207; last accessed 18 February 2008].

National Institute of Standards and Technology (NIST). (2003, 15 January). Outputs and outcomes of NIST laboratory research. Retrieved 12 July 2009, from http://www.nist.gov/director/planning/studies.htm#measures.

Pennella, C. R. (1997). Managing the metrology system. Milwaukee, WI: ASQ Quality Press.

Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Northampton, MA: Edward Elgar.

Smith, R. M., Julian, E., Lunz, M., Stahl, J., Schulz, M., & Wright, B. D. (1994). Applications of conjoint measurement in admission and professional certification programs. International Journal of Educational Research, 21(6), 653-664.

Smith, E. V., Jr., & Smith, R. M. (2004). Introduction to Rasch measurement. Maple Grove, MN: JAM Press.

Spitzer, D. (2007). Transforming performance measurement: Rethinking the way we measure and drive organizational success. New York: AMACOM.

Swann, G. M. P. (2005, 2 December). John Barber’s pioneering work on the economics of measurement standards [Electronic version]. Notes for Workshop in Honor of John Barber held at University of Manchester. Retrieved from http://www.cric.ac.uk/cric/events/jbarber/swann.pdf.

Wernimont, G. (1978, December). Careful intralaboratory study must come first. ASTM Standardization News, 6, 11-12.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Womack, J. P., & Jones, D. T. (1996, Sept./Oct.). Beyond Toyota: How to root out waste and pursue perfection. Harvard Business Review, 74, 140-58.

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.


Leadership, Social Capital, and an Ethics of Transparent Representation

April 1, 2010

Leadership and innovation are always asserted by individuals and small groups whose influence is conditional on a lot of different factors. Having the Internet hardware and wiring in place for the global nervous system sets the stage for the evolutionary emergence of the global cerebellum. This more fully evolved, complex adaptive system of distributed cognition will transform the flood of data into information, knowledge, understanding, and, hopefully, wisdom. As the technical viability, social desirability, and economic profitability of living capital standards and markets become increasingly apparent to innovators and entrepreneurs, the metric infrastructure for lower transaction costs will organically emerge as a natural process, first within firms and local communities, then within industries and regions, and then nationally and internationally.

Today’s political, regulatory, and business failings seem to me to be functions of social capital market inefficiencies. When any individual’s, firm’s, or government’s stock of social capital is routinely measured and traded in public markets, and public opinion gets solidly behind the economics of Genuine Progress Indicators and Happiness Indexes, we’ll be in a better position to detect and prevent the kinds of failings and abuses we’re currently suffering and trying to recover from.

Taking up the ethical question raised by Matt in his comment on the previous posting here, at root, all violence is the violence of the premature conclusion (Ricoeur, 1967/1974). When prejudices, hegemonic agendas, fear, impatience, greed, etc. dominate relationships, we leap to unjustified conclusions and unfairly reduce others to caricatures of what they really are. This kind of reductionism and imposition of power is a choice in favor of violence over discourse. But if we choose discourse and the possibility of keeping the conversation going, then we have to stay open to seeking better representations of others and their positions. Of course, though we can remain open in principle to new information, we often have to make a decision at some point. And so things are complicated by those who stop short of physical violence, but who then shut down the conversation in various ways, preventing others from representing their social, economic, and political interests under the uniform rule of law.

So, recognizing that our judgments are always provisional and that no sample of evidence is ever absolutely conclusive, we have to be able to tell when our evidence is sufficient to the task of representing where someone stands. And as I’ve said here before, unjustified reductionism is abhorrent, but reduction is inevitable, since no discourse, no text, and no inquiry of limited length and duration can ever fully represent a potentially infinite universe of all possible aspects of another’s being. This seems to me to be part of what Levinas (1969) is getting at in his ethics of totality vs infinity (also see Cohen & Marsh, 2002).

I love the way Ricoeur (1974, pp. 96-97) says that

This is why all philosophies are particular even though everything is to be found in any great philosophy. And as I am myself one of the violent particularities, it is from my particular point of view that I perceive all these total particularities that are also particular totalities. The hard road of the ‘loving struggle’ is the only road possible.

In accepting the inevitability of reduction and the associated inevitability of some loss or excess of meaning, we need to learn how to escape the fundamentalist rigidity of final conclusions without falling into the relativist laxity of no standards whatsoever.

A systematic implementation of an ethical choice in favor of discourse over violence must, then, have some structural means of keeping the question open. Many of our systems, of course, are already structured in this way, though in as yet incomplete manners. For instance, in education, children are tested in each subject using representative samples of items that are in no way intended to actually be the entire universe of tasks or challenges the children could likely successfully address. We need to follow through on this intention by providing the structure of a common language capable of relating each child’s performance on any given set of items to the universe of all possible items. This is what the Lexile scale (www.Lexile.com) does for reading, for instance.

A structural means of keeping the question open would be a framework in which new evidence could be incorporated without compromising the integrity of what came before. The need is for a linguistic economy in which the market of ideas is defined so as to allow different particular forms of human, social, and natural capital to be treated as though they are equivalent, reducing them without being reductionistic, metaphorically calling them the same, and having good reasons for doing so, even as we understand they are not.

As Plato showed in the Phaedo, language is inherently already such an economy (Ballard, 1978, pp. 186-190). To refer to two different people as people, as women, as teachers, as Chicagoans, or even just as two is to overlook everything that makes them unique in favor of something they have in common. We compensate for this reduction by recognizing each individual’s unique combination of different particularities, but each particularity is in some way connected with an impersonal universal insofar as it is put into words.

Science and capitalism are inherently already extensions of this linguistic economy. Concepts are the original universal metrics. Laboratory and market measurement methods and instruments of objective and equitable comparison emerged of their own accord, organically, as the conversation that we are unfolded.

What we need to do is deliberately and scientifically extend the linguistic economy yet again. In one sense it will happen to us in its own time, whether we will it or not. But there is another sense in which this addition of a new paragraph in the ongoing footnote to Plato will be written in a far more conscious and overtly intended way than any of the previous ones were.

We can arrive at this place only by letting things be what they are, by entering into dialogues with each other and things in the world in ways that allow them the opportunity to assert their own independence as forms of life. This is ultimately what objective measurement is all about. A basis for measurement and a provisionally acceptable reduction of an infinite potentiality is established when a construct repeatedly shows itself as something that is repeatedly identifiable across questions asked and people answering, or across criteria and behaviors observed.

We leave ourselves open to the possible refutation of the independence of the form of life, or of any particular representation of it, by continuing to check the consistency of new observations. And new observations can be provided using new questions or criteria applied to new samples of respondents, examinees, or behaviors. Anomalies and consistent inconsistencies will demand explanation and will entail the assertion of new constructs or populations, the correction of errors, the forgiveness of careless mistakes, or the acceptance of special strengths. Individual interests in accuracy and precision, or the lack thereof, will be both augmented by and challenged by the scientific capacity to reproduce the results obtained. A transformation of research, regulation, markets, watchdogs, and more is in the offing.

And so, what I mean, of course, by a structural means of systematically keeping the question open is akin to an item bank in which new questions are calibrated so they take their positions on the scale without changing the positions and values of existing items. This has long been a routinely implemented technicality in computer adaptive testing and item banking (Choppin, 1968, 1976; Wright & Bell, 1984; Lunz, Bergstrom, & Gershon, 1994). But in the dominant conceptualizations of social measurement methods, adding items changes the meaning of the scores that are mistakenly treated as measures, and so the violence of the premature conclusion is enacted as a matter of course simply as a way of maintaining some semblance of a common metric and frame of reference.
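
A minimal sketch of that technicality follows. It is not taken from the item banking sources cited above; it assumes dichotomous (right/wrong) data, a dichotomous Rasch model, and a simple two-step procedure in which person measures are first estimated from the anchored items and the new item is then calibrated against those measures. All function names and data are hypothetical.

```python
import math


def p_success(measure, difficulty):
    """Rasch probability of success for a measure and a difficulty in logits."""
    return 1.0 / (1.0 + math.exp(difficulty - measure))


def _bisect(f, lo=-10.0, hi=10.0, tol=1e-9):
    """Root of a function that increases with x, searched within [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


def estimate_measure(responses, difficulties):
    """Maximum-likelihood person measure from 0/1 responses to anchored items."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("extreme score: no finite estimate")
    # The expected score rises with the measure, so solve expected = observed.
    return _bisect(lambda t: sum(p_success(t, d) for d in difficulties) - raw)


def calibrate_new_item(new_responses, measures):
    """Difficulty of a new item, treating the person measures as fixed."""
    score = sum(new_responses)
    if score == 0 or score == len(new_responses):
        raise ValueError("extreme item score: no finite estimate")
    # Expected successes fall as difficulty rises, so this function increases.
    return _bisect(lambda d: score - sum(p_success(t, d) for t in measures))


def add_to_bank(bank, name, new_responses, anchor_responses):
    """Return a copy of the bank with the new item placed on the existing scale.

    Each row of anchor_responses holds one person's answers to the banked
    items, in the bank's own order. Existing calibrations are never touched.
    """
    anchored = list(bank.values())
    measures = [estimate_measure(row, anchored) for row in anchor_responses]
    updated = dict(bank)
    updated[name] = calibrate_new_item(new_responses, measures)
    return updated


if __name__ == "__main__":
    bank = {"item_1": -1.0, "item_2": 0.0, "item_3": 1.0}
    anchors = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0]]
    updated = add_to_bank(bank, "item_4", [0, 1, 1, 0], anchors)
    print(updated)
```

The new item takes its position on the scale while every existing entry keeps its value, which is the sense in which the question stays open without disturbing what came before.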

The hermeneutic persistence in pursuing what is questionable demands a framework in which new evidence can be evaluated for its consistency with the integrity of what came before. Further, it must be possible to incorporate new data and new questions within the framework of an existing language’s words and concepts, except when that framework is no longer sufficient. Sufficiency has to be taken seriously as an explicitly evaluated and necessary part of reduction (Fisher, 2010). The ontological method’s process of reduction, application, deconstruction, and return to a new reduction is already the unrecognized, implicit norm of historically effective consciousness. Recognizing this method and systematically incorporating it into theory and practice is the challenge of our day.

Reiterating yet again the French revolutionaries’ association of universal human rights and universal metrics, what we want are systematically institutionalized ways of recognizing ourselves in each other, of finding meaningful commonalities that do not force us into rigid sameness. We want measures of our abilities, health, performances, trustworthiness, and environmental quality that we all can understand and use, that we value for their fair and equitable representation of who we are, that make life better for all of us by giving credit where it is due and by illuminating proven paths to growth and advancement.

Alder (2002, p. 2) says it well: “the use a society makes of its measures expresses its sense of fair dealing. That is why the balance scale is a widespread symbol of justice. … Our methods of measurement define who we are and what we value.” And so, as I say, we are what we measure. It’s time we measured what we want to be.

References

Ballard, E. G. (1978). Man and technology: Toward the measurement of a culture. Pittsburgh, Pennsylvania: Duquesne University Press.

Choppin, B. (1968). An item bank using sample-free calibration. Nature, 219, 870-872.

Choppin, B. (1976). Recent developments in item banking. In D. N. M. de Gruijter & L. J. van der Kamp (Eds.), Advances in Psychological and Educational Measurement (pp. 233-245). New York: Wiley.

Cohen, R. A., & Marsh, J. L. (Eds.). (2002). Ricoeur as another: The ethics of subjectivity (L. Langsdorf, Ed.). SUNY Series in the Philosophy of the Social Sciences. Albany, New York: State University of New York Press.

Fisher, W. P., Jr. (2010). Reducible or irreducible? Mathematical reasoning and the ontological method. Journal of Applied Measurement, 11(1), 38-59.

Levinas, E. (1969). Totality and infinity (A. Lingis, Trans.). Pittsburgh, PA: Duquesne University Press.

Lunz, M. E., Bergstrom, B. A., & Gershon, R. C. (1994). Computer adaptive testing. International Journal of Educational Research, 21(6), 623-634.

Ricoeur, P. (1967). Violence et langage (J. Bien, Trans.). Recherches et Debats: La Violence, 59, 86-94. (Rpt. in D. Stewart & J. Bien, (Eds.). (1974). Violence and language, in Political and social essays by Paul Ricoeur (pp. 88-101). Athens, OH: Ohio University Press.)

Wright, B. D., & Bell, S. R. (1984, Winter). Item banks: What, why, how. Journal of Educational Measurement, 21(4), 331-345 [http://www.rasch.org/memo43.htm].