
Measuring Values To Apply The Golden Rule

December 29, 2016

Paper presentation 45.20, American Educational Research Association

New Orleans, April 1994



Basing her comments on the writings of Michael Lerner in Tikkun magazine, “Hillary Rodham Clinton speaks appealingly of a political morality based on the Golden Rule,” says Chicago Tribune columnist Clarence Page.  Lerner and Clinton are correct in asserting that we need to rediscover and re-invigorate our spiritual values, though there is nothing new in this assertion, and Page is correct in his opinion that conservative columnists who say religion is spirituality, and that there is therefore nothing in need of re-invigoration, are wrong.  Research on the spiritual dimension of disability, for instance, shows that the quality of spiritual experience has little, if anything, to do with religious church attendance, bible reading, prayer, or the taking of sacraments (Fisher & Pugliese, 1989).

The purpose of this paper is to propose a research program that would begin to prepare the ground in which a political morality based on the Golden Rule might be cultivated.

Theoretical Framework

Implementing a “political morality based on the Golden Rule” requires some way of knowing that what I do unto others is the same as what I would have done unto me. To know this, I need a measuring system that keeps things in proportion by showing what counts as the same thing for different people.  In short, application of the Golden Rule requires an empirical basis of comparison, a measuring system that sets up analogies between people’s values and what is valued.  We must be able to say that my values are to one aspect of a situation what yours are to that or another aspect, and that proportions of this kind hold constant no matter which particular persons are addressed and no matter which aspects of the situation are involved.


Is it possible to measure what people value (politically, socially, economically, spiritually, and culturally) in a way that embodies the Golden Rule? If so, could such a measure be used for realizing the political morality Hillary Rodham Clinton has advocated?  In the 1920s, L. L. Thurstone presented methods for successfully revealing the necessary proportions; these were improved upon by the Danish mathematician Georg Rasch in the 1950s.  Thurstone’s and Rasch’s ideas are researched and applied today by Benjamin D. Wright and J. Michael Linacre.  These and other thinkers hold that measurement takes place only when application of the Golden Rule is possible.  That is, measurement is achieved only if someone’s measure does not depend on who is in the group she is measured with, on the particular questions answered or not answered, on who made the measure, on the brand name of the instrument, or on where the measure took place.

Measurement of this high quality is called scale-free because its quantities do not vary according to the particular questions asked (as long as they pertain to the construct of interest); neither do they vary according to the structure or combination of the particular rating scheme(s) employed (rating scale, partial credit, correct/incorrect, true/false, present/absent, involvement of judges, paired comparisons, etc.), or the brand name of the instrument measuring.  All of these requirements must hold if I am to treat a person as I would like to be treated, because if they do not hold, I do not know enough about her values or mine to say whether she’s receiving the treatment I’d prefer in the same circumstance.
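The invariance requirement can be illustrated with a minimal sketch.  It assumes the dichotomous form of Rasch’s model, in which the log-odds of a positive response equal the difference between a person’s measure and an item’s calibration; the function names and all numerical values are hypothetical and chosen only for illustration.  Under that assumption, the comparison between two persons comes out the same no matter which item is used to make it.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the dichotomous Rasch model.

    Both arguments are in logits. The dichotomous form is an assumption made
    for this illustration; the paper does not commit to a specific model form.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical person measures and item calibrations (logits).
person_a, person_b = 1.5, 0.25
item_difficulties = [-2.0, -0.5, 0.0, 1.0, 2.5]

# Compare the two persons separately on every item.
differences = []
for difficulty in item_difficulties:
    diff = (log_odds(rasch_probability(person_a, difficulty))
            - log_odds(rasch_probability(person_b, difficulty)))
    differences.append(diff)
    print(f"item at {difficulty:+.1f} logits: person A - person B = {diff:.3f}")
```

Every item yields the same difference, namely the gap between the two person measures, which is the sense in which the comparison does not depend on the particular questions asked.  Real data only approximate this pattern, which is why fit to the model has to be checked rather than assumed.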

In order to make the Golden Rule the basis of a political morality, we need to improve the quality of measurement in every sphere of our lives; after all, politics is more than just what politicians do; it is a basic part of community life.  Even though the technology and methods for high quality measurement in education, sociology, and psychology have existed for decades, researchers have been indifferent to their use.

That indifference may be near an end.  If people get serious about applying the Golden Rule, they are going to come up against a need for rigorous quantitative measurement.  We need to let them know that the tools for the job are available.

Data sources

Miller’s Scale Battery of International Patterns and Norms (SBIPN) (Miller, 1968, 1970, 1972), described in Miller (1983, pp. 462-468), is an instrument that presents possibilities for investigating quantitative relations among value systems.  The instrument is composed of 20 six-point rating scale items involving such cultural norms and patterns as social acceptance, family solidarity, trustfulness, moral code, honesty, reciprocity, class structure, etc.  Each pair of rating scale points (1-2, 3-4, 5-6) is associated with a 15-30 word description; raters judge national values by assigning ratings, where 1 indicates the most acceptance, solidarity, trust, morality, etc., and 6 the least.  Miller (1983, p. 462) reports test-retest correlations of .74 to .97 for the original 15 items on the survey as tested in the United States and Peru.  Validity claims are based on the scale’s ability to distinguish between values of citizens of the United States and Peru, with supporting research comparing values in Argentina, Spain, England, and the United States.

The SBIPN could probably be improved in several ways.  First, individual countries contain so many diverse ethnic groups and subcultures whose value systems are often in conflict that ratings should probably be made of them and not of the entire population.  The geographical location of the ethnic group or subculture rated should also be tracked in order to study regional variations.  Second, Miller contends that raters must have a college degree to be qualified as an SBIPN judge; the complexity of his rating procedure justifies this claim.  In order to simplify the survey and broaden the base of qualified judges, the three groups of short phrases structuring each six-point rating scale should be used as individual items rated on a frequency continuum.

For instance, the following phrases appear in association with ratings of 1 and 2 under social acceptance:

High social acceptance.  Social contacts open and nonrestrictive.  Introductions not needed for social contacts.  Short acquaintance provides entry into the home and social organizations.

Similar descriptions are associated with the 3-4 (medium social acceptance) and 5-6 (low social acceptance) rating pairs; only one rating from the series of six is assigned, so that a rating of 1 or 2 is assigned only if the judgment is of high social acceptance.  Instead of asking the rater to assign one of two ratings to all six of these statements (breaking apart the two conjunctive phrases), and ignoring the 10-20 phrases associated with the other four rating scale points, each phrase presented on the six-point continuum should be rated separately for the frequency of the indicated pattern or norm.  A four-point rating scale (Almost Always, Frequently, Sometimes, Rarely) should suffice.

Linacre’s (1993, p. 284) graphical presentation of Rasch-based Generalizability Theory indicates that reliability and separation statistics of .92 and 3.4, respectively, can be expected for a 20-item, six-point rating scale survey (Miller’s original format), assuming a measurement standard deviation of one logit.  A total of 360 items will be produced if each of the original 20 six-point items can be transformed into 18 four-point items (following the above example’s derivation of six items from one of the three blocks of one item’s descriptive phrases).  If only 250 of these items work to support the measurement effort, Linacre’s graph shows that a reliability of .99 and separation of 10 might be obtained, again assuming a measurement standard deviation of one logit.  Since the survey’s items would probably not all be administered at once, these estimates are probably high.  The increased number of items, however, would be advantageous for use as an item bank in a computer-adaptive administration of the survey.
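The paired reliability and separation figures quoted from Linacre are consistent with the standard conversion between the two statistics, in which separation G and reliability R are related by G = sqrt(R / (1 - R)).  The sketch below checks that arithmetic along with the item count; it does not reproduce Linacre’s graph or its assumptions about measurement error, and the function names are hypothetical.

```python
import math


def separation_from_reliability(reliability):
    """Separation G implied by reliability R, using G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))


def reliability_from_separation(separation):
    """Inverse conversion: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def derived_item_count(original_items, blocks_per_item, phrases_per_block):
    """Number of single-phrase items obtained by splitting each original item."""
    return original_items * blocks_per_item * phrases_per_block


# Reliabilities quoted in the text for the original and revised formats.
for r in (0.92, 0.99):
    print(f"reliability {r:.2f} -> separation {separation_from_reliability(r):.2f}")

# 20 original items, 3 descriptive blocks each, 6 phrases per block.
print("revised item count:", derived_item_count(20, 3, 6))
```

The conversion returns separations close to the 3.4 and 10 cited above, and the item count comes to 360.  How many items are needed to reach a given reliability depends on the error of each measure, which is what Linacre’s graph supplies and this sketch does not.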

Expected results

Miller’s applications of the SBIPN provide specific indications of what might be expected from the revised form of the survey.  Family solidarity tends to be low, labor assimilated into the prevailing economic system, class consciousness devalued, and moral conduct secularly defined in the United States, in opposition to Colombia and Peru, where family solidarity is high, labor is antagonistic to the prevailing economic system, class structure is rigidly defined, and moral conduct is religiously defined.  At the other extreme, civic participation, work and achievement, societal consensus, children’s independence, and democracy are highly valued in the United States, but considerably less so in Colombia and Peru.

Miller’s presentation of the survey results will be improved on in several ways.  First, construct validity will be examined in terms of the data’s internal consistency (fit analysis) and the conceptual structure delineated by the items.  Second, the definition of interval measurement continua for each ethnic group or subculture measured will facilitate quantitative and qualitative comparisons of each group’s self-image with its public image.  Differences in group perception can be used for critical self-evaluation as well as information crucial for rectifying unjust projections of prejudice.

Scientific importance

One of the most important benefits of this survey could be the opportunity to show that, although different value systems vary in their standards of what counts as acceptable behaviors and attitudes, the procedures by which values are calibrated and people’s personal values are measured do not vary.  That this should turn out to be the case will make it more difficult to justify and maintain hostile prejudices against others whose value systems differ from one’s own.  If people who do not share my values cannot immediately be categorized as godless, heathens, infidels, pagans, unwashed, etc., i.e., in the category of the non-classifiable, then I should be less prone to disregard, hate, or fear them, and more able to build a cohesive, healthy, and integrated community with them.

The cultural prejudice structuring this proposal is that increased understanding of others’ values is good; that this prejudice needs to be made explicit and evaluated for its effect on those who do not share it is of great importance.  The possibility of pursuing a quantitative study of value systems may strike some as an area of research that could only be used to dominate and oppress those who do not have the power to defend themselves.  This observation implies that one reason more rigorous scientific measurement procedures have failed to take hold in the social studies may be that we have unspoken, but nonetheless justifiable, reservations concerning our capacity to employ high quality information responsibly.  Knowledge is inherently dangerous, but a political morality based on the Golden Rule will require nothing less than taking another bite of the apple from the Tree of Knowledge.



Fisher, William P. & Karen Pugliese. 1989.  Measuring the importance of pastoral care in rehabilitation. Archives of Physical Medicine and Rehabilitation, 70, A-22 [Abstract].

Linacre, J. Michael. 1993. Rasch-based generalizability theory. Rasch Measurement, 7: 283-284.

Miller, Delbert C. 1968. The measurement of international patterns and norms: A tool for comparative research. Southwestern Social Science Quarterly, 48: 531-547.

Miller, Delbert C. 1970. International Community Power Structures: Comparative Studies of Four World Cities. Bloomington: Indiana University Press.

Miller, Delbert C. 1972. Measuring cross national norms: Methodological problems in identifying patterns in Latin America and Anglo-Saxon Cultures.  International Journal of Comparative Sociology, 13(3-4): 201-216.

Miller, Delbert C. 1983. Handbook of Research Design and Social Measurement. 4th ed. New York: Longman.


Newton, Metaphysics, and Measurement

January 20, 2011

Though Newton claimed to deduce quantitative propositions from phenomena, the record shows that he brought a whole cartload of presuppositions to bear on his observations (White, 1997), such as his belief that Pythagoras was the discoverer of the inverse square law, his knowledge of Galileo’s freefall experiments, and his theological and astrological beliefs in occult actions at a distance. Without his immersion in this intellectual environment, he likely would not have been able to then contrive the appearance of deducing quantity from phenomena.

The second edition of the Principia, in which appears the phrase “hypotheses non fingo,” was brought out in part to respond to the charge that Newton had not offered any explanation of what gravity is. De Morgan, writing much later, felt that Newton seemed to know more than he could prove (Keynes, 1946). But in his response to the critics, and in asserting that he feigns no hypotheses, Newton was making an important distinction between explaining the causes or composition of gravity and describing how it works. Newton was saying he did not rely on or make or test any hypotheses as to what gravity is; his only concern was with how it behaves. In due course, gravity came to be accepted as a fundamental feature of the universe in no need of explanation.

Heidegger (1977, p. 121) contends that Newton was, as is implied in the translation “I do not feign hypotheses,” saying in effect that the ground plan he was offering as a basis for experiment and practical application was not something he just made up. Despite Newton’s rejection of metaphysical explanations, the charge of not explaining gravity for what it is was being answered with a metaphysics of how, first, to derive the foundation for a science of precise predictive control from nature, and then resituate that foundation back within nature as an experimental method incorporating a mathematical plan or model. This was, of course, quite astute of Newton, as far as he went, but he stopped far short of articulating the background assumptions informing his methods.

Newton’s desire for a logic of experimental science led him to reject anything “metaphysical or physical, or based on occult qualities, or mechanical” as a foundation for proceeding. Following in Descartes’ wake, Newton then was satisfied to solidify the subject-object duality and to move forward on the basis of objective results that seemed to make metaphysics a thing of the past. Unfortunately, as Burtt (1954/1932, pp. 225-230) observes in this context, the only thing that can possibly happen when you presume discourse to be devoid of metaphysical assumptions is that your metaphysics is more subtly insinuated and communicated to others because it is not overtly presented and defended. Thus we have the history of logical positivism as the dominant philosophy of science.

It is relevant to recall here that Newton was known for strong and accurate intuitions, and strong and unorthodox religious views (he held the Lucasian Chair at Cambridge only by royal dispensation, as he was not Anglican). It must be kept in mind that Newton’s combination of personal characteristics was situated in the social context of the emerging scientific culture’s increasing tendency to prioritize results that could be objectively detached from the particular people, equipment, samples, etc. involved in their production (Shapin, 1989). Newton then had insights that, while remarkably accurate, could not be entirely derived from the evidence he offered and that, moreover, could not acceptably be explained informally, psychologically, or theologically.

What is absolutely fascinating about this constellation of factors is that it became a model for the conduct of science. Of course, Newton’s laws of motion were adopted as the hallmark of successful scientific modeling in the form of the Standard Model applied throughout physics in the nineteenth century (Heilbron, 1993). But so was the metaphysical positivist logic of a pure objectivism detached from everything personal, intuitive, metaphorical, social, economic, or religious (Burtt, 1954/1932).

Kuhn (1970) made a major contribution to dismantling this logic when he contrasted textbook presentations of the methodical production of scientific effects with the actual processes of cobbled-together fits and starts that are lived out in the work of practicing scientists. But much earlier, James Clerk Maxwell (1879, pp. 162-163) had made exactly the same observation in a contrast of the work of Ampere with that of Faraday:

“The experimental investigation by which Ampere established the laws of the mechanical action between electric currents is one of the most brilliant achievements in science. The whole, theory and experiment, seems as if it had leaped, full grown and full armed, from the brain of the ‘Newton of electricity.’ It is perfect in form, and unassailable in accuracy, and it is summed up in a formula from which all the phenomena may be deduced, and which must always remain the cardinal formula of electro-dynamics.

“The method of Ampere, however, though cast into an inductive form, does not allow us to trace the formation of the ideas which guided it. We can scarcely believe that Ampere really discovered the law of action by means of the experiments which he describes. We are led to suspect, what, indeed, he tells us himself* [Ampere’s Theorie…, p. 9], that he discovered the law by some process which he has not shewn us, and that when he had afterwards built up a perfect demonstration he removed all traces of the scaffolding by which he had raised it.

“Faraday, on the other hand, shews us his unsuccessful as well as his successful experiments, and his crude ideas as well as his developed ones, and the reader, however inferior to him in inductive power, feels sympathy even more than admiration, and is tempted to believe that, if he had the opportunity, he too would be a discoverer. Every student therefore should read Ampere’s research as a splendid example of scientific style in the statement of a discovery, but he should also study Faraday for the cultivation of a scientific spirit, by means of the action and reaction which will take place between newly discovered facts and nascent ideas in his own mind.”

Where does this leave us? In sum, Rasch emulated Ampere in two ways. He did so first in wanting to become the “Newton of reading,” or even the “Newton of psychosocial constructs,” when he sought to show that data from reading test items and readers are structured with an invariance analogous to that of data from instruments applying a force to an object with mass (Rasch, 1960, pp. 110-115). Rasch emulated Ampere again when, like Ampere, after building up a perfect demonstration of a reading law structured in the form of Newton’s second law, he did not report the means by which he had constructed test items capable of producing the data fitting the model, effectively removing all traces of the scaffolding.

The scaffolding has been reconstructed for reading (Stenner, et al., 2006) and has also been left in plain view by others doing analogous work involving other constructs (cognitive and moral development, mathematics ability, short-term memory, etc.). Dawson (2002), for instance, compares developmental scoring systems of varying sophistication and predictive control. And the plethora of uncritically applied Rasch analyses may turn out to be a capital resource for researchers interested in focusing on possible universal laws, predictive theories, and uniform metrics.

That is, published reports of calibration, error, and fit estimates open up opportunities for “pseudo-equating” (Beltyukova, Stone, & Fox, 2004; Fisher 1997, 1999) in their documentation of the invariance, or lack thereof, of constructs over samples and instruments. The evidence will point to a need for theoretical and metric unification directly analogous to what happened in the study and use of electricity in the nineteenth century:

“…’the existence of quantitative correlations between the various forms of energy, imposes upon men of science the duty of bringing all kinds of physical quantity to one common scale of comparison.’” [Schaffer, 1992, p. 26; quoting Everett 1881; see Smith & Wise 1989, pp. 684-4]

Qualitative and quantitative correlations in scaling results converged on a common construct in the domain of reading measurement through the 1960s and 1970s, culminating in the Anchor Test Study and the calibration of the National Reference Scale for Reading (Jaeger, 1973; Rentz & Bashaw, 1977). The lack of a predictive theory and the entirely empirical nature of the scale estimates kept the scale from being widely applied, as the items in the tests that were equated were soon replaced with new items.

But the broad scale of the invariance observed across tests and readers suggests that some mechanism must be at work (Stenner, Stone, & Burdick, 2009), or that some form of life must be at play (Fisher, 2003a, 2003b, 2004, 2010a), structuring the data. Eventually, some explanation accounting for the structure ought to become apparent, as it did for reading (Stenner, Smith, & Burdick, 1983; Stenner, et al., 2006). This emergence of self-organizing structures repeatedly asserting themselves as independently existing real things is the medium of the message we need to hear. That message is that instruments play a very large and widely unrecognized role in science. By facilitating the routine production of mutually consistent, regularly observable, and comparable results they set the stage for theorizing, the emergence of consensus on what’s what, and uniform metrics (Daston & Galison, 2007; Hankins & Silverman, 1999; Latour, 1987, 2005; Wise, 1988, 1995). The form of Rasch’s models as extensions of Maxwell’s method of analogy (Fisher, 2010b) makes them particularly productive as a means of providing self-organizing invariances with a medium for their self-inscription. But that’s a story for another day.


Beltyukova, S. A., Stone, G. E., & Fox, C. M. (2004). Equating student satisfaction measures. Journal of Applied Measurement, 5(1), 62-9.

Burtt, E. A. (1954/1932). The metaphysical foundations of modern physical science (Rev. ed.) [First edition published in 1924]. Garden City, New York: Doubleday Anchor.

Daston, L., & Galison, P. (2007). Objectivity. Cambridge, MA: MIT Press.

Dawson, T. L. (2002, Summer). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3(2), 146-89.

Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.

Fisher, W. P., Jr. (2003a, December). Mathematics, measurement, metaphor, metaphysics: Part I. Implications for method in postmodern science. Theory & Psychology, 13(6), 753-90.

Fisher, W. P., Jr. (2003b, December). Mathematics, measurement, metaphor, metaphysics: Part II. Accounting for Galileo’s “fateful omission.” Theory & Psychology, 13(6), 791-828.

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2010a). Reducible or irreducible? Mathematical reasoning and the ontological method. Journal of Applied Measurement, 11(1), 38-59.

Fisher, W. P., Jr. (2010b). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics: Conference Series, 238(1),

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Jaeger, R. M. (1973). The national test equating study in reading (The Anchor Test Study). Measurement in Education, 4, 1-8.

Keynes, J. M. (1946, July). Newton, the man. (Speech given at the Celebration of the Tercentenary of Newton’s birth in 1642.) MacMillan St. Martin’s Press (London, England), The Collected Writings of John Maynard Keynes Volume X, 363-364.

Kuhn, T. S. (1970). The structure of scientific revolutions. Chicago, Illinois: University of Chicago Press.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York: Cambridge University Press.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Maxwell, J. C. (1879). Treatise on electricity and magnetism, Volumes I and II. London, England: Macmillan.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Rentz, R. R., & Bashaw, W. L. (1977, Summer). The National Reference Scale for Reading: An application of the Rasch model. Journal of Educational Measurement, 14(2), 161-179.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Shapin, S. (1989, November-December). The invisible technician. American Scientist, 77, 554-563.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Stenner, A. J., Smith, M., III, & Burdick, D. S. (1983, Winter). Toward a theory of construct definition. Journal of Educational Measurement, 20(4), 305-316.

Stenner, A. J., Stone, M., & Burdick, D. (2009, Autumn). The concept of a measurement mechanism. Rasch Measurement Transactions, 23(2), 1204-1206.

White, M. (1997). Isaac Newton: The last sorcerer. New York: Basic Books.

Wise, M. N. (1988). Mediating machines. Science in Context, 2(1), 77-113.

Wise, M. N. (Ed.). (1995). The values of precision. Princeton, New Jersey: Princeton University Press.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.