
A New Agenda for Measurement Theory and Practice in Education and Health Care

April 15, 2011

Two key issues on my agenda offer different answers to the question “Why do you do things the way you do in measurement theory and practice?”

First, we can take up the “Because of…” answer to this question. We need to articulate an historical account of measurement that does three things:

  1. that builds on Rasch’s use of Maxwell’s method of analogy by employing it and expanding on it in new applications;
  2. that unifies the vocabulary and concepts of measurement across the sciences into a single framework so far as possible by situating probabilistic models of invariant individual-level within-variable phenomena in the context of measurement’s GIGO principle and data-to-model fit, as distinct from the interactions of group-level between-variable phenomena in the context of statistics’ model-to-data fit; and
  3. that stresses the social, collective cognition facilitated by networks of individuals whose point-of-use measurement-informed decisions and behaviors are coordinated and harmonized virtually, at a distance, with no need for communication or negotiation.

We need multiple publications in leading journals on these issues, as well as one or more books that people can cite as a way of making this real and true history of measurement, properly speaking, credible and accepted in the mainstream. A draft article of my own in this vein is available at http://ssrn.com/abstract=1698919; I offer it for critique, and other material is available on request. Anyone who works on this paper with me and makes a substantial contribution to its publication will be added as co-author.

Second, we can take up the “In order that…” answer to the question “Why do you do things the way you do?” From this point of view, we need to broaden the scope of the measurement research agenda beyond data analysis, estimation, models, and fit assessment in three ways:

  1. by emphasizing predictive construct theories that exhibit the fullest possible understanding of what is measured and so enable the routine reproduction of desired proportionate effects efficiently, with no need to analyze data to obtain an estimate;
  2. by defining the standard units to which all calibrated instruments measuring given constructs are traceable; and
  3. by disseminating to front line users on mass scales instruments measuring in publicly available standard units and giving immediate feedback at the point of use.

These two sets of issues define a series of talking points that together constitute a new narrative for measurement in education, psychology, health care, and many other fields. We and others may see our way to organizing new professional societies, new journals, new university-based programs of study, etc. around these principles.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


A Second Simple Example of Measurement’s Role in Reducing Transaction Costs, Enhancing Market Efficiency, and Enabling the Pricing of Intangible Assets

March 9, 2011

The prior post here showed why we should not confuse counts of things with measures of amounts, though counts are the natural starting place to begin constructing measures. That first simple example focused on an analogy between counting oranges and measuring the weight of oranges, versus counting correct answers on tests and measuring amounts of ability. This second example extends the first by showing what happens when we want to aggregate value not just across different counts of some one thing but across different counts of different things. The point will be, in effect, to show how the relative values of apples, oranges, grapes, and bananas can be put into a common frame of reference and compared in a practical and convenient way.

For instance, you may go into a grocery store to buy raspberries and blackberries, and I go in to buy cantaloupe and watermelon. Your cost per individual fruit will be very low, and mine will be very high, but neither of us will find this annoying, confusing, or inconvenient because your fruits are very small, and mine, very large. Conversely, your cost per kilogram will be much higher than mine, but this won’t cause either of us any distress because we both recognize the differences in the labor, handling, nutritional, and culinary value of our purchases.

But what happens when we try to purchase something as complex as a unit of socioeconomic development? The eight UN Millennium Development Goals (MDGs) represent a start at a systematic effort to bring human, social, and natural capital together into the same economic and accountability framework as liquid and manufactured capital, and property. But that effort is stymied by the inefficiency and cost of making and using measures of the goals achieved. The existing MDG databases (http://data.un.org/Browse.aspx?d=MDG) and summary reports present overwhelming numbers of numbers. Individual indicators are presented for each year, each country, each region, and each program, goal by goal, target by target, indicator by indicator, and series by series, in an indigestible volume of data.

Though there are no doubt complex mathematical methods by which a philanthropic, governmental, or NGO investor might determine how much development is gained per million dollars invested, the cost of obtaining impact measures is so high that most funding decisions are made with little information concerning expected returns (Goldberg, 2009). Further, the percentages of various needs met by leading social enterprises typically range from 0.07% to 3.30%, and needs are growing, not diminishing. Progress at current rates means that it would take thousands of years to solve today’s problems of human suffering, social disparity, and environmental quality. The inefficiency of human, social, and natural capital markets is so overwhelming that there is little hope for significant improvements without the introduction of fundamental infrastructural supports, such as an Intangible Assets Metric System.

A basic question that needs to be asked of the MDG system is, how can anyone make any sense out of so much data? Most of the indicators are evaluated in terms of counts of the number of times something happens, the number of people affected, or the number of things observed to be present. These counts are usually then divided by the maximum possible (the count of the total population) and are expressed as percentages or rates.

As previously explained in various posts in this blog, counts and percentages are not measures in any meaningful sense. They are notoriously difficult to interpret, since the quantitative meaning of any given unit difference varies depending on the size of what is counted, or where the percentage falls in the 0-100 continuum. And because counts and percentages are interpreted one at a time, it is very difficult to know if and when any number included in the sheer mass of data is reasonable, all else considered, or if it is inconsistent with other available facts.

A study of the MDG data must focus on three potential areas of data quality improvement: consistency evaluation, volume reduction, and interpretability. Each builds on the others. With consistent data lending themselves to summarization in sufficient statistics, data volume can be drastically reduced with no loss of information (Andersen, 1977, 1999; Wright, 1977, 1997), data quality can be readily assessed in terms of sufficiency violations (Smith, 2000; Smith & Plackner, 2009), and quantitative measures can be made interpretable in terms of a calibrated ruler’s repeatedly reproducible hierarchy of indicators (Bond & Fox, 2007; Masters, Lokan, & Doig, 1994).

The primary data quality criteria are qualitative relevance and meaningfulness, on the one hand, and mathematical rigor, on the other. The point here is one of following through on the maxim that we manage what we measure, with the goal of measuring in such a way that management is better focused on the program mission and not distracted by accounting irrelevancies.

Method

As written and deployed, each of the MDG indicators has the face and content validity of providing information on its respective substantive area of interest. But, as has been repeatedly emphasized in this blog, counting something is not the same thing as measuring it.

Counts or rates of literacy or unemployment are not, in and of themselves, measures of development. Their capacity to serve as contributing indications of developmental progress is an empirical question that must be evaluated experimentally against the observable evidence. The measurement of progress toward an overarching developmental goal requires inferences made from a conceptual order of magnitude above and beyond that provided in the individual indicators. The calibration of an instrument for assessing progress toward the realization of the Millennium Development Goals requires, first, a reorganization of the existing data, and then an analysis that tests explicitly the relevant hypotheses as to the potential for quantification, before inferences supporting the comparison of measures can be scientifically supported.

A subset of the MDG data was selected from the MDG database available at http://data.un.org/Browse.aspx?d=MDG, recoded, and analyzed using Winsteps (Linacre, 2011). At least one indicator was selected from each of the eight goals, with 22 in total. All available data from these 22 indicators were recorded for each of 64 countries.

The reorganization of the data is nothing but a way of making the interpretation of the percentages explicit. The meaning of any one country’s percentage or rate of youth unemployment, cell phone users, or literacy has to be kept in context relative to expectations formed from other countries’ experiences. It would be nonsense to interpret any single indicator as good or bad in isolation. Sometimes 30% represents an excellent state of affairs, other times, a terrible one.

Therefore, the distributions of each indicator’s percentages across the 64 countries were divided into ranges and converted to ratings. A lower rating uniformly indicates a status further away from the goal than a higher rating. The ratings were devised by dividing the frequency distribution of each indicator roughly into thirds.

For instance, the youth unemployment rate was found to vary such that the countries furthest from the desired goal had rates of 25% or more (rated 1), and those closest to or exceeding the goal had rates of 0-10% (rated 3), leaving the middle range (10-25%) rated 2. In contrast, percentages of the population that are undernourished were rated 1 for 35% or more, 2 for 15-35%, and 3 for less than 15%.
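In code, this recoding step amounts to nothing more than a pair of cut points per indicator. The following minimal Python sketch uses the two sets of cut points just described; the function name and the higher_is_better flag are conveniences invented here, and the cut points for the other indicators would be derived in the same way from their observed distributions.

    def rate(value, low, high, higher_is_better=False):
        """Convert an indicator percentage into a 1-3 rating.

        A rating of 1 is furthest from the MDG goal and 3 is closest.
        For indicators like youth unemployment or undernourishment,
        higher percentages are further from the goal; for indicators
        like literacy rates, set higher_is_better=True.
        """
        if higher_is_better:
            value, low, high = -value, -high, -low
        if value >= high:
            return 1  # furthest from the goal
        if value >= low:
            return 2  # middle range
        return 3      # closest to or exceeding the goal

    # Cut points from the text: unemployment 10%/25%; undernourished 15%/35%.
    print(rate(30.0, 10, 25))  # 30% youth unemployment -> 1
    print(rate(8.0, 10, 25))   # 8% youth unemployment  -> 3
    print(rate(20.0, 15, 35))  # 20% undernourished     -> 2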

The division of the distributions into thirds was based only on the investigator’s prior experience with data of this kind. A more thorough approach would begin from a finer-grained rating system, like that structuring the MDG table at http://mdgs.un.org/unsd/mdg/Resources/Static/Products/Progress2008/MDG_Report_2008_Progress_Chart_En.pdf. This greater detail would be sought in order to determine empirically just how many distinctions each indicator can support and contribute to the overall measurement system.

Sixty-four of the available 336 data points were selected for their representativeness, with no duplications of values and with a proportionate distribution along the entire continuum of observed values.

Data from the same 64 countries and the same years were then sought for the subsequent indicators. It turned out that the years in which data were available varied across data sets. Data within one or two years of the target year were sometimes substituted for missing data.

The data were analyzed twice, first with each indicator allowed its own rating scale, parameterizing each of the category difficulties separately for each item, and then with the full rating scale model, as the results of the first analysis showed all indicators shared strong consistency in the rating structure.
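For readers who want the two parameterizations made explicit, here is a sketch in the standard notation, with B_n the country measure, D_i the indicator difficulty, F the rating thresholds, and P_nij the probability that country n receives rating j on indicator i; the two models differ only in whether the threshold term carries an item subscript:

    \ln\left( \frac{P_{nij}}{P_{ni(j-1)}} \right) = B_n - D_i - F_{ij}  \quad \text{(thresholds estimated per item)}

    \ln\left( \frac{P_{nij}}{P_{ni(j-1)}} \right) = B_n - D_i - F_{j}   \quad \text{(thresholds shared across items)}

Finding the separately estimated thresholds to be statistically equivalent across the indicators is what justified the move to the shared-threshold rating scale model.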

Results

Data were 65.2% complete. Countries were assessed on an average of 14.3 of the 22 indicators, and each indicator was applied on average to 41.7 of the 64 country cases. Measurement reliability was .89-.90, depending on how measurement error is estimated. Cronbach’s alpha for the by-country scores was .94. Calibration reliability was .93-.95. The rating scale worked well (see Linacre, 2002, for criteria). The data fit the measurement model reasonably well, with satisfactory data consistency, meaning that the hypothesis of a measurable developmental construct was not falsified.
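For those who want to check such numbers themselves, Cronbach’s alpha is simple to compute from a countries-by-indicators matrix of ratings. The sketch below is a complete-data illustration in Python; it is not the Winsteps computation behind the values reported above, and with roughly 35% of the data missing the actual calculation is less straightforward.

    import numpy as np

    def cronbach_alpha(ratings):
        """Cronbach's alpha for a complete countries-by-indicators matrix."""
        r = np.asarray(ratings, dtype=float)
        k = r.shape[1]                           # number of indicators
        item_vars = r.var(axis=0, ddof=1).sum()  # sum of per-indicator variances
        total_var = r.sum(axis=1).var(ddof=1)    # variance of country totals
        return k / (k - 1) * (1 - item_vars / total_var)

    # Toy example: four countries rated 1-3 on three indicators.
    print(cronbach_alpha([[1, 1, 2], [2, 2, 2], [2, 3, 3], [3, 3, 3]]))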

The main result for our purposes here concerns how satisfactory data consistency makes it possible to dramatically reduce data volume and improve data interpretability. The figure below illustrates how. What does it mean for data volume to be drastically reduced with no loss of information? Let’s see exactly how much the data volume is reduced for the ten-item data subset shown in the figure.

The horizontal continuum from -100 to 1300 in the figure is the metric, the ruler or yardstick. The number of countries at various locations along that ruler is shown across the bottom of the figure. The mean (M), first standard deviation (S), and second standard deviation (T) are shown beneath the numbers of countries. There are ten countries with a measure of just below 400, just to the left of the mean (M).

The MDG indicators are listed on the right of the figure, with the indicator most often achieved relative to the goals at the bottom, and the indicator least often achieved at the top. The ratings in the middle of the figure increase from 1 to 3, left to right, as the probability of goal achievement rises with increasing measures. The position of the ratings in the middle of the figure shifts from left to right as one reads up the list of indicators because the difficulty of achieving the goals increases.

Because the ratings of the 64 countries relative to these ten goals are internally consistent, nothing but the developmental level of the country and the developmental challenge of the indicator affects the probability that a given rating will be attained. It is this relation that defines fit to a measurement model, the sufficiency of the summed ratings, and the interpretability of the scores. Given sufficient fit and consistency, any country’s measure implies a given rating on each of the ten indicators.

For instance, imagine a vertical line drawn through the figure at a measure of 500, just above the mean (M). This measure is interpreted relative to the places at which the vertical line crosses the ratings in each row associated with each of the ten items. A measure of 500 is read as implying, within a given range of error, uncertainty, or confidence, a rating of

  • 3 on debt service and female-to-male parity in literacy,
  • 2 or 3 on how much of the population is undernourished and how many children under five years of age are moderately or severely underweight,
  • 2 on infant mortality, the percent of the population aged 15 to 49 with HIV, and the youth unemployment rate,
  • 1 or 2 on the poor’s share of the national income, and
  • 1 on CO2 emissions and the rate of personal computers per 100 inhabitants.
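A hedged sketch may help make this reading algorithmic. Under the rating scale model, an item’s difficulty and the common thresholds fix the probability of each rating at any measure, and the implied rating is the most probable one. All calibrations below are hypothetical, invented only to mimic the figure’s layout, and the assumption of roughly 100 scale units per logit is likewise mine, not taken from the report.

    import math

    UNITS_PER_LOGIT = 100.0       # assumed rescaling of the logit metric
    THRESHOLDS = (-150.0, 150.0)  # hypothetical common thresholds (scale units)

    def rating_probs(measure, difficulty):
        """Andrich rating scale model probabilities for ratings 1, 2, 3."""
        logits = [0.0]
        for f in THRESHOLDS:
            logits.append(logits[-1] + (measure - difficulty - f) / UNITS_PER_LOGIT)
        peak = max(logits)  # subtract the max to avoid overflow
        expv = [math.exp(x - peak) for x in logits]
        total = sum(expv)
        return [e / total for e in expv]

    def implied_rating(measure, difficulty):
        probs = rating_probs(measure, difficulty)
        return max(range(1, 4), key=lambda j: probs[j - 1])

    # A country measuring 500 against an easy indicator and a hard one:
    print(implied_rating(500, 100))  # -> 3
    print(implied_rating(500, 900))  # -> 1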

For any one country with a measure of 500 on this scale, ten percentages or rates that appear completely incommensurable and incomparable are found to contribute consistently to a single valued function, developmental goal achievement. Instead of managing each separate indicator as a universe unto itself, this scale makes it possible to manage development itself at its own level of complexity. This ten-to-one ratio of reduced data volume is more than doubled when the total of 22 items included in the scale is taken into account.

This reduction is conceptually and practically important because it focuses attention on the actual object of management, development. When the individual indicators are the focus of attention, the forest is lost for the trees. Those who disparage the validity of the maxim, you manage what you measure, are often discouraged by the feeling of being pulled in too many directions at once. But a measure of the HIV infection rate is not in itself a measure of anything but the HIV infection rate. Interpreting it in terms of broader developmental goals requires evidence that it in fact takes a place in that larger context.

And once a connection with that larger context is established, the consistency of individual data points remains a matter of interest. As the world turns, the order of things may change, but, more likely, data entry errors, temporary data blips, and other factors will alter data quality. Such changes cannot be detected outside of the context defined by an explicit interpretive framework that requires consistent observations.

-100  100     300     500     700     900    1100    1300
|-------+-------+-------+-------+-------+-------+-------|  NUM   INDCTR
1                                 1  :    2    :  3     3    9  PcsPer100
1                         1   :   2    :   3            3    8  CO2Emissions
1                    1  :    2    :   3                 3   10  PoorShareNatInc
1                 1  :    2    :  3                     3   19  YouthUnempRatMF
1              1   :    2   :   3                       3    1  %HIV15-49
1            1   :   2    :   3                         3    7  InfantMortality
1          1  :    2    :  3                            3    4  ChildrenUnder5ModSevUndWgt
1         1   :    2    :  3                            3   12  PopUndernourished
1    1   :    2   :   3                                 3    6  F2MParityLit
1   :    2    :  3                                      3    5  DebtServExpInc
|-------+-------+-------+-------+-------+-------+-------|  NUM   INDCTR
-100  100     300     500     700     900    1100    1300
                   1
       1   1 13445403312323 41 221    2   1   1            COUNTRIES
       T      S       M      S       T

Discussion

A key element in the results obtained here concerns the fact that the data were about 35% missing. Whether or not any given indicator was actually rated for any given country, the measure can still be interpreted as implying the expected rating. This capacity to take missing data into account can be taken advantage of systematically by calibrating a large bank of indicators. With this in hand, it becomes possible to gather only the amount of data needed to make a specific determination, or to adaptively administer the indicators so as to obtain the lowest-error (most reliable) measure at the lowest cost (with the fewest indicators administered). Perhaps most importantly, different collections of indicators can then be equated to measure in the same unit, so that impacts may be compared more efficiently.
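As a hedged illustration of the adaptive idea, and not the procedure used in this study, a simple selection rule administers at each step the not-yet-rated indicator whose calibrated difficulty lies closest to the country’s provisional measure, which is roughly where an indicator is most informative. The indicator names below come from the figure above, but the difficulty values are invented.

    def next_indicator(provisional_measure, difficulties, already_rated):
        """Pick the unadministered indicator nearest the provisional measure.

        A full computerized adaptive testing (CAT) loop would instead
        maximize statistical information and re-estimate the measure
        after each response; this is only the core selection step.
        """
        remaining = {name: d for name, d in difficulties.items()
                     if name not in already_rated}
        if not remaining:
            return None
        return min(remaining,
                   key=lambda name: abs(remaining[name] - provisional_measure))

    bank = {"DebtServExpInc": 150, "F2MParityLit": 250,
            "InfantMortality": 450, "CO2Emissions": 800}
    print(next_indicator(500, bank, already_rated={"InfantMortality"}))
    # -> 'F2MParityLit' (difficulty 250 is nearest the measure of 500)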

Instead of an international developmental aid market that is so inefficient as to preclude any expectation of measured returns on investment, setting up a calibrated bank of indicators to which all measures are traceable opens up numerous desirable possibilities. The cost of assessing and interpreting the data informing aid transactions could be reduced to negligible amounts, and the management of the processes and outcomes in which that aid is invested would be made much more efficient by reduced data volume and enhanced information content. Because capital would flow more efficiently to where supply is meeting demand, nonproducers would be cut out of the market, and the effectiveness of the aid provided would be multiplied many times over.

The capacity to harmonize counts of different but related events into a single measurement system presents the possibility that there may be a bright future for outcomes-based budgeting in education, health care, human resource management, environmental management, housing, corrections, social services, philanthropy, and international development. It may seem wildly unrealistic to imagine such a thing, but the return on the investment would be so monumental that not checking it out would be even crazier.

A full report on the MDG data, with the other references cited, is available on my SSRN page at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1739386.

Goldberg, S. H. (2009). Billions of drops in millions of buckets: Why philanthropy doesn’t advance social progress. New York: Wiley.


A Technology Road Map for Efficient Intangible Assets Markets

February 24, 2011

Scientific technologies, instruments and conceptual images have been found to play vitally important roles in economic success because of the way they enable accurate predictions of future industry and market states (Miller & O’Leary, 2007). The technology road map for the microprocessor industry, based in Moore’s Law, has successfully guided market expectations and coordinated research investment decisions for over 40 years. When the earlier electromechanical, relay, vacuum tube, and transistor computing technology paradigms are included, the same trajectory has dominated the computer industry for over 100 years (Kurzweil, 2005, pp. 66-67).

We need a similar technology road map to guide the creation and development of intangible asset markets for human, social, and natural (HSN) capital. This will involve intensive research to determine what the primary constructs are and what is and is not measurable, and to create consensus standards for uniform metrics and the metrology networks through which those standards will function. Alignments with these developments will require comprehensively integrated economic models, accounting frameworks, and investment platforms, in addition to specific applications deploying the capital formations.

What I’m proposing is, in a sense, just an extension in a new direction of the metrology challenges and issues summarized in Table ITWG15 on page 48 in the 2010 update to the International Technology Roadmap for Semiconductors (http://www.itrs.net/about.html). Distributed electronic communication facilitated by computers and the Internet is well on the way to creating a globally uniform instantaneous information network. But much of what needs to be communicated through this network remains expressed in locally defined languages that lack common points of reference. Meaningful connectivity demands a shared language.

To those who say we already have the technology necessary and sufficient to the measurement and management of human, social, and natural capital, I say think again. The difference between what we have and what we need is the same as the difference between (a) an economy whose capital resources are not represented in transferable representations like titles and deeds, and are denominated in a flood of money circulating in different currencies, and (b) an economy whose capital resources are represented in transferable documents and are traded using a single currency with a restricted money supply. The measurement of intangible assets is today akin to the former economy, with little actual living capital and hundreds of incommensurable instruments and scoring systems, when what we need is the latter. (See previous entries in this blog for more on the difference between dead and living capital.)

Given the model of a road map detailing the significant features of the living capital terrain, industry-specific variations will inform the development of explicit market expectations, the alignment of HSN capital budgeting decisions, and the coordination of research investments. The concept of a technology road map for HSN capital is based in and expands on an integration of hierarchical complexity (Commons & Richards, 2002; Dawson, 2004), complex adaptive functionality (Taylor, 2003), Peirce’s semiotic developmental map of creative thought (Wright, 1999), and historical stages in the development of measuring systems (Stenner & Horabin, 1992; Stenner, Burdick, Sanford, & Burdick, 2006).

Technology road maps replace organizational amnesia with organizational learning by providing the structure of a memory that not only stores information, knowledge, understanding, and wisdom, but makes it available for use in new situations. Othman and Hashim (2004) describe organizational amnesia (OA) relative to organizational learning (OL) in a way that opens the door to a rich application of Miller and O’Leary’s (2007) detailed account of how technology road maps contribute to the creation of new markets and industries. Technology road maps function as the higher organizational principles needed for transforming individual and social expertise into economically useful products and services. Organizational learning and adaptability further need to be framed at the inter-organizational level where their various dimensions or facets are aligned not only within individual organizations but between them within the industry as a whole.

The mediation of the individual and organizational levels, and of the organizational and inter-organizational levels, is facilitated by measurement. In the microprocessor industry, Moore’s Law enabled the creation of technology road maps charting the structure, processes, and outcomes that had to be aligned at the individual, organizational, and inter-organizational levels to coordinate the entire microprocessor industry’s economic success. Such road maps need to be created for each major form of human, social, and natural capital, with the associated alignments and coordinations put in play at all levels of every firm, industry, and government.

It is a basic fact of contemporary life that the technologies we employ every day are so complex that hardly anyone understands how they do what they do. Technological miracles are commonplace events, from transportation to entertainment, from health care to manufacturing. And we usually suffer little in the way of adverse consequences from not knowing how an automatic transmission, a thermometer, or digital video reproduction works. It is enough to know how to use the tool.

This passive acceptance of technical details beyond our ken extends into areas in which standards, methods, and products are much less well defined. Managers, executives, researchers, teachers, clinicians, and others who need measurement but who are unaware of its technicalities are then put in the position of being passive consumers accepting the lowest common denominator in the quality of the services and products obtained.

And that’s not all. Just as the mass market of measurement consumers is typically passive and uninformed, in complementary fashion the supply side is fragmented and contentious. There is little agreement among measurement experts as to which quantitative methods set the standard as the state of the art. Virtually any method can be justified in terms of some body of research and practice, so the confused consumer accepts whatever is easily available or is most likely to support a preconceived agenda.

It may be possible, however, to separate the measurement wheat from the chaff. For instance, measurement consumers may value a way of distinguishing among methods that is based in a simple criterion of meaningful utility. What if all measurement consumers’ own interests in, and reasons for, measuring something in particular, such as literacy or community, were emphasized and embodied in a common framework? What if a path of small steps from currently popular methods of less value to more scientific ones of more value could be mapped? Such a continuum of methods could range from those doing the least to advance the users’ business interests to those doing the most to advance those interests.

The aesthetics, simplicity, meaningfulness, rigor, and practical consequences of strong theoretical requirements for instrument calibration provide such criteria for choices as to models and methods (Andrich, 2002, 2004; Busemeyer & Wang, 2000; Myung, 2000; Pitt, Kim, & Myung, 2003; Wright, 1997, 1999). These criteria could be used to develop and guide explicit considerations of data quality, construct theory, instrument calibration, quantitative comparisons, measurement standard metrics, etc. along a continuum from the most passive and least objective to the most actively involved and most objective.

The passive approach to measurement typically starts from and prioritizes content validity. The questions asked on tests, surveys, and assessments are considered relevant primarily on the basis of the words they use and the concepts they appear to address. Evidence that the questions actually cohere together and measure the same thing is not needed. If there is any awareness of the existence of axiomatically prescribed measurement requirements, these are not considered to be essential. That is, if failures of invariance are observed, they usually provoke a turn to less stringent data treatments instead of a push to remove or prevent them. Little or no measurement or construct theory is implemented, meaning that all results remain dependent on local samples of items and people. Passively approaching measurement in this way is then encumbered by the need for repeated data gathering and analysis, and by the local dependency of the results. Researchers working in this mode are akin to the woodcutters who say they are too busy cutting trees to sharpen their saws.

An alternative, active approach to measurement starts from and prioritizes construct validity and the satisfaction of the axiomatic measurement requirements. Failures of invariance provoke further questioning, and there is significant practical use of measurement and construct theory. Results are then independent of local samples, sometimes to the point that researchers and practical applications are not encumbered with usual test- or survey-based data gathering and analysis.

As is often the case, this black-and-white portrayal tells far from the whole story. There are multiple shades of grey in the contrast between passive and active approaches to measurement. The actual range of implementations is much more diverse than the simple binary contrast would suggest (see the previous post in this blog for a description of a hierarchy of increasingly complex stages in measurement). Spelling out the variation that exists could be helpful for making deliberate, conscious choices and decisions in measurement practice.

It is inevitable that we would start from the materials we have at hand, and that we would then move through a hierarchy of increasing efficiency and predictive control as understanding of any given variable grows. Previous considerations of the problem have offered different categorizations for the transformations characterizing development on this continuum. Stenner and Horabin (1992) distinguish between 1) impressionistic and qualitative, nominal gradations found in the earliest conceptualizations of temperature, 2) local, data-based quantitative measures of temperature, and 3) generalized, universally uniform, theory-based quantitative measures of temperature.

The latter is prized for the way that thermodynamic theory enables the calibration of individual thermometers with no need for testing each one in empirical studies of its performance. Theory makes it possible to know in advance what the results of such tests would be with enough precision to greatly reduce the burden and expenses of instrument calibration.

Reflecting on the history of psychosocial measurement in this context, it becomes apparent that these three stages can be further broken down. The previous post in this blog lists the distinguishing features for each of six stages in the evolution of measurement systems, building on the five stages described by Stenner, Burdick, Sanford, and Burdick (2006).

And so what analogue of Moore’s Law might be projected? What kind of timetable can be expected for the unfolding of what might be called Stenner’s Law? Guidance for reasonable expectations is found in Kurzweil’s (2005) charting of historical and projected future exponential increases in the volume of information and computer processing speed. The accelerating growth in knowledge taking place in the world today speaks directly to a systematic integration of criteria for what shall count as meaningful new learning. Maps of the roads we’re traveling will provide some needed guidance and make the trip more enjoyable, efficient, and productive. Perhaps somewhere not far down the road we’ll be able to project doubling rates for growth in the volume of fungible literacy capital globally, or the halving rates in the cost of health capital stocks. We manage what we measure, so when we begin measuring well what we want to manage well, we’ll all be better off.
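Should such doubling or halving rates ever be estimated, projecting them is elementary arithmetic, as in this sketch; the ten-year doubling time is purely hypothetical.

    import math

    def years_to_target(current, target, doubling_time_years):
        """Years to grow from current to target at a constant doubling time."""
        return doubling_time_years * math.log2(target / current)

    # If fungible literacy capital stocks doubled every 10 years (a made-up
    # figure), a tenfold increase would take about 33 years.
    print(round(years_to_target(1.0, 10.0, 10.0), 1))  # -> 33.2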

References

Andrich, D. (2002). Understanding resistance to the data-model relationship in Rasch’s paradigm: A reflection for the next generation. Journal of Applied Measurement, 3(3), 325-59.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Busemeyer, J. R., & Wang, Y.-M. (2000, March). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44(1), 171-189 [http://quantrm2.psy.ohio-state.edu/injae/jmpsp.htm].

Commons, M. L., & Richards, F. A. (2002, Jul). Organizing components into combinations: How stage transition works. Journal of Adult Development, 9(3), 159-177.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Kurzweil, R. (2005). The singularity is near: When humans transcend biology. New York: Viking Penguin.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations, and Society, 32(7-8), 701-34.

Myung, I. J. (2000). Importance of complexity in model selection. Journal of Mathematical Psychology, 44(1), 190-204.

Othman, R., & Hashim, N. A. (2004). Typologizing organizational amnesia. The Learning Organization, 11(3), 273-84.

Pitt, M. A., Kim, W., & Myung, I. J. (2003). Flexibility versus generalizability in model selection. Psychonomic Bulletin & Review, 10, 29-44.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].

Taylor, M. C. (2003). The moment of complexity: Emerging network culture. Chicago: University of Chicago Press.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.


Stages in the Development of Meaningful, Efficient, and Useful Measures

February 21, 2011

In all learning, we use what we already know as a means of identifying what we do not yet know. When someone can read a written language, knows an alphabet and has a vocabulary, understands grammar and syntax, then that knowledge can be used to learn about the world. Then, knowing what birds are, for instance, one might learn about different kinds of birds or the typical behaviors of one bird species.

And so with measurement, we start from where we find ourselves, as with anything else. There is no need or possibility for everyone to master all the technical details of every different area of life that’s important. But it is essential that we know what is technically possible, so that we can seek out and find the tools that help us achieve our goals. We can’t get what we can’t or don’t ask for. In the domain of measurement, it seems that hardly anyone is looking for what’s actually readily available.

So it seems pertinent to offer a description of a continuum of increasingly meaningful, efficient and useful ways of measuring. Previous considerations of the problem have offered different categorizations for the transformations characterizing development on this continuum. Stenner and Horabin (1992) distinguish between 1) impressionistic and qualitative, nominal gradations found in the earliest conceptualizations of temperature, 2) local, data-based quantitative measures of temperature, and 3) generalized, universally uniform, theory-based quantitative measures of temperature.

Theory-based temperature measurement is prized for the way that thermodynamic theory enables the calibration of individual thermometers with no need for testing each one in empirical studies of its performance. As Lewin (1951, p. 169) put it, “There is nothing so practical as a good theory.” Thus we have electromagnetic theory making it possible to know the conduction and resistance characteristics of electrical cable from the properties of the metal alloys and insulators used, with no need to test more than a small fraction of that cable as a quality check.

Theory makes it possible to know in advance what the results of such tests would be with enough precision to greatly reduce the burden and expenses of instrument calibration. There likely would be no electrical industry at all if the properties of every centimeter of cable and every appliance had to be experimentally tested. This principle has been employed in measuring human, social, and natural capital for some time, but, for a variety of reasons, it has not yet been adopted on a wide scale.

Reflecting on the history of psychosocial measurement in this context, it becomes apparent that Stenner and Horabin’s (1992) three stages can be further broken down. Listed below are the distinguishing features for each of six stages in the evolution of measurement systems, building on the five stages described by Stenner, Burdick, Sanford, and Burdick (2006). This progression of increasing complexity, meaning, efficiency, and utility can be used as a basis for a technology roadmap that will enable the coordination and alignment of various services and products in the domain of intangible assets, as I will take up in a forthcoming post.

Stage 1. Least meaning, utility, efficiency, and value

Purely passive, receptive

Statistics describe data: What you see is what you get

Content defines measure

Additivity, invariance, etc. not tested, so numbers do not stand for something that adds up like they do

Measurement defined statistically in terms of group-level intervariable relations

Meaning of numbers changes with questions asked and persons answering

No theory

Data must be gathered and analyzed to have results

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 2

Slightly less passive, receptive but still descriptively oriented

Additivity, invariance, etc. tested, so numbers might stand for something that adds up like they do

Measurement still defined statistically in terms of group-level intervariable relations

Falsification of additive hypothesis effectively derails measurement effort

Descriptive models with interaction effects accepted as viable alternatives

Typically little or no attention to theory of item hierarchy and construct definition

Empirical (data-based) calibrations only

Data must be gathered and analyzed to have results

Initial awareness of measurement theory

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 3

Even less purely passive & receptive, more active

Instrument still designed relative to content specifications

Additivity, invariance, etc. tested, so numbers might stand for something that adds up like they do

Falsification of additive hypothesis provokes questions as to why

Descriptive models with interaction effects not accepted as viable alternatives

Measurement defined prescriptively in terms of individual-level intravariable invariance

Significant attention to theory of item hierarchy and construct definition

Empirical calibrations only

Data must be gathered and analyzed to have results

More significant use of measurement theory in prescribing acceptable data quality

Limited construct theory (no predictive power)

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 4

First stage that is more active than passive

Initial efforts to (re-)design instrument relative to construct specifications and theory

Additivity, invariance, etc. tested in thoroughly prescriptive focus on calibrating instrument

Numbers not accepted unless they stand for something that adds up like they do

Falsification of additive hypothesis provokes questions as to why and corrective action

Models with interaction effects not accepted as viable alternatives

Measurement defined prescriptively in terms of individual-level intravariable invariance

Significant attention to theory of item hierarchy and construct definition relative to instrument design

Empirical calibrations only but model prescribes data quality

Data usually must be gathered and analyzed to have results

Point of use self-scoring forms might provide immediate measurement results to end user

Some construct theory (limited predictive power)

Some commercial applications are not instrument-dependent (as in CAT item bank implementations)

Standards based in ensuring fair methods and processes

Stage 5

Significantly active approach to measurement

Item hierarchy translated into construct theory

Construct specification equation predicts item difficulties (see the sketch following Stage 6 below)

Theory-predicted (not empirical) calibrations used in applications

Item banks superseded by single-use items created on the fly

Calibrations checked against empirical results but data gathering and analysis not necessary

Point of use self-scoring forms or computer apps provide immediate measurement results to end user

Used routinely in commercial applications

Awareness that standards might be based in metrological traceability to consensus standard uniform metric

Stage 6. Most meaning, utility, efficiency, and value

Most purely active approach to measurement

Item hierarchy translated into construct theory

Construct specification equation predicts item ensemble difficulties

Theory-predicted calibrations enable single-use items created from context

Checked against empirical results for quality assessment but data gathering and analysis not necessary

Point of use self-scoring forms or computer apps provide immediate measurement results to end user

Used routinely in commercial applications

Standards based in metrological traceability to consensus standard uniform metric
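To make the idea of a construct specification equation concrete, here is a hedged sketch in the spirit of the Lexile work cited above: a linear equation predicting item calibrations from theoretically motivated item features. The features, numbers, and coefficients are invented for illustration and are not the actual Lexile equation.

    import numpy as np

    # Hypothetical reading items: columns are two theory-motivated features,
    # say log mean sentence length and mean log word frequency.
    features = np.array([[2.1, -3.0],
                         [2.8, -4.2],
                         [3.4, -5.1],
                         [1.9, -2.5]])
    observed_difficulties = np.array([-1.2, 0.3, 1.4, -1.8])

    # Fit the specification equation by least squares, with an intercept.
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, observed_difficulties, rcond=None)

    # Once validated, the equation calibrates new items from theory alone,
    # with no need to gather and analyze response data first (Stages 5-6).
    new_item = np.array([1.0, 2.6, -3.9])  # intercept term plus two features
    print(float(new_item @ coef))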

 

References

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York: Harper & Row.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].


Twelve principles I’m taking away from recent discussions

January 27, 2011
  1. Hypotheses non fingo A: Ideas about things are not hypothesized and tested against those things so much as things are determined to be what they are by testing them against ideas. Facts are recognizable as such only because they relate with a prior idea.
  2. Hypotheses non fingo B: Cohen’s introduction to Newton’s Opticks makes it plain that Newton is not offering a general methodological pointer in this phrase. Rather, he is answering critics who wanted him to explain what gravity is, and what its causes are. In saying, I feign no hypotheses, Newton is merely indicating that he’s not going to make up stories about something he knows nothing about. And in contrast with the Principia, the Opticks provides a much more accessible overview of the investigative process, from the initial engagement with light, where indeed no hypotheses as to its causes are offered, and onto more specific inquiries into its properties, where hypotheses necessarily inform experimental contrasts.
  3. Ideas, such as mathematical/geometrical theorems, natural laws, or the structure of Rasch models, do not exist and are unobservable. No triangle ever fits the Pythagorean theorem, there are no bodies left to themselves or balls rolling on frictionless planes, and there are no test, survey, or assessment results completely unaffected by the particular questions asked and persons answering.
  4. The clarity and transparency of an idea requires careful attention to the unity and sameness of the relevant class of things observed. So far as possible, the observational framework must be constrained by theory to produce observations likely to conform reasonably with the idea.
  5. New ideas come into language when a phenomenon or effect, often technically produced, exhibits persistent and stable properties across samples, observers, instruments, etc.
  6. New word-things that come into language, whether a galaxy, an element in the periodic table, a germ, or a psychosocial construct, may well have existed since the dawn of time and may well have exerted tangible effects on humans for millennia. They did not, however, do so for anyone in terms of the newly-available theory and understanding, which takes a place in a previously unoccupied position within the matrix of interrelated ideas, facts, and social networks.
  7. Number does not delimit the pure ideal concept of amount, but vice versa.
  8. Rasch models are one way of specifying the ideal form observations must approximate if they are to exhibit magnitude amounts divisible into ratios (one such model is written out just after this list). Fitting data to such a model in the absence of a theory of the construct is only a very early step in the process of devising a measurement system.
  9. The invariant representation of a construct across samples, instruments, observers, etc. exhibiting magnitude amounts divisible into ratios provides the opportunity for allowing a pure ideal concept of amount to delimit number.
  10. Being suspended in language does not imply a denial of concrete reality and the separate independent existence of things. Rather, if those things did not exist, there would be no impetus for anything to come into words, and no criteria for meaningfulness.
  11. Situating objectivity in a sphere of signs removes the need for a separate sphere of facts constituted outside of language. Insofar as an ideal abstraction approximates convergence with and separation from different ways of expressing its meaning, an objective status owing nothing to a sphere of facts existing outside of language is obtained.
  12. The technology of a signifying medium (involving an alphabet, words as names for features of the environment, other symbols, syntactical and semantic rules, tools and instruments, etc.) gives rise to observations (data) that may exhibit regular patterns and that may come to be understood well enough to be reproduced at will via theory. Each facet (instrument, data, theory) mediates the relation of the other two.
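As a reference point for principles 8 and 9, the simplest member of the Rasch family, the dichotomous model, writes that ideal form out as follows (in LaTeX notation), where B_n is the ability of person n and D_i the difficulty of item i:

    P(X_{ni} = 1) = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}}

The separability of the two parameters in this form is what makes invariant comparison across samples and instruments possible in principle; whether any given data approximate it is the empirical question.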


Consequences of Standardized Technical Effects for Scientific Advancement

January 24, 2011

Note. This is modified from:

Fisher, W. P., Jr. (2004, Wednesday, January 21). Consequences of standardized technical effects for scientific advancement. In A. Leplège (Chair), Session 2.5A. Rasch Models: History and Philosophy. Second International Conference on Measurement in Health, Education, Psychology, and Marketing: Developments with Rasch Models, The International Laboratory for Measurement in the Social Sciences, School of Education, Murdoch University, Perth, Western Australia.

—————————

Over the last several decades, historians of science have repeatedly produced evidence contradicting the widespread assumption that technology is a product of experimentation and/or theory (Kuhn 1961; Latour 1987; Rabkin 1992; Schaffer 1992; Hankins & Silverman 1999; Baird 2002). Theory and experiment typically advance only within the constraints set by a key technology that is widely available to end users in applied and/or research contexts. Thus, “it is not just a clever historical aphorism, but a general truth, that ‘thermodynamics owes much more to the steam engine than ever the steam engine owed to thermodynamics’” (Price 1986, p. 240).

The prior existence of the relevant technology comes to bear on theory and experiment again in the common, but mistaken, assumption that measures are made and experimentally compared in order to discover scientific laws. History and the logic of measurement show that measures are rarely made until the relevant law is effectively embodied in an instrument (Kuhn 1961; Michell 1999). This points to the difficulty experienced in metrologically fusing (Schaffer 1992, p. 27; Lapré & van Wassenhove 2002) instrumentalists’ often inarticulate, but materially effective, knowledge (know-how) with theoreticians’ often immaterial, but well articulated, knowledge (know-why) (Galison 1999; Baird 2002).

Because technology often dictates what, if any, phenomena can be consistently produced, it constrains experimentation and theorizing by focusing attention selectively on reproducible, potentially interpretable effects, even when those effects are not well understood (Ackermann 1985; Daston & Galison 1992; Ihde 1998; Hankins & Silverman 1999; Maasen & Weingart 2001). Criteria for theory choice in this context stem from competing explanatory frameworks’ experimental capacities to facilitate instrument improvements, prediction of experimental results, and gains in the efficiency with which a phenomenon is produced.

In this context, the relatively recent introduction of measurement models requiring additive, invariant parameterizations (Rasch 1960) provokes speculation as to the effect on the human sciences that might be wrought by the widespread availability of consistently reproducible effects expressed in common quantitative languages. Paraphrasing Price’s comment on steam engines and thermodynamics, might it one day be said that as yet unforeseeable advances in reading theory will owe far more to the Lexile analyzer (Burdick & Stenner 1996) than ever the Lexile analyzer owed reading theory?

Kuhn (1961) speculated that the second scientific revolution of the mid-nineteenth century followed in large part from the full mathematization of physics, i.e., the emergence of metrology as a professional discipline focused on providing universally accessible uniform units of measurement (Roche 1998). Might a similar revolution and new advances in the human sciences follow from the introduction of rigorously mathematical uniform measures?

Measurement technologies capable of supporting the calibration of additive units that remain invariant over instruments and samples (Rasch 1960) have been introduced relatively recently in the human sciences. The invariances produced appear 1) very similar to those produced in the natural sciences (Fisher 1997) and 2) based in the same mathematical metaphysics as that informing the natural sciences (Fisher 2003). Might then it be possible that the human sciences are on the cusp of a revolution analogous to that of nineteenth century physics? Other factors involved in answering this question, such as the professional status of the field, the enculturation of students, and the scale of the relevant enterprises, define the structure of circumstances that might be capable of supporting the kind of theoretical consensus and research productivity that came to characterize, for instance, work in electrical resistance through the early 1880s (Schaffer 1992).

Much could be learned from Rasch’s use of Maxwell’s method of analogy (Nersessian, 2002; Turner, 1955), not just in the modeling of scientific laws but from the social and economic factors that made the regularities of natural phenomena function as scientific capital (Latour, 1987). Quantification must be understood in the fully mathematical sense of commanding a comprehensive grasp of the real root of mathematical thinking. Far from being simply a means of producing numbers, to be useful, quantification has to result in qualitatively transparent figure-meaning relations at any point of use for any one of every different kind of user. Connections between numbers and unit amounts of the variable must remain constant across samples, instruments, time, space, and measurers. Quantification that does not support invariant linear comparisons expressed in a uniform metric available universally to all end users at the point of need is inadequate and incomplete. Such standardization is widely respected in the natural sciences but is virtually unknown in the human sciences, largely due to untested hypotheses and unexamined prejudices concerning the viability of universal uniform measures for the variables measured via tests, surveys, and performance assessments.

Quantity is an effective medium for science to the extent that it comprises an instance of the kind of common language necessary for distributed, collective thinking; for widespread agreement on what makes research results compelling; and for the formation of social capital’s group-level effects. It may be that the primary relevant difference between the case of 19th century physics and today’s human sciences concerns the awareness, widespread among scientists in the 1800s and virtually nonexistent in today’s human sciences, that universal uniform metrics for the variables of interest are both feasible and of great human, scientific, and economic value.

In the creative dynamics of scientific instrument making, as in the making of art, the combination of inspiration and perspiration can sometimes result in cultural gifts of the first order. It nonetheless often happens that some of these superlative gifts, no matter how well executed, are unable to negotiate the conflict between commodity and gift economics characteristic of the marketplace (Baird, 1997; Hagstrom, 1965; Hyde, 1979), and so remain unknown, lost to the audiences they deserve, and unable to render their potential effects historically. Value is not an intrinsic characteristic of the gift; rather, value is ascribed as a function of interests. If interests are not cultivated via the clear definition of positive opportunities for self-advancement, common languages, socio-economic relations, and recruitment, gifts of even the greatest potential value may die with their creators. On the other hand, who has not seen mediocrity disproportionately rewarded merely as a result of intensive marketing?

A central problem is then how to strike a balance between individual or group interests and the public good. Society and individuals are interdependent in that children are enculturated into the specific forms of linguistic and behavioral competence that are valued in communities at the same time that those communities are created, maintained, and reproduced through communicative actions (Habermas, 1995, pp. 199-200). The identities of individuals and societies then co-evolve, as each defines itself through the other via the medium of language. Language is understood broadly in this context to include all perceptual reading of the environment, bodily gestures, social action, etc., as well as the use of spoken or written symbols and signs (Harman, 2005; Heelan, 1983; Ihde, 1998; Nicholson, 1984; Ricoeur, 1981).

Technologies extend language by providing media for the inscription of new kinds of signs (Heelan, 1983a, 1998; Ihde, 1991, 1998; Ihde & Selinger, 2003). Thus, mobility desires and practices are inscribed and projected into the world using the automobile; shelter and life style, via housing and clothing; and communications, via alphabets, scripts, phonemes, pens and paper, telephones, and computers. Similarly, technologies in the form of test, survey, and assessment instruments provide the devices on which we inscribe desires for social mobility, career advancement, health maintenance and improvement, etc.

References

Ackermann, J. R. (1985). Data, instruments, and theory: A dialectical approach to understanding science. Princeton, New Jersey: Princeton University Press.

Baird, D. (1997, Spring-Summer). Scientific instrument making, epistemology, and the conflict between gift and commodity economics. Techné: Journal of the Society for Philosophy and Technology, 2(3-4), 25-46. Retrieved August 28, 2009, from http://scholar.lib.vt.edu/ejournals/SPT/v2n3n4/baird.html.

Baird, D. (2002, Winter). Thing knowledge – function and truth. Techné: Journal of the Society for Philosophy and Technology, 6(2). Retrieved August 19, 2003, from http://scholar.lib.vt.edu/ejournals/SPT/v6n2/baird.html.

Burdick, H., & Stenner, A. J. (1996). Theoretical prediction of test items. Rasch Measurement Transactions, 10(1), 475 [http://www.rasch.org/rmt/rmt101b.htm].

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Habermas, J. (1995). Moral consciousness and communicative action. Cambridge, Massachusetts: MIT Press.

Hagstrom, W. O. (1965). Gift-giving as an organizing principle in science. In The scientific community (pp. 12-22). New York: Basic Books. (Rpt. in B. Barnes (Ed.), (1972), Sociology of science: Selected readings (pp. 105-120). Baltimore, Maryland: Penguin Books.)

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Harman, G. (2005). Guerrilla metaphysics: Phenomenology and the carpentry of things. Chicago: Open Court.

Hyde, L. (1979). The gift: Imagination and the erotic life of property. New York: Vintage Books.

Ihde, D. (1998). Expanding hermeneutics: Visualism in science (Northwestern University Studies in Phenomenology and Existential Philosophy). Evanston, Illinois: Northwestern University Press.

Kuhn, T. S. (1961). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago, Illinois: University of Chicago Press, 1977.)

Lapré, M. A., & Van Wassenhove, L. N. (2002, October). Learning across lines: The secret to more efficient factories. Harvard Business Review, 80(10), 107-11.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York, New York: Cambridge University Press.

Maasen, S., & Weingart, P. (2001). Metaphors and the dynamics of knowledge. (Vol. 26. Routledge Studies in Social and Political Thought). London: Routledge.

Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge: Cambridge University Press.

Nersessian, N. J. (2002). Maxwell and “the Method of Physical Analogy”: Model-based reasoning, generic abstraction, and conceptual change. In D. Malament (Ed.), Essays in the history and philosophy of science and mathematics (pp. 129-166). Lasalle, Illinois: Open Court.

Price, D. J. d. S. (1986). Of sealing wax and string. In Little science, big science–and beyond (pp. 237-253). New York, New York: Columbia University Press.

Rabkin, Y. M. (1992). Rediscovering the instrument: Research, industry, and education. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 57-82). Bellingham, Washington: SPIE Optical Engineering Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Turner, J. (1955, November). Maxwell on the method of physical analogy. British Journal for the Philosophy of Science, 6, 226-238.

The Birds and the Bees of Living Meaning

November 22, 2010

or

How the New Renaissance Will be Conceived in and Midwifed from the Womb of Nature

Sex, Reproduction, and the Consumer Culture

Human sexuality is, of course, more than the sum of its biological parts. Many parents joke that human reproduction would halt and the species would go extinct were it not for the intense pleasure of sexual experience. Many social critics, for their part, have turned a jaded eye on the rampant use of sexual imagery in the consumer culture. The association of sexual prowess with anything from toothpaste to automobiles plays up an empty metaphor of immediate gratification that connotes shortchanged consumers, unfairly boosted profits, and no redeeming long term value.

We would, of course, be mistaken to make too much of a connection between the parents’ joke and the critics’ social commentary. A bit of humor can help release tension when the work of child rearing and homemaking becomes stressful, and it is unlikely that trade would come to a halt if hot dates were banned from TV commercials. Commerce, in the broad sense of the term, is an end in itself.

But perhaps there is more of a connection than is evident at first blush. Advertising is an extremely compressed form of communication. It competes with many other stimuli for fleeting seconds of attention and so has to get its message across quickly. What better, simpler, more genetically programmed message could there be than the promise of attracting a desirable mate?

This hint is the tip of the tip of an iceberg. The larger question is one that asks how the role of desire and its satisfaction in the procreation of the species might serve as a model for economic activity. Might sexual satisfaction and the resulting reproductive success be taken as a natural model for profit and the resulting economic success?

Though this model has been assumed or described to various extents in the domains of ecological, behavioral, and heterodox economics, what we might call its molecular genetics have not yet been described. At this level, the model functions as a positive-sum game, not as the zero-sum game so often assumed in economics. Properly conceived and experienced, neither sexuality nor profit gives one-sided results, with someone necessarily winning and someone else necessarily losing. Rather, in the optimal circumstances we presumably want to foster, both parties to the exchanges must get what they want and contribute to the overall product of the exchange.

In this scenario, profit has to be further defined as not mere gratification and conquest, but as long term reproductive viability and sustainability. The intensity of sexual desire and satisfaction would likely not have evolved without stakes as high as the continuity of the species. And, indeed, researchers are finding strong positive relationships between firms’ long term profitability and their relations with labor, their communities, and the natural environment. Broadly conceived, for commerce to continue, social intercourse can and ultimately must result in viable offspring situated in a supportive environment.

Living vs Dead Capital

All of this suggests that we might be onto something. But for the metaphor to work, we need to take it further. We find what we need in the language of ecological economics and natural capital, and in the distinction between economically alive and economically dead capital.

The ancient root metaphor hidden in the word “capital” derives from the Latin caput, head. Some might locate scientific or intellectual capital in a calculating center, like the brain, but others might bring out a sense of capital as part of the natural order. The concept of capital likely emerged in early agricultural economies from a focus on head of livestock: cattle, sheep, horses, etc. We might also conjecture about an even earlier prehistorical sense of capital as naturally embodied in the herds of antelope, deer, elk, or bison that migratory hunters pursued. In both cases, given grazing and water resources supplied by nature, herds replenished themselves with the passing of the seasons, giving birth to new life of their own accord.

There is a sense then in which plant and animal life profits enough from naturally available resources to sustain itself. Though the occurrence of population booms and busts still parallels economic cycles, hunters, fishers, and farmers can be imagined as profiting from managing naturally self-restoring resources within the constraints of a sustainable ecology.

Living capital and the sustenance of ongoing ecologically sound profitability are not restricted, however, to forms of capital stock that walk, crawl, swim, or fly. De Soto (2000) makes a distinction between dead and living capital that explains why capitalism thrives in some countries but has not yet taken hold in others. De Soto points out that the difference between successful and failing capitalist countries lies in the status of what he calls transferable representations within networks of legal and financial institutions. Transferable representations are nothing but the legally recognized and financially fungible titles and deeds that make it possible for the wealth locked up in land, buildings, and equipment to be made exchangeable for other forms of wealth. Titles, deeds, and the infrastructure they function within are, then, what comprise the difference between dead and living capital.

In North America, Europe, Australia, and Japan, property can be divided into shares and sold, or accumulated across properties into an expression of total wealth and leveraged as collateral for further investment, all with no need to modify the property itself in any way. De Soto’s point is that this is often not so in the Third World and former communist countries, where it commonly takes more than 10 years of full-time work to obtain legal title, and then similar degrees of effort to maintain it. The process requires so much labor that few have the endurance or resources to complete it. They must then forgo the benefits of having an address: they cannot receive mail, obtain electrical service, or take out a mortgage. The economy is then encumbered by the dead weight of the inefficiencies and frictions of frozen capital markets.

In the same way that the mass migration of settlers to the American West forced the resolution of conflicting property claims in the nineteenth century via the Preemption Act, so, too, are the contemporary mass migrations of rural people to megacities around the globe forcing the creation of a new way of legitimating property ownership. De Soto’s research shows that Third World and former communist countries harbor trillions of dollars of unleverageable dead capital. Individual countries have more wealth tied up as dead capital locked in their impoverished citizens’ homes than in their entire stock markets and GDPs.

So dead capital can be clearly and decisively distinguished from living capital. Living capital is represented by a title or deed legally sanctioned by society as a generally accepted demonstration of ownership. Capital is dead, or, better, not yet brought to life, when its general value (any value it may have beyond its utilitarian function) cannot be represented so as to be leverageable or transferable across time, space, applications, enterprises, etc.

An essential point is this: Human, social, and natural forms of capital are dead in the same way that Third World property is dead capital. We lack a means of representing the value of these forms of capital that is transferable across individuals and contexts. The sense of scientific capital as mobile, additive, and divisible, and as deployed via networks of metrological (measurement science) laboratories, is especially helpful here, as it provides a root definition of what capital is. The geometry of the geodetic survey information incorporated into titles and deeds provides a fundamental insight into capitalism and living capital. But an even better understanding can be found by looking more deeply into the metaphor equating sexual and economic success.

The Birds and the Bees

We all learn as children where babies come from. Spontaneous questions from curious kids can be simultaneously intimidating and hilarious. Discovering that we each came into existence at a certain point in time raises many questions. Children are usually interested, however, in a short answer to a specific question. They go about their processes of creating meaningful stories about the world slowly, bit by bit. Contrary to many parents’ fears, children are less interested in the big picture than they are in knowing something immediately relevant.

Today we are engaged in a similar process that involves both self-discovery and its extension into a model of the world. In the last 100 years, we have endured one crisis of alienation, war, and terrorism after another. So many different stresses are pulling life in so many different directions that it has become difficult to fit our lives into meaningful stories about the world. Anxiety about our roles and places relative to one another has led many of us to be either increasingly lax or increasingly rigid about where we stand. Being simultaneously intelligent and compassionate is more difficult than ever.

But perhaps we know more than we are aware of. Perhaps it would help for us to consider more closely where we as a people, with our modern, global culture, come from. Where did the ideas that shape our world come from? Where do new ideas in general come from? What happens when an idea comes alive with meaning and spreads with such rapidity that it seems to spring forth fully formed in many widely distant places? How does a meme become viral and spread like an epidemic? Questions like these have often been raised in recent years. It seems to me, though, that explorations of them to date have not focused as closely as they could have on what is most important.

For when we understand the reproductive biology of living meaning, and when we see how different species of conceptual life interrelate in larger ecologies, then we will be in the position we need to be in to newly harmonize nature and culture, male and female, black and white, capitalism and socialism, north and south, and east and west.

What is most important about knowing where modern life comes from? What is most important is often that which is most obvious, and the most taken for granted. Given the question, it is interesting that rich metaphors of biological reproduction are everywhere in our thinking about ideas and meaning. Ideas are conceived, for instance, and verbs are conjugated.

These metaphors are not just poetic, emotionally soothing, or apt in a locally specific way. Rather, they hold within themselves some very practical systematic consequences for the stories we tell about ourselves, others, our communities, and our world. That is to say, if we think clearly enough about where ideas come from, we may learn something important about how to create and tell better stories about ourselves, and we may improve the quality of our lives in the process.

So what better place to start than with one of the oldest and most often repeated stories about the first bite from the apple of knowledge? The Western cultural imagery associated with erotic sexuality and knowledgeable experience goes back at least to Eve, the apple, the Tree of Knowledge, and the serpent, in the Garden of Eden. This imagery is complemented by the self-described role of the ancient Greek philosopher, Socrates, as a midwife of ideas. Students still give apples to their teachers as symbols of knowledge, and a popular line of computers originally targeting the education market is named for the fruit of knowledge. The Socratic method is still taught, and charges teachers with helping students to give birth to fully formed ideas able to take on lives of their own.

Socrates went further and said that we are enthralled with meaning in the same way a lover is captivated by the beloved. By definition, attention focuses on what is meaningful, as we ignore 99.99% of incoming sensory data. Recognition, by definition, is re-cognition, a seeing-again of something already known, usually something that has a name. Things that don’t have names are very difficult to see, so things come into language in special ways, via science or poetry. And the names of things focus our attention in very specific ways. Just as “weed” becomes a generic name for unwanted wild plants that might have very desirable properties, so, too, does “man” as a generic name for humans restrict thinking about people to males. The words we use very subtly condition our perceptions and behaviors, since, as Socrates put it, we are captivated by them.

The vital importance of sexuality to the reproductive potential of the species is evident in the extent to which it has subliminally been incorporated into the syntax, semantics, and grammar of language. Metaphoric images of procreation and reproduction so thoroughly permeate culture and language that the verb “to be” is referred to as the copula. New ideas brought into being via a copulative relation of subject and object accordingly are said to have been conceived, and are called concepts. One is said to be pregnant with an idea, or to have the seed or germ of an idea. Questions are probing, penetrating, or seminal. Productive minds are fertile or receptive. The back-and-forth give-and-take of conversation is referred to as social intercourse, and intercourse is the second definition in the dictionary for commerce. Dramatic expositions of events are said to climax, or to result in an anti-climax. Ideas and the narrative recounting of them are often called alluring, captivating, enchanting, spellbinding, or mesmerizing, and so it is that one can in fact be in love with an idea.

Philosophers, feminists, and social theorists have gone to great lengths in exploring the erotic in knowing, and vice versa. Luce Irigaray’s meditations on the fecund and Alfred Schutz’s reflections on our common birth from women both resonate with Paul Ricoeur’s examination of the choice between discourse and violence, which hinges on caring enough to try to create shared meaning. In all of these, we begin from love. Such a hopeful focus on nurturing new life stands in the starkest contrast with the existentialist elevation of death as our shared end.

Cultural inhibitions concerning sexuality can be interpreted as regulating it for the greater good. But Western moral proscriptions typically take a form in which sexuality is regarded as a kind of animal nature that must be subjugated in favor of a higher cultural or spiritual nature. In this world view, just as the natural environment is to be dominated and controlled via science and industry, sexual impulses are controlled, with the feminine relegated to a secondary and dangerous status.

Though promiscuity continues to have destructive effects on society and personal relationships, significant strides have been taken toward making sexual relations better balanced, with sex itself considered an essential part of health and well-being. Puritanical attitudes reject sexual expression and refuse to experience fully this most ecstatic way in which we exist, naturally. But accepting our nature, especially that part of it through which we ensure the continuity of the species, is essential to reintegrating nature and culture.

Finding that sexuality permeates every relationship and all communication is a part of that process. The continuity of the species is no longer restricted to concern with biological reproduction. We must learn to apply what we know from generations of experience with sexual, family, and social relationships in new ways, at new levels of complexity. In the same way that lovemaking is an unhurried letting-be that lingers in caring caresses mutually defining each lover to the other, so must we learn to see analogous, though less intense, ways of being together in every form of communion characteristic of communication and community. Love does indeed make the world go round.

Commerce and Science

There are many encouraging signs suggesting that new possibilities may yet be born of old, even ancient, ideas and philosophies. Many have observed over the last several decades that a new age is upon us, that the modern world’s metaphor of a clockwork universe is giving way to something less deterministic and warmer, less alien and more homey. In many respects, what the paradigm shift comes down to is a recognition that the universe is not an inanimate machine but an intelligent living system. Cold, hard, facts are being replaced with warm, resilient ones that are no less objective in the way they assert themselves as independent entities in the world.

In tune with this shift, increasing numbers of businesses and governments are realizing that long term profitability depends on good relationships with an educated and healthy workforce in a stable sociopolitical context, and with respect to the irreplaceable environmental services provided by forests, watersheds, estuaries, fisheries, and ecological biodiversity. As Senge (in de Geus, 1997, p. xi) points out,

In Swedish, the oldest term for ‘business’ is narings liv, literally ‘nourishment for life.’ The ancient Chinese characters for ‘business,’ [are] at least 3,000 years old. The first of these characters translates as ‘life’ or ‘live.’ It can also be translated as ‘survive’ and ‘birth.’ The second translates as ‘meaning.’

Ready counterparts for these themes are deeply rooted in the English language. Without being aware of it, without having made any scholarly inquiry into Socrates’ maieutic arts, virtually every one of us already knows everything we need to know about the birth of living meaning. In any everyday assertion that something is such and so, in linking any subject with a predicate, we re-enact a metaphor of reproductive success in the creation of new meaning.

And here, at the very center of language and communication, the reproduction of meaning in conversation requires a copulative act, a conjugal relation, a coupling of subjects and objects via predicates. The back and forth movement of social intercourse is the deep structure that justifies and brings out its full discursive meaning as a pleasurable and productive process that involves probing, seminal questions; conceiving, being pregnant with, and Socratically midwifing ideas; dramatic climaxes; and a state of enchantment, hypnosis, or rapture that focuses attention and provokes passionate engagement.

When has an idea been successfully midwifed and come to life? We know an idea has come to life when we can restate it in our own words and obtain the same result. We know an idea has come to life when we can communicate it to someone else and they too can apply it in their own terms in new situations.

In his book on resolving the mystery of capital, De Soto points out that living capital can be acted on in banks and courts because it is represented abstractly in instruments like titles and deeds. Dead capital, in contrast, for which legal title does not exist, cannot be used as the basis for a mortgage or a small business loan, nor can one claim a right to the property in court.

Similarly, electrical appliances and machinery are living capital because they work the same way everywhere they can be connected to a standardized power grid by trained operators who have access to the right tool sets. Before the advent of widely shared standards, however, something as simple as different sized hoses and connections on hydrants allowed minor disasters to become catastrophes when fire trucks from different districts responding to an alarm were unable to put their available tools to use.

The distinction between dead and living capital is ultimately scientific, metrological, and mathematical. In ancient Greece, geometrical and arithmetical conversations were the first to be referred to as mathematical because they regularly arrive at the same conclusions no matter who the teacher and student are, and no matter which particular graphical or numerical figures are involved. That is, living meaning is objective; it stays the same, within a range of error, independent of the circumstances in which it is produced.

We can illustrate the conception, gestation, and birth of meaning in terms that lead directly to powerful methods of measurement using tests, assessments, and surveys. In yet another instance of linguistic biomimicry, the mathematical word “matrix” is derived from the Latin word for womb. The matrix of observations recorded from the interaction of questions and answers is the fertile womb in which new ideas are conceived and gestated, and from which they are midwifed.

How? The monotony of the repeated questions and answers in the dialogue reveals the inner logic of the way the subject matter develops. By constantly connecting and reconnecting with the partner in dialogue, Socrates ensures that they stay together, attending to the same object. The reiterated yesses allow the object of the conversation to play itself out through what is said.

Conversational objects can exhibit strongly, and even strikingly, constant patterns of responses across different sets of similar questions posed at different times and places to different people by different interviewers, teachers, or surveyors. We create an increased likelihood of conceiving and birthing living meaning when questions are written in a way that enables them all to attend to the same thing, when they are asked of people also able to attend to that conversational object, and when we score the responses consistently as indicating right or wrong, agree or disagree, frequent or rare, etc.

When test, assessment, and survey instruments are properly designed, they bring meaning to life. They do so by making it possible to arrive at the same measure (the same numeric value, within a small range) for a given amount (of literacy, numeracy, health, motivation, innovation, trustworthiness, etc.) no matter who possesses it and no matter which particular collection of items or instrument is used to measure it. For numbers to be meaningful, they have to represent something that stays the same across particular expressions of the thing measured, and across particular persons measured.
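
A minimal simulation can make this claim tangible. Assuming a bank of items already calibrated in logits, the sketch below (all values illustrative) measures the same examinee twice, once with an easy test and once with a hard test sharing no items with the first, and recovers statistically equivalent measures:

```python
import math, random

random.seed(1)

def estimate_theta(responses, difficulties, tol=1e-6):
    # Newton-Raphson maximum-likelihood ability estimate from known calibrations
    theta, info = 0.0, 1.0
    for _ in range(100):
        ps = [1 / (1 + math.exp(-(theta - b))) for b in difficulties]
        info = sum(p * (1 - p) for p in ps)        # Fisher information
        step = (sum(responses) - sum(ps)) / info   # score function / information
        theta += step
        if abs(step) < tol:
            break
    return theta, 1 / math.sqrt(info)              # measure and its standard error

def respond(theta, bank):
    # Simulate Rasch-model responses of one examinee to a calibrated bank
    return [int(random.random() < 1 / (1 + math.exp(-(theta - b)))) for b in bank]

true_theta = 0.5
easy_test = [random.gauss(-1.0, 0.5) for _ in range(50)]   # illustrative calibrations
hard_test = [random.gauss(+1.0, 0.5) for _ in range(50)]   # no items shared with above

for name, bank in (("easy test", easy_test), ("hard test", hard_test)):
    measure, se = estimate_theta(respond(true_theta, bank), bank)
    print(f"{name}: measure = {measure:+.2f} logits (SE {se:.2f})")
# The two measures should agree within their standard errors, though the
# tests share no items and differ sharply in difficulty.
```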

We typically think of comparability in survey or testing research as requiring all respondents or examinees to answer the same questions, but this has not been true in actual measurement practice for decades. The power grid, electrical outlets, and appliances are all constructed so as to work together seamlessly across the vast majority of variations in who is using them, when and where they are used, what they are used for, and why they are used. In parallel fashion, educators are increasingly working to ensure that books, reading tests, and instructional curricula also work together no matter who publishes or administers them, or who reads them or who is measured by them.

The advantages of living literacy capital, for instance, go far beyond what can be accomplished with dead literacy capital. When each teacher matches books to readers using her or his personal knowledge, opportunities for uncontrolled variation emerge, and many opportunities for teachers to learn from each other are closed off. When each teacher’s tests are scored in terms of test-dependent counts of correct answers, knowing where any given child stands relative to the educational objectives is made unnecessarily difficult.

In contrast with these dead capital metrics, living literacy capital, such as is made available by the Lexile Framework for Reading and Writing (www.lexile.com), facilitates systematic comparisons of reading abilities with text reading difficulties, relative to different rates of reading comprehension. Instruction can be individualized, which acknowledges and addresses the fact that any given elementary school classroom typically incorporates at least four different grade levels of reading ability.
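
To give a concrete sense of how such a framework operates, consider the following sketch, which forecasts comprehension rates from the gap between reader and text measures. The logistic form and anchor points used (about 75% comprehension when reader and text measures match, rising to about 90% when the reader measure is 250L above the text) are figures commonly cited for the Lexile Framework; the constants here are reconstructed from them for illustration only and are not MetaMetrics’ published algorithm:

```python
import math

def forecast_comprehension(reader_lexile, text_lexile):
    # Logistic forecast anchored at 75% comprehension for a matched reader and
    # text, ~90% at +250L, ~50% at -250L (illustrative reconstruction)
    logit_75 = math.log(3)  # log-odds of 75%
    logit = logit_75 * (1 + (reader_lexile - text_lexile) / 250)
    return 1 / (1 + math.exp(-logit))

for gap in (-500, -250, 0, 250, 500):
    print(f"reader - text = {gap:+5d}L -> forecast comprehension "
          f"{forecast_comprehension(gap, 0):.0%}")
```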

Reading is thereby made more enjoyable, both for students who are bored by the easiness of the standard classroom text and for those who find it incomprehensible. Testing is transformed from a pure accountability exercise irrelevant to instruction into a means of determining what a child knows and what can optimally be taught next. Growth in reading can be plotted, not only within school years but across them. Students can move from one school to another, or from grade to grade, without losing track of where they stand on the continuum of reading ability, and without unnecessarily making teachers’ lives more difficult.

In the context of living literacy capital, publishers can better gauge the appropriateness of their books for the intended audiences. Teachers can begin the school year knowing where their students stand relative to the end-of-year proficiency standard, can track progress toward it as time passes, and can better ensure that standards are met. Parents can go online, with their children, to pick out books at appropriate reading levels for birthday and holiday gifts, and for summer reading.

Plainly, what we have achieved with living literacy capital is a capacity to act on the thing itself, literacy, in a manner that adheres to the Golden Rule, justly and fairly treating each reader the way any other reader would want to be treated. In this system of universally uniform and ubiquitously accessible metrics, we can act on literacy itself, instead of confusing it with the reading difficulty of any particular text, the reading ability of any particular student, or any interaction between them. In the same way that titles and deeds make it possible to represent owned property in banks and courts abstractly, so, too, does a properly conceived, calibrated, and distributed literacy metric enable every member of the species of literate humans to thrive in ecological niches requiring an ability to read as a survival skill.

The technical means by which literacy capital has been brought to life should be applied to all forms of human, social, and natural capital. Hospital, employment, community, governance, and environmental quality, and individual numeracy, health, functionality, motivation, etc. are all assessed using rating systems that largely have not yet been calibrated, much less brought together into frameworks of shared uniform metric standards. The body of research presenting instrument calibration studies is growing, but much remains to be done. All of the prior posts in this blog and all of my publications, from the most technical to the most philosophical, bear on the challenging problems we face in becoming stewards of living meaning.

The issues are all of a piece. We have to be the change we want to make happen. It won’t work if we mechanically separate what is organically whole. There’s nothing to do but to keep buzzing those beautiful flowers blooming in the fields, pollinating them and bringing back the bits of nourishment that feed the hive. In this way, this season’s fruit ripens, the seeds of new life take shape, and may yet be planted to grow in fertile fields.

How bad will the financial crises have to get before…?

April 30, 2010

More and more states and nations around the world face the possibility of defaulting on their financial obligations. The financial crises are of epic historical proportions. This is a disaster of the first order. And yet, oddly, we have the solutions and preventative measures we need at our fingertips, but no one knows about them or is looking for them.

So I am persuaded to wonder once again whether there might now be some real interest in the possibilities of capitalizing on

  • measurement’s well-known capacity for reducing transaction costs by improving information quality and reducing information volume;
  • instruments calibrated to measure in constant units (not ordinal ones) within known error ranges (not as though the measures are perfectly precise) with known data quality;
  • measures made meaningful by their association with invariant scales defined in terms of the questions asked;
  • adaptive instrument administration methods that make all measures equally precise by targeting the questions asked (see the sketch after this list);
  • judge calibration methods that remove the person rating performances as a factor influencing the measures;
  • the metaphor of transparency, realized by calibrating instruments that we can look right through, directly at the thing measured (risk, governance, abilities, health, performance, etc.);
  • efficient markets for human, social, and natural capital by means of the common currencies of uniform metrics, calibrated instrumentation, and metrological networks;
  • the means available for tuning the instruments of the human, social, and environmental sciences to well-tempered scales that enable us to more easily harmonize, orchestrate, arrange, and choreograph relationships;
  • our understandings that universal human rights require universal uniform measures, that fair dealing requires fair measures, and that our measures define who we are and what we value; and, last but very far from least,
  • the power of love–the back and forth of probing questions and honest answers in caring social intercourse plants seminal ideas in fertile minds that can be nurtured to maturity and Socratically midwifed as living meaning born into supportive ecologies of caring relations.
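
As a sketch of the adaptive targeting mentioned above, the routine below selects each next question by maximizing Rasch item information at the current ability estimate; the item bank and values are hypothetical:

```python
import math

def max_info_item(theta, item_bank, administered):
    # Pick the unadministered item whose difficulty is closest to the current
    # ability estimate: Rasch item information p*(1-p) peaks where b = theta.
    def info(b):
        p = 1 / (1 + math.exp(-(theta - b)))
        return p * (1 - p)
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: info(item_bank[i]))

bank = [-2.0, -1.0, -0.5, 0.0, 0.4, 1.0, 2.0]   # hypothetical calibrated difficulties
theta_hat, seen = 0.3, set()
for step in range(3):
    nxt = max_info_item(theta_hat, bank, seen)
    seen.add(nxt)
    print(f"step {step + 1}: administer item with b = {bank[nxt]:+.1f}")
    # ...in a real administration the response would be scored and theta_hat
    # updated here, so each person gets a test targeted to their own level,
    # equalizing the precision of the resulting measures.
```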

How bad do things have to get before we systematically and collectively implement the long-established and proven methods we have at our disposal? It is the most surreal kind of schizophrenia or passive-aggressive avoidance pathology to keep on tormenting ourselves with problems for which we have solutions.

For more information on these issues, see prior blogs posted here, the extensive documentation provided, and http://www.livingcapitalmetrics.com.

Guttman on sufficiency, statistics, and cumulative science

April 29, 2010

“R. A. Fisher employed maximum likelihood…as a way of finding sufficient statistics if they exist. Now, sufficient statistics rarely exist, and even when they do, their use need not be optimal for estimation problems. As enlarged on in Reference 7 [an unpublished 1984 paper of Guttman’s], for each best unbiased sufficient statistic there generally is a better–and not necessarily sufficient–biased one. To use maximum likelihood requires knowledge of the complete sampling distribution, but biased estimation is proved to be better in a distribution-free fashion.” (Guttman, 1985, pp. 7-8; 1994, pp. 345-6)

In this passage, Guttman may be addressing issues related to the kind of biases that can affect extreme scores in Joint Maximum Likelihood Estimation (JMLE, formerly UCON) (Jansen, van den Wollenberg, & Wierda, 1988; Wright, 1988; Wright & Panchapakesan, 1969). But what is more interesting is how the awareness of sufficiency and estimation issues revealed in this remark combines with the context in which it is made.
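
Whatever the general scarcity of sufficient statistics, the Rasch model is the exceptional case in which one does exist: the raw score exhausts everything a response pattern can say about ability. A minimal numerical check (item difficulties chosen for illustration) shows that, conditional on the raw score, the distribution of response patterns is the same at different abilities:

```python
import math
from itertools import product

b = [-1.0, 0.0, 0.5, 1.5]           # illustrative item difficulties (logits)

def pattern_prob(x, theta):
    # Probability of response pattern x under the Rasch model
    prob = 1.0
    for x_i, b_i in zip(x, b):
        e = math.exp(theta - b_i)
        prob *= (e if x_i else 1.0) / (1 + e)
    return prob

patterns = [x for x in product((0, 1), repeat=len(b)) if sum(x) == 2]
for theta in (0.0, 1.5):
    total = sum(pattern_prob(x, theta) for x in patterns)
    conditional = [round(pattern_prob(x, theta) / total, 4) for x in patterns]
    print(f"theta = {theta}: {conditional}")
# Both lines are identical: given the raw score (here, 2 of 4 correct), the
# response pattern carries no further information about theta. The raw score
# is sufficient, and conditioning on it is what lets calibration proceed
# independently of the ability distribution of the sample.
```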

Guttman targets and rightly skewers a good number of inconsistencies and internal contradictions in statistical inference. But he shares in many of them himself. That is, Guttman’s valuable insights as to measurement are limited by his failure to consider at all the importance of the instrument in science, and by his limited appreciation of the value of theory. This is so despite his realization that “There can be no solution [to the problem of sampling items from one or more indefinitely large universes of content] without a structural theory” (1994, p. 329, in his Psychometrika review of Gulliksen’s Theory of Mental Tests), which is fully in tune with his emphasis on the central role of substantive replication in science (1994, p. 343).

But in the way he articulates his concern with replication, we see that, for Guttman, as for so many others, measurement is a matter of data analysis and not one of calibrating instruments. Measurement is not primarily a statistical process performed on computers, but an individual event performed with an instrument. Calibrated instruments remove the necessity for data analysis (though other kinds of analysis may, of course, be continued or commenced).

In reading Guttman, it is difficult to follow through on his pithy and rich observations on the inconsistencies and illogic of statistical inference because he does not offer a clear alternative path, a measurement path structured by instrumentation. In his review of Lord and Novick (1968), for instance, Guttman remarks on the authors’ failure to provide their promised synthetic theory of tests and measurement, but does not offer or point toward one himself, even after noting the inclusion of Rasch’s Poisson models in the Lord and Novick classification system. Though much has been done to connect Guttman with Rasch (Andrich, 1982, 1985; Douglas & Wright, 1989; Engelhard, 2008; Linacre, 1991, 2000; Linacre & Wright, 1996; Tenenbaum, 1999; Wilson, 1989), and to advance in the direction of point-of-use measurement (Bode, 1999; Bode, Heinemann, & Semik, 2000; Connolly, Nachtman, & Pritchett, 1971; Davis, Perruccio, Canizares, Tennant, Hawker, et al., 2008; Linacre, 1997; many others), much more remains to be done.

Andrich, D. (1982, June). An index of person separation in Latent Trait Theory, the traditional KR-20 index, and the Guttman scale response pattern. Education Research and Perspectives, 9(1), 95-104 [http://www.rasch.org/erp7.htm].

Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 33-80). San Francisco, California: Jossey-Bass.

Bode, R. K. (1999). Self-scoring key for Galveston Orientation and Amnesia Test. Rasch Measurement Transactions, 13(1), 680 [http://www.rasch.org/rmt/rmt131c.htm].

Bode, R. K., Heinemann, A. W., & Semik, P. (2000, Feb). Measurement properties of the Galveston Orientation and Amnesia Test (GOAT) and improvement patterns during inpatient rehabilitation. Journal of Head Trauma Rehabilitation, 15(1), 637-55.

Connolly, A. J., Nachtman, W., & Pritchett, E. M. (1971). Keymath: Diagnostic Arithmetic Test. Circle Pines, Minnesota: American Guidance Service.

Davis, A. M., Perruccio, A. V., Canizares, M., Tennant, A., Hawker, G. A., Conaghan, P. G., et al. (2008, May). The development of a short measure of physical function for hip OA HOOS-Physical Function Shortform (HOOS-PS): An OARSI/OMERACT initiative. Osteoarthritis Cartilage, 16(5), 551-9.

Douglas, G. A., & Wright, B. D. (1989). Response patterns and their probabilities. Rasch Measurement Transactions, 3(4), 75-77 [http://www.rasch.org/rmt/rmt34.htm].

Engelhard, G. (2008, July). Historical perspectives on invariant measurement: Guttman, Rasch, and Mokken. Measurement: Interdisciplinary Research & Perspectives, 6(3), 155-189.

Guttman, L. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3-10. (Reprinted in Guttman 1994, pp. 341-348)

Guttman, L. (1994). Louis Guttman on theory and methodology: Selected writings (S. Levy, Ed.). Dartmouth Benchmark Series. Brookfield, VT: Dartmouth Publishing Company.

Jansen, P., Van den Wollenberg, A., & Wierda, F. (1988). Correcting unconditional parameter estimates in the Rasch model for inconsistency. Applied Psychological Measurement, 12(3), 297-306.

Linacre, J. M. (1991, Spring). Stochastic Guttman order. Rasch Measurement Transactions, 5(4), 189 [http://www.rasch.org/rmt/rmt54p.htm].

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324 [http://www.rasch.org/memo60.htm].

Linacre, J. M. (2000, Autumn). Guttman coefficients and Rasch data. Rasch Measurement Transactions, 14(2), 746-7 [http://www.rasch.org/rmt/rmt142e.htm].

Linacre, J. M., & Wright, B. D. (1996, Autumn). Guttman-style item location maps. Rasch Measurement Transactions, 10(2), 492-3 [http://www.rasch.org/rmt/rmt102h.htm].

Lord, F. M., & Novick, M. R. (Eds.). (1968). Statistical theories of mental test scores. Reading, Massachusetts: Addison-Wesley.

Tenenbaum, G. (1999, Jan-Mar). The implementation of Thurstone’s and Guttman’s measurement ideas in Rasch analysis. International Journal of Sport Psychology, 30(1), 3-16.

Wilson, M. (1989). A comparison of deterministic and probabilistic approaches to learning structures. Australian Journal of Education, 33(2), 127-140.

Wright, B. D. (1988, Sep). The efficacy of unconditional maximum likelihood bias correction: Comment on Jansen, Van den Wollenberg, and Wierda. Applied Psychological Measurement, 12(3), 315-318.

Wright, B. D., & Panchapakesan, N. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement, 29(1), 23-48.

How Evidence-Based Decision Making Suffers in the Absence of Theory and Instrument: The Power of a More Balanced Approach

January 28, 2010

The Basis of Evidence in Theory and Instrument

The ostensible point of basing decisions in evidence is to have reasons for proceeding in one direction versus any other. We want to be able to say why we are proceeding as we are. When we give evidence-based reasons for our decisions, we typically couch them in terms of what worked in past experience. That experience might have been accrued over time in practical applications, or it might have been deliberately arranged in one or more experimental comparisons and tests of concisely stated hypotheses.

At its best, generalizing from past experience to as yet unmet future experiences enables us to navigate life and succeed in ways that would not be possible if we could not learn and had no memories. The application of a lesson learned from particular past events to particular future events involves a very specific inferential process. To be able to recognize repeated iterations of the same things requires the accumulation of patterns of evidence. Experience in observing such patterns allows us to develop confidence in our understanding of what a given pattern represents in terms of pleasant or painful consequences. When we can conceptualize and articulate a pattern, and then recognize a new occurrence of it, that pattern has become an idea we can put to work.

Evidence-based decision making is then a matter of formulating expectations from repeatedly demonstrated and routinely reproducible patterns of observations that lend themselves to conceptual representations, as ideas expressed in words. Linguistic and cultural frameworks selectively focus attention by projecting expectations and filtering observations into meaningful patterns represented by words, numbers, and other symbols. The point of efforts aimed at basing decisions in evidence is to try to go with the flow of this inferential process more deliberately and effectively than might otherwise be the case.

None of this is new or controversial. However, the inferential step from evidence to decision always involves unexamined and unjustified assumptions. That is, there is always an element of metaphysical faith behind the expectation that any given symbol or word is going to work as a representation of something in the same way that it has in the past. We can never completely eliminate this leap of faith, since we cannot predict the future with 100% confidence. We can, however, do a lot to reduce the size of the leap, and the risks that go with it, by questioning our assumptions in experimental research that tests hypotheses as to the invariant stability and predictive utility of the representations we make.

Theoretical and Instrumental Assumptions Hidden Behind the Evidence

For instance, evidence as to the effectiveness of an intervention or treatment is often expressed in terms of measures commonly described as quantitative. But it is unusual for any evidence to be produced justifying that description in terms of something that really adds up in the way numbers do. So we often find ourselves in situations in which our evidence is much less meaningful, reliable, and valid than we suppose it to be.

Quantitative measures are often valued as the hallmark of rational science. But their capacity to live up to this billing depends on the quality of the inferences that can be supported. Very few researchers thoroughly investigate the quality of their measures and justify the inferences they make relative to that quality.

Measurement presumes a reproducible pattern of evidence that can serve as the basis for a decision concerning how much of something has been observed. It naturally follows that we often base measurement in counts of some kind—successes, failures, ratings, frequencies, etc. The counts, scores, or sums are then often transformed into percentages by dividing them by the maximum possible score that could be obtained. Sometimes the scores are averaged for each person measured, and/or for each item or question on the test, assessment, or survey. These scores and percentages are then almost universally fed directly into decision processes or statistical analyses with no further consideration.
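
A quick computation shows why such scores do not really add up the way numbers do: equal gains in percent correct correspond to quite different amounts on the underlying log-odds scale, especially toward the extremes. The figures below are purely illustrative:

```python
import math

def logit(pct):
    # Log-odds of a percent-correct score: the natural (interval) metric that
    # raw percentages only approximate in their middle range
    p = pct / 100
    return math.log(p / (1 - p))

for lo, hi in ((45, 55), (85, 95)):
    gain = logit(hi) - logit(lo)
    print(f"{lo}% -> {hi}%: a 10-point gain = {gain:.2f} logits")
# 45% -> 55%: a 10-point gain = 0.40 logits
# 85% -> 95%: a 10-point gain = 1.21 logits
```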

The reproducible pattern of evidence on which decisions are based is presumed to exist between the measures, not within them. In other words, the focus is on the group or population statistics, not on the individual measures. Attention is typically focused on the tip of the iceberg, the score or percentage, not on the much larger, but hidden, mass of information beneath it. Evidence is presumed to be sufficient to the task when the differences between groups of scores are of a consistent size or magnitude, but is this sufficient?

Going Past Assumptions to Testable Hypotheses

In other words, does not science require that evidence be explained by theory, and embodied in instrumentation that provides a shared medium of observation? As shown in the blue lines in the Figure below,

  • theory, whether or not it is explicitly articulated, inevitably influences both what counts as valid data and the configuration of the medium of its representation, the instrument;
  • data, whether or not it is systematically gathered and evaluated, inevitably influences both the medium of its representation, the instrument, and the implicit or explicit theory that explains its properties and justifies its applications; and
  • instruments, whether or not they are actually calibrated from a mapping of symbols and substantive amounts, inevitably influence data gathering and the image of the object explained by theory.

The rhetoric of evidence-based decision making skips over the roles of theory and instrumentation, drawing a direct line from data to decision. In leaving theory laxly formulated, we allow any story that makes a bit of sense and is communicated by someone with a bit of charm or power to carry the day. In not requiring calibrated instrumentation, we allow any data that cross the threshold into our awareness to serve as an acceptable basis for decisions.

What we want, however, is to require meaningful measures that really provide the evidence needed for instruments exhibiting invariant calibrations and for theories offering predictive, explanatory control over the variable. As shown in the Figure, we want data that push theory away from the instrument, theory that separates the data and instrument, and instruments that get in between the theory and data.

We all know to distrust too close a correspondence between theory and data, but we too rarely understand or capitalize on the role of the instrument in mediating the theory-data relation. Similarly, when the questions used as a medium for making observations are obviously biased to produce responses conforming overly closely with a predetermined result, we see that the theory and the instrument are too close for the data to serve as an effective mediator.

Finally, the situation predominating in the social sciences is one in which both construct and measurement theories are nearly nonexistent, which leaves data completely dependent on the instruments they come from. In other words, because counts of correct answers or sums of ratings are mistakenly treated as measures, instruments fully determine and restrict the range of measurement to that defined by the numbers of items and rating categories. Once the instrument is put in play, changes to it would make new data incommensurable with old, so, to retain at least the appearance of comparability, the data structure then fully determines and restricts the instrument.

What we want, though, is a situation in which construct and measurement theories work together to make the data autonomous of the particular instruments they come from. We want a theory that explains what is measured well enough for us to be able to modify existing instruments, or create entirely new ones, that give the same measures for the same amounts as the old instruments. We want to be able to predict item calibrations from the properties of the items; we want to obtain the same item calibrations across data sets; and we want to be able to predict measures on the basis of the observed responses (data) no matter which items or instrument produced them.
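
A minimal sketch of the kind of invariance check intended here: simulate two samples of very different average ability answering the same items, calibrate the items separately in each sample with a crude centered log-odds approximation (a PROX-like first step, assuming roughly comparable sample spreads), and compare the results. All values are illustrative:

```python
import math, random

random.seed(7)
items = [-1.5, -0.5, 0.0, 0.8, 1.6]             # generating difficulties (logits)

def calibrate(sample_thetas):
    # Centered log-odds item calibration: ln(wrong/right) per item, centered
    # on the test mean (a PROX-like first approximation; assumes the two
    # samples have roughly similar spreads)
    logits = []
    for b in items:
        right = sum(random.random() < 1 / (1 + math.exp(-(t - b)))
                    for t in sample_thetas)
        logits.append(math.log((len(sample_thetas) - right) / right))
    mean = sum(logits) / len(logits)
    return [l - mean for l in logits]

low  = calibrate([random.gauss(-1.0, 1.0) for _ in range(500)])  # low-ability sample
high = calibrate([random.gauss(+1.0, 1.0) for _ in range(500)])  # high-ability sample
for b, l, h in zip(items, low, high):
    print(f"generating b = {b:+.1f}: sample A {l:+.2f}, sample B {h:+.2f}")
# The two columns agree closely (within sampling error, and up to a mild scale
# compression relative to the generating values), even though the samples
# share no persons: the invariance such a check looks for.
```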

Most importantly, we want a theory and practice of measurement that allows us to take missing data into account by providing us with the structural invariances we need as media for predicting the future from the past. As Ben Wright (1997, p. 34) said, any data analysis method that requires complete data to produce results disqualifies itself automatically as a viable basis for inference because we never have complete data—any practical system of measurement has to be positioned so as to be ready to receive, process, and incorporate all of the data we have yet to gather. This goal is accomplished to varying degrees in Rasch measurement (Rasch, 1960; Burdick, Stone, & Stenner, 2006; Dawson, 2004). Stenner and colleagues (Stenner, Burdick, Sanford, & Burdick, 2006) provide a trajectory of increasing degrees to which predictive theory is employed in contemporary measurement practice.

The explanatory and predictive power of theory is embodied in instruments that focus attention on recording observations of salient phenomena. These observations become data that inform the calibration of instruments, which then are used to gather further data that can be used in practical applications and in checks on the calibrations and the theory.

“Nothing is so practical as a good theory” (Lewin, 1951, p. 169). Good theory makes it possible to create symbolic representations of things that are easy to think with. To facilitate clear thinking, our words, numbers, and instruments must be transparent. We have to be able to look right through them at the thing itself, with no concern as to distortions introduced by the instrument, the sample, the observer, the time, the place, etc. This happens only when the structure of the instrument corresponds with invariant features of the world. And where words effect this transparency to an extent, it is realized most completely when we can measure in ways that repeatedly give the same results for the same amounts in the same conditions no matter which instrument, sample, operator, etc. is involved.

Where Might Full Mathematization Lead?

The attainment of mathematical transparency in measurement is remarkable for the way it focuses attention and constrains the imagination. It is essential to appreciate the context in which this focusing occurs, as popular opinion is at odds with historical research in this regard. Over the last 60 years, historians of science have come to vigorously challenge the widespread assumption that technology is a product of experimentation and/or theory (Kuhn, 1961/1977; Latour, 1987, 2005; Maas, 2001; Mendelsohn, 1992; Rabkin, 1992; Schaffer, 1992; Heilbron, 1993; Hankins & Silverman, 1999; Baird, 2002). Neither theory nor experiment typically advances until a key technology is widely available to end users in applied and/or research contexts. Rabkin (1992) documents multiple roles played by instruments in the professionalization of scientific fields. Thus, “it is not just a clever historical aphorism, but a general truth, that ‘thermodynamics owes much more to the steam engine than ever the steam engine owed to thermodynamics’” (Price, 1986, p. 240).

The prior existence of the relevant technology comes to bear on theory and experiment again in the common, but mistaken, assumption that measures are made and experimentally compared in order to discover scientific laws. History shows that measures are rarely made until the relevant law is effectively embodied in an instrument (Kuhn, 1961/1977, pp. 218-9): “…historically the arrow of causality is largely from the technology to the science” (Price, 1986, p. 240). Instruments do not just provide measures; rather, they produce the phenomenon itself in a way that can be controlled, varied, played with, and learned from (Heilbron, 1993, p. 3; Hankins & Silverman, 1999; Rabkin, 1992). The term “technoscience” has emerged as an expression denoting recognition of this priority of the instrument (Baird, 1997; Ihde & Selinger, 2003; Latour, 1987).

Because technology often dictates what, if any, phenomena can be consistently produced, it constrains experimentation and theorizing by focusing attention selectively on reproducible, potentially interpretable effects, even when those effects are not well understood (Ackermann, 1985; Daston & Galison, 1992; Ihde, 1998; Hankins & Silverman, 1999; Maasen & Weingart, 2001). Criteria for theory choice in this context stem from competing explanatory frameworks’ experimental capacities to facilitate instrument improvements, prediction of experimental results, and gains in the efficiency with which a phenomenon is produced.

In this context, the relatively recent introduction of measurement models requiring additive, invariant parameterizations (Rasch, 1960) provokes speculation as to the effect that might be wrought on the human sciences by the widespread availability of consistently reproducible effects expressed in common quantitative languages. Paraphrasing Price’s comment on steam engines and thermodynamics, might it one day be said that as yet unforeseeable advances in reading theory will owe far more to the Lexile analyzer (Stenner et al., 2006) than ever the Lexile analyzer owed to reading theory?
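For readers unfamiliar with what an additive, invariant parameterization looks like, the dichotomous case of Rasch’s model can be sketched in a standard textbook form (a generic formulation, not one specific to the Lexile analyzer):

\[
\ln \frac{P_{ni}}{1 - P_{ni}} = \theta_n - b_i ,
\]

where P_ni is the probability that person n succeeds on item i, θ_n is the person’s ability, and b_i is the item’s difficulty. Writing the same expression for person m on the same item and subtracting leaves θ_n − θ_m: the item parameter cancels, so the comparison of any two persons does not depend on which items happen to be used. This is the model-level counterpart of the instrumental transparency described above.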

Kuhn (1961/1977) speculated that the second scientific revolution of the early- to mid-nineteenth century followed in large part from the full mathematization of physics, i.e., the emergence of metrology as a professional discipline focused on providing universally accessible, theoretically predictable, and evidence-supported uniform units of measurement (Roche, 1998). Kuhn (1961/1977, p. 220) specifically suggests that a number of vitally important developments converged about 1840 (also see Hacking, 1983, p. 234). This was the year in which the metric system was formally instituted in France after 50 years of development (it had already been obligatory in other nations for 20 years at that point), and metrology emerged as a professional discipline (Alder, 2002, pp. 328, 330; Heilbron, 1993, p. 274; Kula, 1986, p. 263). Daston (1992) independently suggests that the concept of objectivity came of age in the period from 1821 to 1856, and gives examples illustrating the way in which the emergence of strong theory, shared metric standards, and experimental data converged in a context of particular social mores to winnow out unsubstantiated and unsupportable ideas and contentions.

Might a similar revolution and new advances in the human sciences follow from the introduction of uniform measures that are evidence-based, theoretically predictive, instrumentally mediated, and mathematical? We won’t know until we try.

Figure. The Dialectical Interactions and Mutual Mediations of Theory, Data, and Instruments

Acknowledgment. These ideas have been drawn in part from long consideration of many works in the history and philosophy of science, primarily Ackermann (1985), Ihde (1991), and various works of Martin Heidegger, as well as key works in measurement theory and practice. A few obvious points of departure are listed in the references.

References

Ackermann, J. R. (1985). Data, instruments, and theory: A dialectical approach to understanding science. Princeton, New Jersey: Princeton University Press.

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Aldrich, J. (1989). Autonomy. Oxford Economic Papers, 41, 15-34.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Baird, D. (1997, Spring-Summer). Scientific instrument making, epistemology, and the conflict between gift and commodity economics. Techné: Journal of the Society for Philosophy and Technology, 2(3-4), 25-46. Retrieved August 28, 2009, from http://scholar.lib.vt.edu/ejournals/SPT/v2n3n4/baird.html.

Baird, D. (2002, Winter). Thing knowledge – function and truth. Techné: Journal of the Society for Philosophy and Technology, 6(2). Retrieved August 19, 2003, from http://scholar.lib.vt.edu/ejournals/SPT/v6n2/baird.html.

Burdick, D. S., Stone, M. H., & Stenner, A. J. (2006). The combined gas law and a Rasch reading law. Rasch Measurement Transactions, 20(2), 1059-60 [http://www.rasch.org/rmt/rmt202.pdf].

Carroll-Burke, P. (2001). Tools, instruments and engines: Getting a handle on the specificity of engine science. Social Studies of Science, 31(4), 593-625.

Daston, L. (1992). Baconian facts, academic civility, and the prehistory of objectivity. Annals of Scholarship, 8, 337-363. (Rpt. in L. Daston (Ed.), Rethinking objectivity (pp. 37-64). Durham, North Carolina: Duke University Press, 1994.)

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Hacking, I. (1983). Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge: Cambridge University Press.

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Heelan, P. A. (1983, June). Natural science as a hermeneutic of instrumentation. Philosophy of Science, 50, 181-204.

Heelan, P. A. (1998, June). The scope of hermeneutics in natural science. Studies in History and Philosophy of Science Part A, 29(2), 273-98.

Heidegger, M. (1977). Modern science, metaphysics, and mathematics. In D. F. Krell (Ed.), Basic writings [reprinted from M. Heidegger, What is a thing? South Bend, Regnery, 1967, pp. 66-108] (pp. 243-282). New York: Harper & Row.

Heidegger, M. (1977). The question concerning technology. In D. F. Krell (Ed.), Basic writings (pp. 283-317). New York: Harper & Row.

Heilbron, J. L. (1993). Weighing imponderables and other quantitative science around 1800. Historical Studies in the Physical and Biological Sciences, 24(Supplement), Part I, 1-337.

Hessenbruch, A. (2000). Calibration and work in the X-ray economy, 1896-1928. Social Studies of Science, 30(3), 397-420.

Ihde, D. (1983). The historical and ontological priority of technology over science. In D. Ihde, Existential technics (pp. 25-46). Albany, New York: State University of New York Press.

Ihde, D. (1991). Instrumental realism: The interface between philosophy of science and philosophy of technology. (The Indiana Series in the Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Ihde, D. (1998). Expanding hermeneutics: Visualism in science. (Northwestern University Studies in Phenomenology and Existential Philosophy). Evanston, Illinois: Northwestern University Press.

Ihde, D., & Selinger, E. (Eds.). (2003). Chasing technoscience: Matrix for materiality. (Indiana Series in Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Kuhn, T. S. (1961/1977). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in T. S. Kuhn, The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press, 1977.)

Kula, W. (1986). Measures and men (R. Szreter, Trans.). Princeton, New Jersey: Princeton University Press. (Original work published 1970.)

Lapre, M. A., & Van Wassenhove, L. N. (2002, October). Learning across lines: The secret to more efficient factories. Harvard Business Review, 80(10), 107-11.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Cambridge, Massachusetts: Harvard University Press.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York: Harper & Row.

Maas, H. (2001). An instrument can make a science: Jevons’s balancing acts in economics. In M. S. Morgan & J. Klein (Eds.), The age of economic measurement (pp. 277-302). Durham, North Carolina: Duke University Press.

Maasen, S., & Weingart, P. (2001). Metaphors and the dynamics of knowledge. (Vol. 26. Routledge Studies in Social and Political Thought). London: Routledge.

Mendelsohn, E. (1992). The social locus of scientific instruments. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 5-22). Bellingham, WA: SPIE Optical Engineering Press.

Polanyi, M. (1946/1964). Science, faith and society. Chicago: University of Chicago Press.

Price, D. J. d. S. (1986). Of sealing wax and string. In Little science, big science... and beyond (pp. 237-253). New York, New York: Columbia University Press.

Rabkin, Y. M. (1992). Rediscovering the instrument: Research, industry, and education. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 57-82). Bellingham, Washington: SPIE Optical Engineering Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Thurstone, L. L. (1959). The measurement of values. Chicago: University of Chicago Press, Midway Reprint Series.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].
