Debt, Revenue, and Changing the Way Washington Works: The Greatest Entrepreneurial Opportunity of Our Time

July 30, 2011

“Holding the line” on spending and taxes does not make for a fundamental transformation of the way Washington works. Simply doing less of one thing is just a small quantitative change that does nothing to build positive results or set a new direction. What we need is a qualitative metamorphosis akin to a caterpillar becoming a butterfly. In contrast with this beautiful image of natural processes, the arguments and so-called principles being invoked in the sham debate that’s going on are nothing more than fights over where to put deck chairs on the Titanic.

What sort of transformation is possible? What kind of a metamorphosis will start from who and where we are, but redefine us sustainably and responsibly? As I have repeatedly explained in this blog, my conference presentations, and my publications, with numerous citations of authoritative references, we already possess all of the elements of the transformation. We have only to organize and deploy them. Of course, discerning what the resources are and how to put them together is not obvious. And though I believe we will do what needs to be done when we are ready, it never hurts to prepare for that moment. So here’s another take on the situation.

Infrastructure that supports lean thinking is the name of the game. Lean thinking focuses on identifying and removing waste. Anything that consumes resources but does not contribute to the quality of the end product is waste. We have enormous amounts of wasteful inefficiency in many areas of our economy. These inefficiencies are concentrated in areas in which management is hobbled by low quality information, where we lack the infrastructure we need.

Providing and capitalizing on this infrastructure is The Greatest Entrepreneurial Opportunity of Our Time. Changing the way Washington (ha! I just typed “Wastington”!) works is the same thing as mitigating the sources of risk that caused the current economic situation. Making government behave more like a business requires making the human, social, and natural capital markets more efficient. Making those markets more efficient requires reducing the costs of transactions. Those costs are determined in large part by information quality, which is a function of measurement.

It is often said that the best way to reduce the size of government is to move the functions of government into the marketplace. But this proposal has never been associated with any sense of the infrastructural components needed to really make the idea work. Simply reducing government without an alternative way of performing its functions is irresponsible and destructive. And many of those who rail on and on about how bad or inefficient government is fail to recognize that the government is us. We get the government we deserve. The government we get follows directly from the kind of people we are. Government embodies our image of ourselves as a people. In the US, this is what having a representative form of government means. “We the people” participate in our society’s self-governance not just by voting, writing letters to congress, or demonstrating, but in the way we spend our money, where we choose to live, work, and go to school, and in every decision we make. No one can take a breath of air, a drink of water, or a bite of food without trusting everyone else to not carelessly or maliciously poison them. No one can buy anything or drive down the street without expecting others to behave in predictable ways that ensure order and safety.

But we don’t just trust blindly. We have systems in place to guard against those who would ruthlessly seek to gain at everyone else’s expense. And systems are the point. No individual person or firm, no matter how rich, could afford to set up and maintain the systems needed for checking and enforcing air, water, food, and workplace safety measures. Society as a whole invests in the infrastructure of measures created, maintained, and regulated by the government’s Department of Commerce and the National Institute of Standards and Technology (NIST). The moral importance and the economic value of measurement standards have been stressed historically over many millennia, from the Bible and the Quran to the Magna Carta and the French Revolution to the US Constitution. Uniform weights and measures are universally recognized and accepted as essential to fair trade.

So how is it that we nonetheless apparently expect individuals and local organizations like schools, businesses, and hospitals to measure and monitor students’ abilities; employees’ skills and engagement; patients’ health status, functioning, and quality of care; etc.? Why do we not demand common currencies for the exchange of value in human, social, and natural capital markets? Why don’t we as a society compel our representatives in government to institute the will of the people and create new standards for fair trade in education, health care, social services, and environmental management?

Measuring better is not just a local issue! It is a systemic issue! When measurement is objective and when we all think together in the common language of a shared metric (like hours, volts, inches or centimeters, ounces or grams, degrees Fahrenheit or Celsius, etc.), then and only then do we have the means we need to implement lean strategies and create new efficiencies systematically. We need an Intangible Assets Metric System.

The current recession in large part was caused by failures in measuring and managing trust, responsibility, loyalty, and commitment. Similar problems in measuring and managing human, social, and natural capital have led to endlessly spiraling costs in education, health care, social services, and environmental management. The problems we’re experiencing in these areas are intimately tied up with the way we formulate and implement group-level decision-making processes and policies based in statistics, when what we need is to empower individuals with the tools and information they need to make their own decisions and policies. We will not and cannot metamorphose from caterpillar to butterfly until we create the infrastructure through which we each can take full ownership and control of our individual shares of the human, social, and natural capital stock that is rightfully ours.

We well know that we manage what we measure. What counts gets counted. Attention tends to be focused on what we’re accountable for. But, and this is vitally important, many of the numbers called measures do not provide the information we need for management. And not only are lots of numbers giving us low-quality information; there are far too many of them! We could have better and more information from far fewer numbers.

Previous postings in this blog document the fact that we have the intellectual, political, scientific, and economic resources we need to measure and manage human, social, and natural capital for authentic wealth. And the issue is not a matter of marshaling the will. It is hard to imagine how there could be more demand for better management of intangible assets than there is right now. The problem in meeting that demand is a matter of imagining how to start the ball rolling. What configuration of investments and resources will start the process of bursting open the chrysalis? How will the demand for meaningful mediating instruments be met in a way that leads to the spreading of the butterfly’s wings? It is an exciting time to be alive.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Subjectivity, Objectivity, Performance Measurement and Markets

April 23, 2011

Though he attributes his insight to a colleague (George Baker), Michael Jensen has once more succinctly stated a key point I’ve repeatedly tried to convey in my blog posts. As Jensen (2003, p. 397) puts it,

…any activity whose performance can be perfectly measured objectively does not belong inside the firm. If its performance can be adequately measured objectively it can be spun out of the firm and contracted for in a market transaction.

YES!! Though nothing is measured perfectly, my message has been a series of variations on precisely this theme. Well-measured property, services, products, and commodities in today’s economy are associated with scientific, legal, and financial structures and processes that endow certain representations with meaningful indications of kind, amount, value, and ownership. It is further well established that the ownership of the products of one’s creative endeavors is essential to economic advancement and the enlargement of the greater good. Markets could not exist without objective measures; hence the central commercial importance of metric standards.

The improved measurement of service outcomes and performances is going to create an environment capable of supporting similar legal and financial indications of value and ownership. Many of the causes of today’s economic crises can be traced to poor quality information and inadequate measures of human, social, and natural value. Bringing publicly verifiable scientific data and methods to bear on the tuning of instruments for measuring these forms of value will make their harmonization much simpler than it ever could be otherwise. Social and environmental costs and value have been relegated to the marginal status of externalities because they have not been measured in ways that made it possible to bring them onto the books and into the models.

But the stage is being set for significant changes. Decades of research calibrating objective measures of a wide variety of performances and outcomes are inexorably leading to the creation of an intangible assets metric system (Fisher, 2009a, 2009b, 2011). Meaningful, rigorous, and universally available individual-level metrics for each significant intangible asset (abilities, health, trustworthiness, etc.) will

(a) make it possible for each of us to take full possession, ownership, and management control of our investments in and returns from these forms of capital,

(b) coordinate the decisions and behaviors of consumers, researchers, and quality improvement specialists to better match supply and demand, and thereby

(c) increase the efficiency of human, social, and natural capital markets, harnessing the profit motive for the removal of wasted human potential, lost community coherence, and destroyed environmental quality.

Jensen’s observation emerges in his analysis of performance measures as one of three factors in defining the incentives and payoffs for a linear compensation plan (the other two being the intercept and the slope of the bonus line relating salary and bonus to the performance measure targets). The two sentences quoted above occur in this broader context, where Jensen (2003, pp. 396-397) states that,

…we must decide how much subjectivity will be involved in each performance measure. In considering this we must recognize that every performance measurement system in a firm must involve an important amount of subjectivity. The reason, as my colleague George Baker has pointed out, is that any activity whose performance can be perfectly measured objectively does not belong inside the firm. If its performance can be adequately measured objectively it can be spun out of the firm and contracted for in a market transaction. Thus, one of the most important jobs of managers, complementing objective measures of performance with managerial subjective evaluation of subtle interdependencies and other factors is exactly what most managers would like to avoid. Indeed, it is this factor along with efficient risk bearing that is at the heart of what gives managers and firms an advantage over markets.

Jensen is here referring implicitly to the point Coase (1990) makes regarding the nature of the firm. A firm can be seen as a specialized market, one in which methods, insights, and systems not generally available elsewhere are employed for competitive advantage. Products are brought to market competitively by being endowed with value not otherwise available. Maximizing that value is essential to the viability of the firm.

Given conflicting incentives and the mixed messages of the balanced scorecard, managers have plenty of opportunities for creatively avoiding the difficult task of maximizing the value of the firm. Jensen (2001) shows that attending to the “managerial subjective evaluation of subtle interdependencies” is made impossibly complex when decisions and behaviors are pulled in different directions by each stakeholder’s particular interests. Other research shows that even traditional capital structures are plagued by the mismeasurement of leverage, distress costs, tax shields, and the speed with which individual firms adjust their capital needs relative to leverage targets (Graham & Leary, 2010). The objective measurement of intangible assets surely seems impossibly complex to those familiar with these problems.

But perhaps the problems associated with measuring traditional capital structures are not so different from those encountered in the domain of intangible assets. In both cases, a particular kind of unjustified self-assurance seems always to attend the mere availability of numeric data. To the unpracticed eye, numbers always seem to behave the same way, whether they are rigorous measures of physical commodities, like kilowatts, barrels, or bushels, currency units in an accounting spreadsheet, or percentages of agreeable responses to a survey question. The problem is that, when interrogated in particular ways as to how much of something is supposedly being measured, these different kinds of numbers give markedly different kinds of answers.
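The difference is easy to demonstrate with a minimal sketch (the numbers are hypothetical, chosen only for illustration). A percentage is bounded and nonlinear, so a fixed change in percent-correct does not correspond to a fixed amount of the underlying quantity. The log-odds transformation, which maps proportions onto an unbounded interval scale, makes this visible: the same five-point gain represents far more of the measured construct near the ceiling of the scale than near its middle.

```python
import math

def logit(p):
    """Log-odds: maps a proportion in (0, 1) onto an unbounded interval scale."""
    return math.log(p / (1 - p))

# The same 5-point gain in percent-correct implies very different gains
# on the log-odds scale, depending on where on the scale it occurs.
mid_gain  = logit(0.55) - logit(0.50)  # gain near the middle of the scale
high_gain = logit(0.95) - logit(0.90)  # same raw gain near the ceiling

print(round(mid_gain, 2))   # ~0.20 logits
print(round(high_gain, 2))  # ~0.75 logits
```

Two surveys reporting the same five-point improvement can thus be reporting quite different amounts of change, which is exactly the sense in which different kinds of numbers give different kinds of answers.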

The challenge we face is one of determining what kind of answers we want to the questions we have to ask. Presumably, we want to ask questions and get answers pertinent to obtaining the information we need to manage life creatively, meaningfully, effectively and efficiently. It may be useful then, as a kind of thought experiment, to make a bold leap and imagine a scenario in which relevant questions are answered with integrity, accountability, and transparency.

What will happen when the specialized expertise of human resource professionals is supplanted by a market in which meaningful and comparable measures of the hireability, retainability, productivity, and promotability of every candidate and employee are readily available? If Baker and Jensen have it right, perhaps firms will no longer have employees. This is not to say that no one will work for pay. Instead, firms will contract with individual workers at going market rates, and workers will undoubtedly be well aware of the market value of their available shares of their intangible assets.

A similar consequence follows for the social safety net and a host of other control, regulatory, and policing mechanisms. But we will no longer be stuck with blind faith in the invisible hand and market efficiency, following the faith of those willing to place their trust and their futures in the hands of mechanisms they only vaguely understand and cannot control. Instead, aggregate effects on individuals, communities, and the environment will be tracked in publicly available and critically examined measures, just as stocks, bonds, and commodities are tracked now.

Previous posts in this blog explore the economic possibilities that follow from having empirically substantiated, theoretically predictable, and instrumentally mediated measures embodying broad consensus standards. What we will have for human, social, and natural capital will be the same kind of objective measures that have made markets work as well as they have thus far. It will be a whole new ball game when profits become tied to human, social, and environmental outcomes.

References

Coase, R. (1990). The firm, the market, and the law. Chicago: University of Chicago Press.

Fisher, W. P., Jr. (2009a, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009b). NIST critical national need idea white paper: Metrological infrastructure for human, social, and natural capital (Tech. Rep.). New Orleans: LivingCapitalMetrics.com. Available at http://www.livingcapitalmetrics.com/images/FisherNISTWhitePaper2.pdf.

Fisher, W. P., Jr. (2010, 22 November). Meaningfulness, measurement, value seeking, and the corporate objective function: An introduction to new possibilities. Available at http://ssrn.com/abstract=1713467.

Fisher, W. P., Jr. (2011). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 12(1), in press.

Graham, J. R., & Leary, M. T. (2010, 21 December). A review of empirical capital structure research and directions for the future. Available at http://ssrn.com/abstract=1729388.

Jensen, M. C. (2001, Fall). Value maximization, stakeholder theory, and the corporate objective function. Journal of Applied Corporate Finance, 14(3), 8-21.

Jensen, M. C. (2003). Paying people to lie: The truth about the budgeting process. European Financial Management, 9(3), 379-406.


The Moral Implications of the Concept of Human Capital: More on How to Create Living Capital Markets

March 22, 2011

The moral reprehensibility of the concept of human capital hinges on its use in rationalizing impersonal business decisions in the name of profits. Even when the viability of the organization is at stake, the discarding of people (referred to in some human resource departments as “taking out the trash”) entails degrees of psychological and economic injury no one should have to suffer, or inflict.

There certainly is a justified need for a general concept naming the productive capacity of labor. But labor is far more than a capacity for work. No one’s working life should be reduced to a job description. Labor involves a wide range of different combinations of skills, abilities, motivations, health, and trustworthiness. Human capital has then come to be broken down into a wide variety of forms, such as literacy capital, health capital, social capital, etc.

The metaphoric use of the word “capital” in the phrase “human capital” referring to stocks of available human resources rings hollow. The traditional concept of labor as a form of capital is an unjustified reduction of diverse capacities in itself. But the problem goes deeper. Intangible resources like labor are not represented and managed in the forms that make markets for tangible resources efficient. Transferable representations, like titles and deeds, give property a legal status as owned and an economic status as financially fungible. And in those legal and economic terms, tangible forms of capital give capitalism its hallmark signification as the lifeblood of the cycle of investment, profits, and reinvestment.

Intangible forms of capital, in contrast, are managed without the benefit of any standardized way of proving what is owned, what quantity or quality of it exists, and what it costs. Human, social, and natural forms of capital are therefore managed directly, by acting in an unmediated way on whomever or whatever embodies them. Such management requires, even in capitalist economies, the use of what are inherently socialistic methods, as these are the only methods available for dealing with the concrete individual people, communities, and ecologies involved (Fisher, 2002, 2011; drawing from Hayek, 1948, 1988; De Soto, 2000).

The assumption that transferable representations of intangible assets are inconceivable or inherently reductionist is, however, completely mistaken. All economic capital is ultimately brought to life (conceived, gestated, midwifed, and nurtured to maturity) as scientific capital. Scientific measurability is what makes it possible to add up the value of shares of stock across holdings, to divide something owned into shares, and to represent something in a court or a bank in a portable form (Latour, 1987; Fisher, 2002, 2011).

Only when you appreciate this distinction between dead and living capital, between capital represented on transferable instruments and capital that is not, can you see that the real tragedy is not in the treatment of labor as capital. No, the real tragedy is in the way everyone is denied the full exercise of their rights over the skills, abilities, health, motivations, trustworthiness, and environmental resources that are rightly their own personal, private property.

Being homogenized at the population level into an interchangeable statistic is tragic enough. But when we leave the matter here, we fail to see and to grasp the meaning of the opportunities that are lost in that myopic world view. As I have been at pains in this blog to show, statistics are not measures. Statistical models of interactions between several variables at the group level are not the same thing as measurement models of interactions within a single variable at the individual level. When statistical models are used in place of measurement models, the result is inevitably numbers without a soul. When measurement models of individual response processes are used to produce meaningful estimates of how much of something someone possesses, a whole different world of possibilities opens up.
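A measurement model of an individual response process can be sketched very simply. The Rasch model, for example, expresses the probability that a person succeeds on an item as a function of the difference between the person’s ability and the item’s difficulty, both stated in the same unit; the particular values below are hypothetical, chosen only to illustrate the form of the model:

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: probability that a person of the given ability succeeds
    on an item of the given difficulty (both expressed in logits)."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# A person whose ability equals the item's difficulty has a 50% chance of
# success; a one-logit advantage raises that chance to about 73%, and the
# difference means the same thing anywhere on the scale.
print(rasch_probability(1.0, 1.0))            # 0.5
print(round(rasch_probability(2.0, 1.0), 2))  # 0.73
```

The point is the contrast with a group-level statistic: the model is about what one person does with one item, so the resulting estimates belong to the individual rather than to a population average.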

In the same way that the Pythagorean Theorem applies to any triangle, so, too, do the coordinates from the international geodetic survey make it possible to know everything that needs to be known about the location and disposition of a piece of real estate. Advanced measurement models in the psychosocial sciences are making it possible to arrive at similarly convenient and objective ways of representing the quality and quantity of intangible assets. Instead of being just one number among many others, real measures tell a story that situates each of us relative to everyone else in a meaningful way.

The practical meaning of the maxim “you manage what you measure” stems from those instances in which measures embody the fullness of the very thing that is the object of management interest. An engine’s fuel efficiency, or the volume of commodities produced, for instance, are things that can be managed less or more efficiently because there are measures of them that directly represent just what we want to control. Lean thinking enables the removal of resources that do not contribute to the production of the desired end result.

Many metrics, however, tend to obscure and distract from what needs to be managed. The objects of measurement may seem obviously related to what needs to be managed, but dealing with each of them piecemeal results in inefficient and ineffective management. In these instances, instead of the characteristic cycle of investment, profit, and reinvestment, there seems to be only a bottomless pit absorbing ever more investment and never producing a profit. Why?

The economic dysfunctionality of intangible asset markets is intimately tied up with the moral dysfunctionality of those markets. Drawing an analogy from a recent analysis of political freedom (Shirky, 2010), economic freedom has to be accompanied by a market society economically literate enough, economically empowered enough, and interconnected enough to trade on the capital stocks issued. Western society, and increasingly the entire global society, is arguably economically literate and sufficiently interconnected to exercise economic freedom.

Economic empowerment is another matter entirely. There is no economic power without fungible capital, without ways of representing resources of all kinds, tangible and intangible, that transparently show what is available, how much of it there is, and what quality it is. A form of currency expressing the value of that capital is essential, but money is wildly insufficient to the task of determining the quality and quantity of the available capital stocks.

Today’s education, health care, human resource, and environmental quality markets are the diametric opposite of the markets in which investors, producers, and consumers are empowered. Only when dead human, social, and natural capital is brought to life in efficient markets (Fisher, 2011) will we empower ourselves with fuller degrees of creative control over our economic lives.

The crux of the economic empowerment issue is this: in the current context of inefficient intangibles markets, everyone is personally commodified. Everything that makes me valuable to an employer or investor or customer, my skills, motivations, health, and trustworthiness, is unjustifiably reduced to a homogenized unit of labor. And in the social and environmental quality markets, voting our shares is cumbersome, expensive, and often ineffective because of the immense amount of work that has to be done to defend each particular living manifestation of the value we want to protect.

Concentrated economic power is exercised in the mass markets of dead, socialized intangible assets in ways that we are taught to think of as impersonal and indifferent to each of us as individuals, but which are actually experienced by us as intensely personal.

So what is the difference between being treated personally as a commodity and being treated impersonally as a commodity? This is the same as asking what it would mean to be empowered economically with creative control over the stocks of human, social, and natural capital that are rightfully our private property. This difference is the difference between dead and living capital (Fisher, 2002, 2011).

Freedom of economic communication, realized in the trade of privately owned stocks of any form of capital, ought to be the highest priority in the way we think about the infrastructure of a sustainable and socially responsible economy. For maximum efficiency, that freedom requires a common meaningful and rigorous quantitative language enabling determinations of what exactly is for sale, and its quality, quantity, and unit price. As I have repeated ad nauseam in this blog, measurement based in scientifically calibrated instrumentation traceable to consensus standards is absolutely essential to meeting this need.

Coming in at a very close second to the highest priority is securing the ability to trade. A strong market society, where people can exercise the right to control their own private property—their personal stocks of human, social, and natural capital—in highly efficient markets, is more important than policies, regulations, and five-year plans dictating how masses of supposedly homogenous labor, social, and environmental commodities are priced and managed.

So instead of reacting to the downside of the business cycle with a socialistic safety net, how might a capitalistic one prove more humane, moral, and economically profitable? Instead of guaranteeing a limited amount of unemployment insurance funded through taxes, what we should have are requirements for minimum investments in social capital. Instead of employment in the usual sense of the term, with its implications of hiring and firing, we should have an open market for fungible human capital, in which everyone can track the price of their stock, attract and make new investments, take profits and income, upgrade the quality and/or quantity of their stock, etc.

In this context, instead of receiving unemployment compensation, workers not currently engaged in remunerated use of their skills would cash in some of their accumulated stock of social capital. The cost of social capital would go up in periods of high demand, as during the recent economic downturns caused by betrayals of trust and commitment (which are, in effect, involuntary expenditures of social capital). Conversely, the cost of human capital would also fluctuate with supply and demand, with the profits (currently referred to as wages) turned by individual workers rising and falling with the price of their stocks. These ups and downs, being absorbed by everyone in proportion to their investments, would reduce the distorted proportions we see today in the shares of the rewards and punishments allotted.

Though no one would have a guaranteed wage, everyone would have the opportunity to manage their capital to the fullest, by upgrading it, keeping it current, and selling it to the highest bidder. Ebbing and flowing tides would more truly lift and drop all boats together, with the drops backed up with the social capital markets’ tangible reassurance that we are all in this together. This kind of a social capitalism transforms the supposedly impersonal but actually highly personal indifference of flows in human capital into a more fully impersonal indifference in which individuals have the potential to maximize the realization of their personal goals.

What we need is to create a visible alternative to the bankrupt economic system in a kind of reverse shock doctrine. Eleanor Roosevelt often said that the thing we are most afraid of is the thing we most need to confront if we are to grow. The more we struggle against what we fear, the further we are carried away from what we want. Only when we relax into the binding constraints do we find them loosened. Only when we channel overwhelming force against itself or in a productive direction can we withstand attack. When we find the courage to go where the wild things are and look the monsters in the eye will we have the opportunity to see if their fearful aspect is transformed to playfulness. What is left is often a more mundane set of challenges, the residuals of a developmental transition to a new level of hierarchical complexity.

And this is the case with the moral implications of the concept of human capital. Treating individuals as fungible commodities is a way that some use to protect themselves from feeling like monsters and from being discarded as well. Those who find themselves removed from the satisfactions of working life can blame the shortsightedness of their former colleagues, or the ugliness of the unfeeling system. But neither defensive nor offensive rationalizations do anything to address the actual problem, and the problem has nothing to do with the morality or the immorality of the concept of human capital.

The problem is the problem. That is, the way we approach and define the problem delimits the sphere of the creative options we have for solving it. As Henry Ford is supposed to have said, whether you think you can or you think you cannot, you’re probably right. It is up to us to decide whether we can create an economic system that justifies its reductions and actually lives up to its billing as impersonal and unbiased, or if we cannot. Either way, we’ll have to accept and live with the consequences.

References

De Soto, H. (2000). The mystery of capital: Why capitalism triumphs in the West and fails everywhere else. New York: Basic Books.

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2011, Spring). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 12(1), in press.

Hayek, F. A. (1948). Individualism and economic order. Chicago: University of Chicago Press.

Hayek, F. A. (1988). The fatal conceit: The errors of socialism (W. W. Bartley, III, Ed.). The Collected Works of F. A. Hayek. Chicago: University of Chicago Press.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York: Cambridge University Press.

Shirky, C. (2010, December 20). The political power of social media: Technology, the public sphere, and political change. Foreign Affairs, 90(1), http://www.foreignaffairs.com/articles/67038/clay-shirky/the-political-power-of-social-media.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

A Second Simple Example of Measurement’s Role in Reducing Transaction Costs, Enhancing Market Efficiency, and Enabling the Pricing of Intangible Assets

March 9, 2011

The prior post here showed why we should not confuse counts of things with measures of amounts, though counts are the natural starting place to begin constructing measures. That first simple example focused on an analogy between counting oranges and measuring the weight of oranges, versus counting correct answers on tests and measuring amounts of ability. This second example extends the first by, in effect, showing what happens when we want to aggregate value not just across different counts of some one thing but across different counts of different things. The point will be, in effect, to show how the relative values of apples, oranges, grapes, and bananas can be put into a common frame of reference and compared in a practical and convenient way.

For instance, you may go into a grocery store to buy raspberries and blackberries, and I go in to buy cantaloupe and watermelon. Your cost per individual fruit will be very low, and mine will be very high, but neither of us will find this annoying, confusing, or inconvenient because your fruits are very small, and mine, very large. Conversely, your cost per kilogram will be much higher than mine, but this won’t cause either of us any distress because we both recognize the differences in the labor, handling, nutritional, and culinary value of our purchases.

But what happens when we try to purchase something as complex as a unit of socioeconomic development? The eight UN Millennium Development Goals (MDGs) represent a start at a systematic effort to bring human, social, and natural capital together into the same economic and accountability framework as liquid and manufactured capital, and property. But that effort is stymied by the inefficiency and cost of making and using measures of the goals achieved. The existing MDG databases (http://data.un.org/Browse.aspx?d=MDG) and summary reports present overwhelming numbers of numbers. Individual indicators are presented for each year, country, region, and program, goal by goal, target by target, indicator by indicator, and series by series, in an indigestible volume of data.

Though there are no doubt complex mathematical methods by which a philanthropic, governmental, or NGO investor might determine how much development is gained per million dollars invested, the cost of obtaining impact measures is so high that most funding decisions are made with little information concerning expected returns (Goldberg, 2009). Further, the percentages of various needs met by leading social enterprises typically range from 0.07% to 3.30%, and needs are growing, not diminishing. Progress at current rates means that it would take thousands of years to solve today’s problems of human suffering, social disparity, and environmental quality. The inefficiency of human, social, and natural capital markets is so overwhelming that there is little hope for significant improvements without the introduction of fundamental infrastructural supports, such as an Intangible Assets Metric System.

A basic question that needs to be asked of the MDG system is, how can anyone make any sense out of so much data? Most of the indicators are evaluated in terms of counts of the number of times something happens, the number of people affected, or the number of things observed to be present. These counts are usually then divided by the maximum possible (the count of the total population) and are expressed as percentages or rates.

As previously explained in various posts in this blog, counts and percentages are not measures in any meaningful sense. They are notoriously difficult to interpret, since the quantitative meaning of any given unit difference varies depending on the size of what is counted, or where the percentage falls in the 0-100 continuum. And because counts and percentages are interpreted one at a time, it is very difficult to know if and when any number included in the sheer mass of data is reasonable, all else considered, or if it is inconsistent with other available facts.
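One way to see the problem: a fixed percentage difference corresponds to very different amounts on an additive log-odds (logit) scale, depending on where it falls in the 0-100 range. A minimal sketch, with illustrative values only:

```python
import math

def logit(p):
    """Log-odds transform: maps a proportion in (0, 1) onto an
    unbounded, additive scale (logits)."""
    return math.log(p / (1 - p))

# The same 10-point percentage gain represents very different amounts
# of change depending on where it falls on the 0-100 continuum.
print(round(logit(0.60) - logit(0.50), 2))  # mid-range gain: 0.41 logits
print(round(logit(0.95) - logit(0.85), 2))  # same 10 points near the ceiling: 1.21 logits
```

The mid-range gain and the near-ceiling gain look identical as percentages but differ by a factor of about three on the additive scale.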

A study of the MDG data must focus on three potential areas of data quality improvement: consistency evaluation, volume reduction, and interpretability. Each builds on the others. With consistent data lending themselves to summarization in sufficient statistics, data volume can be drastically reduced with no loss of information (Andersen, 1977, 1999; Wright, 1977, 1997), data quality can be readily assessed in terms of sufficiency violations (Smith, 2000; Smith & Plackner, 2009), and quantitative measures can be made interpretable in terms of a calibrated ruler’s repeatedly reproducible hierarchy of indicators (Bond & Fox, 2007; Masters, Lokan, & Doig, 1994).

The primary data quality criteria are qualitative relevance and meaningfulness, on the one hand, and mathematical rigor, on the other. The point here is one of following through on the maxim that we manage what we measure, with the goal of measuring in such a way that management is better focused on the program mission and not distracted by accounting irrelevancies.

Method

As written and deployed, each of the MDG indicators has the face and content validity of providing information on each respective substantive area of interest. But, as has been the focus of repeated emphases in this blog, counting something is not the same thing as measuring it.

Counts or rates of literacy or unemployment are not, in and of themselves, measures of development. Their capacity to serve as contributing indications of developmental progress is an empirical question that must be evaluated experimentally against the observable evidence. The measurement of progress toward an overarching developmental goal requires inferences made from a conceptual order of magnitude above and beyond that provided in the individual indicators. The calibration of an instrument for assessing progress toward the realization of the Millennium Development Goals requires, first, a reorganization of the existing data, and then an analysis that tests explicitly the relevant hypotheses as to the potential for quantification, before inferences supporting the comparison of measures can be scientifically supported.

A subset of the MDG data was selected from the MDG database available at http://data.un.org/Browse.aspx?d=MDG, recoded, and analyzed using Winsteps (Linacre, 2011). At least one indicator was selected from each of the eight goals, with 22 in total. All available data from these 22 indicators were recorded for each of 64 countries.

The reorganization of the data is nothing but a way of making the interpretation of the percentages explicit. The meaning of any one country’s percentage or rate of youth unemployment, cell phone users, or literacy has to be kept in context relative to expectations formed from other countries’ experiences. It would be nonsense to interpret any single indicator as good or bad in isolation. Sometimes 30% represents an excellent state of affairs, other times, a terrible one.

Therefore, the distributions of each indicator’s percentages across the 64 countries were divided into ranges and converted to ratings. A lower rating uniformly indicates a status further away from the goal than a higher rating. The ratings were devised by dividing the frequency distribution of each indicator roughly into thirds.

For instance, the youth unemployment rate was found to vary such that the countries furthest from the desired goal had rates of 25% or more (rated 1), and those closest to or exceeding the goal had rates of 0-10% (rated 3), leaving the middle range (10-25%) rated 2. In contrast, percentages of the population that are undernourished were rated 1 for 35% or more, 2 for 15-35%, and 3 for less than 15%.
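As a sketch, the recoding for these two indicators might look like the following, using the cut-points quoted above. Both are "lower is better" indicators, so the function below assumes that orientation:

```python
def rate(value, low_cut, high_cut):
    """Convert a 'lower is better' indicator percentage to a 1-3 rating:
    1 is furthest from the goal, 3 is closest to or exceeding it."""
    if value < low_cut:
        return 3
    if value >= high_cut:
        return 1
    return 2

print(rate(30.0, 10, 25))  # youth unemployment of 30% -> rated 1
print(rate(8.0, 10, 25))   # 8% unemployment -> rated 3
print(rate(20.0, 15, 35))  # 20% undernourished -> rated 2
```

Indicators oriented the other way (e.g., literacy rates) would simply reverse the comparison, so that a rating of 1 always means furthest from the goal.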

The division of the distributions into thirds was based only on the investigator’s prior experience with data of this kind. A more thorough approach to the data would begin from a finer-grained rating system, like that structuring the MDG table at http://mdgs.un.org/unsd/mdg/Resources/Static/Products/Progress2008/MDG_Report_2008_Progress_Chart_En.pdf. This greater detail would be sought in order to determine empirically just how many distinctions each indicator can support and contribute to the overall measurement system.

Sixty-four of the available 336 data points were selected for their representativeness, with no duplications of values and with a proportionate distribution along the entire continuum of observed values.

Data from the same 64 countries and the same years were then sought for the subsequent indicators. It turned out that the years in which data were available varied across data sets. Data within one or two years of the target year were sometimes substituted for missing data.

The data were analyzed twice, first with each indicator allowed its own rating scale, parameterizing each of the category difficulties separately for each item, and then with the full rating scale model, as the results of the first analysis showed all indicators shared strong consistency in the rating structure.
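The rating scale model used in the second analysis gives the probability of each rating category from a country's measure, an indicator's difficulty, and a set of step thresholds shared across all indicators. A minimal sketch; the numeric values here are illustrative, not the study's calibrations:

```python
import math

def category_probs(b, d, thresholds):
    """Andrich rating-scale model: probabilities of each rating category
    for person measure b and item difficulty d, with step thresholds
    shared across all items (all values in logits)."""
    cum, terms = 0.0, [0.0]  # the lowest category's term is fixed at 0
    for f in thresholds:
        cum += b - d - f
        terms.append(cum)
    exps = [math.exp(t) for t in terms]
    z = sum(exps)
    return [e / z for e in exps]

# Illustrative values: a country half a logit above an item's
# difficulty, with step thresholds at -1 and +1 logits.
probs = category_probs(0.5, 0.0, [-1.0, 1.0])
print([round(p, 2) for p in probs])  # chances of ratings 1, 2, and 3
```

The first analysis (partial credit) would simply estimate a separate set of thresholds for each item; finding those sets nearly identical justified the shared-threshold model.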

Results

Data were 65.2% complete. Countries were assessed on an average of 14.3 of the 22 indicators, and each indicator was applied on average to 41.7 of the 64 country cases. Measurement reliability was .89-.90, depending on how measurement error is estimated. Cronbach’s alpha for the by-country scores was .94. Calibration reliability was .93-.95. The rating scale worked well (see Linacre, 2002, for criteria). The data fit the measurement model reasonably well, with satisfactory data consistency, meaning that the hypothesis of a measurable developmental construct was not falsified.

The main result for our purposes here concerns how satisfactory data consistency makes it possible to dramatically reduce data volume and improve data interpretability. The figure below illustrates how. What does it mean for data volume to be drastically reduced with no loss of information? Let’s see exactly how much the data volume is reduced for the ten-item subset shown in the figure.

The horizontal continuum from -100 to 1300 in the figure is the metric, the ruler or yardstick. The number of countries at various locations along that ruler is shown across the bottom of the figure. The mean (M), first standard deviation (S), and second standard deviation (T) are shown beneath the numbers of countries. There are ten countries with a measure of just below 400, just to the left of the mean (M).

The MDG indicators are listed on the right of the figure, with the indicator most often found being achieved relative to the goals at the bottom, and the indicator least often being achieved at the top. The ratings in the middle of the figure increase from 1 to 3 left to right as the probability of goal achievement increases as the measures go from low to high. The position of the ratings in the middle of the figure shifts from left to right as one reads up the list of indicators because the difficulty of achieving the goals is increasing.

Because the ratings of the 64 countries relative to these ten goals are internally consistent, nothing but the developmental level of the country and the developmental challenge of the indicator affects the probability that a given rating will be attained. It is this relation that defines fit to a measurement model, the sufficiency of the summed ratings, and the interpretability of the scores. Given sufficient fit and consistency, any country’s measure implies a given rating on each of the ten indicators.

For instance, imagine a vertical line drawn through the figure at a measure of 500, just above the mean (M). This measure is interpreted relative to the places at which the vertical line crosses the ratings in each row associated with each of the ten items. A measure of 500 is read as implying, within a given range of error, uncertainty, or confidence, a rating of

  • 3 on debt service and female-to-male parity in literacy,
  • 2 or 3 on how much of the population is undernourished and how many children under five years of age are moderately or severely underweight,
  • 2 on infant mortality, the percent of the population aged 15 to 49 with HIV, and the youth unemployment rate,
  • 1 or 2 on the poor’s share of the national income, and
  • 1 on CO2 emissions and the rate of personal computers per 100 inhabitants.
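Reading the figure this way can be mimicked numerically: under a rating-scale Rasch model, any measure implies an expected rating on each indicator. The item difficulties, step thresholds, and the 100-units-per-logit rescaling below are all hypothetical, chosen only to illustrate the computation (the article does not state its actual scaling):

```python
import math

def expected_rating(measure, difficulty, thresholds, scale=100.0):
    """Expected 1-3 rating under a rating-scale Rasch model. User-unit
    measures are converted to logits at a hypothetical 100 units per
    logit."""
    b, d = measure / scale, difficulty / scale
    cum, terms = 0.0, [0.0]
    for f in thresholds:
        cum += b - d - f
        terms.append(cum)
    exps = [math.exp(t) for t in terms]
    z = sum(exps)
    # Categories are numbered 1..m+1, so weight term k by k + 1.
    return sum((k + 1) * e for k, e in enumerate(exps)) / z

# A measure of 500 against a hypothetical easy item (difficulty 300)
# and a hypothetical hard one (difficulty 900).
print(round(expected_rating(500, 300, [-1.0, 1.0]), 2))  # near 3
print(round(expected_rating(500, 900, [-1.0, 1.0]), 2))  # near 1
```

The same vertical line through the figure thus translates into a full profile of expected ratings, one per indicator, without consulting any of the raw data.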

For any one country with a measure of 500 on this scale, ten percentages or rates that appear completely incommensurable and incomparable are found to contribute consistently to a single-valued function, developmental goal achievement. Instead of managing each separate indicator as a universe unto itself, this scale makes it possible to manage development itself at its own level of complexity. This ten-to-one reduction in data volume is more than doubled when the full set of 22 items included in the scale is taken into account.

This reduction is conceptually and practically important because it focuses attention on the actual object of management, development. When the individual indicators are the focus of attention, the forest is lost for the trees. Those who disparage the validity of the maxim that you manage what you measure are often discouraged by the feeling of being pulled in too many directions at once. But a measure of the HIV infection rate is not in itself a measure of anything but the HIV infection rate. Interpreting it in terms of broader developmental goals requires evidence that it in fact takes a place in that larger context.

And once a connection with that larger context is established, the consistency of individual data points remains a matter of interest. As the world turns, the order of things may change, but, more likely, data entry errors, temporary data blips, and other factors will alter data quality. Such changes cannot be detected outside of the context defined by an explicit interpretive framework that requires consistent observations.

-100  100     300     500     700     900    1100    1300
|-------+-------+-------+-------+-------+-------+-------|  NUM   INDCTR
1                                 1  :    2    :  3     3    9  PcsPer100
1                         1   :   2    :   3            3    8  CO2Emissions
1                    1  :    2    :   3                 3   10  PoorShareNatInc
1                 1  :    2    :  3                     3   19  YouthUnempRatMF
1              1   :    2   :   3                       3    1  %HIV15-49
1            1   :   2    :   3                         3    7  InfantMortality
1          1  :    2    :  3                            3    4  ChildrenUnder5ModSevUndWgt
1         1   :    2    :  3                            3   12  PopUndernourished
1    1   :    2   :   3                                 3    6  F2MParityLit
1   :    2    :  3                                      3    5  DebtServExpInc
|-------+-------+-------+-------+-------+-------+-------|  NUM   INDCTR
-100  100     300     500     700     900    1100    1300
                   1
       1   1 13445403312323 41 221    2   1   1            COUNTRIES
       T      S       M      S       T

Discussion

A key element in the results obtained here concerns the fact that the data were about 35% missing. Whether or not any given indicator was actually rated for any given country, the measure can still be interpreted as implying the expected rating. This capacity to take missing data into account can be taken advantage of systematically by calibrating a large bank of indicators. With this in hand, it becomes possible to gather only the amount of data needed to make a specific determination, or to adaptively administer the indicators so as to obtain the lowest-error (most reliable) measure at the lowest cost (with the fewest indicators administered). Perhaps most importantly, different collections of indicators can then be equated to measure in the same unit, so that impacts may be compared more efficiently.
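Adaptive administration from a calibrated bank can be sketched as repeatedly choosing the indicator with the greatest statistical information (the model variance of the rating) at the current measure estimate. The bank difficulties and thresholds below are hypothetical, in logits:

```python
import math

def rating_variance(b, d, thresholds):
    """Model variance of the rating at measure b for an item of
    difficulty d: the item's statistical information, largest where
    the item is best targeted on the current estimate."""
    cum, terms = 0.0, [0.0]
    for f in thresholds:
        cum += b - d - f
        terms.append(cum)
    exps = [math.exp(t) for t in terms]
    z = sum(exps)
    probs = [e / z for e in exps]
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))

def next_indicator(estimate, bank, thresholds):
    """Pick the indicator from a calibrated bank that is most
    informative at the current measure estimate."""
    return max(bank, key=lambda d: rating_variance(estimate, d, thresholds))

# Hypothetical bank of indicator difficulties (logits).
bank = [-2.0, -0.5, 0.8, 2.5]
print(next_indicator(0.6, bank, [-1.0, 1.0]))  # best-targeted indicator
```

Stopping when the standard error falls below a chosen tolerance yields the lowest-error measure from the fewest indicators, as described above.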

Instead of an international developmental aid market that is so inefficient as to preclude any expectation of measured returns on investment, setting up a calibrated bank of indicators to which all measures are traceable opens up numerous desirable possibilities. The cost of assessing and interpreting the data informing aid transactions could be reduced to negligible amounts, and the management of the processes and outcomes in which that aid is invested would be made much more efficient by reduced data volume and enhanced information content. Because capital would flow more efficiently to where supply is meeting demand, nonproducers would be cut out of the market, and the effectiveness of the aid provided would be multiplied many times over.

The capacity to harmonize counts of different but related events into a single measurement system presents the possibility that there may be a bright future for outcomes-based budgeting in education, health care, human resource management, environmental management, housing, corrections, social services, philanthropy, and international development. It may seem wildly unrealistic to imagine such a thing, but the return on the investment would be so monumental that not checking it out would be even crazier.

A full report on the MDG data, with the other references cited, is available on my SSRN page at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1739386.

Goldberg, S. H. (2009). Billions of drops in millions of buckets: Why philanthropy doesn’t advance social progress. New York: Wiley.

A Simple Example of How Better Measurement Creates New Market Efficiencies, Reduces Transaction Costs, and Enables the Pricing of Intangible Assets

March 4, 2011

One of the ironies of life is that we often overlook the obvious in favor of the obscure. And so one hears of huge resources poured into finding and capitalizing on opportunities that provide infinitesimally small returns, while other opportunities—with equally certain odds of success but far more profitable returns—are completely neglected.

The National Institute of Standards and Technology (NIST) reports returns on investment ranging from 32% to over 400% for 32 metrological improvements made in semiconductors, construction, automation, computers, materials, manufacturing, chemicals, photonics, communications, and pharmaceuticals (NIST, 2009). Previous posts in this blog offer more information on the economic value of metrology. The point is that the returns obtained from improvements in the measurement of tangible assets will likely also be achieved in the measurement of intangible assets.

How? With a little bit of imagination, each stage in the development of increasingly meaningful, efficient, and useful measures described in this previous post can be seen as implying a significant return on investment. As those returns are sought, investors will coordinate and align different technologies and resources relative to a roadmap of how these stages are likely to unfold in the future, as described in this previous post. The basic concepts of how efficient and meaningful measurement reduces transaction costs and market frictions, and how it brings capital to life, are explained and documented in my publications (Fisher, 2002-2011), but what would a concrete example of the new value created look like?

The examples I have in mind hinge on the difference between counting and measuring. Counting is a natural and obvious thing to do when we need some indication of how much of something there is. But counting is not measuring (Cooper & Humphry, 2010; Wright, 1989, 1992, 1993, 1999). This is not some minor academic distinction of no practical use or consequence. It is rather the source of the vast majority of the problems we have in comparing outcome and performance measures.

Imagine how things would be if we couldn’t weigh fruit in a grocery store, and all we could do was count pieces. We can tell when eight small oranges possess less overall mass of fruit than four large ones by weighing them; the eight small oranges might weigh .75 kilograms (about 1.6 pounds) while the four large ones come in at 1.0 kilo (2.2 pounds). If oranges were sold by count instead of weight, perceptive traders would buy small oranges and make more money selling them than they could if they bought large ones.
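The arithmetic is trivial once a common unit is available. The per-kilogram price below is made up, but the counts and weights are those from the example:

```python
small_count, small_kg = 8, 0.75  # eight small oranges
large_count, large_kg = 4, 1.00  # four large oranges
price_per_kg = 3.00              # hypothetical price per kilogram

# Counting says the small lot is twice as big; weighing says the opposite.
print(small_count / large_count)           # 2.0 times as many pieces
print(round(small_kg * price_per_kg, 2))   # small lot: less fruit, lower price
print(round(large_kg * price_per_kg, 2))   # large lot: more fruit, higher price
```

Sold by count at a fixed per-piece price, the small lot would fetch twice what the large lot does, which is exactly the mispricing the perceptive trader exploits.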

But we can’t currently arrive so easily at the comparisons we need when we’re buying and selling intangible assets, like those produced as the outcomes of educational, health care, or other services. So I want to walk through a couple of very down-to-earth examples to bring the point home. Today we’ll focus on the simplest version of the story, and tomorrow we’ll take up a little more complicated version, dealing with the counts, percentages, and scores used in balanced scorecard and dashboard metrics of various kinds.

What if you score eight on one reading test and I score four on a different reading test? Who has more reading ability? In the same way that we might be able to tell just by looking that eight small oranges are likely to have less actual orange fruit than four big ones, we might also be able to tell just by looking that eight easy (short, common) words can likely be read correctly with less reading ability than four difficult (long, rare) words can be.

So let’s analyze the difference between buying oranges and buying reading ability. We’ll set up three scenarios for buying reading ability. In all three, we’ll imagine we’re comparing how we buy oranges with the way we would have to go about buying reading ability today if teachers were paid for the gains made on the tests they administer at the beginning and end of the school year.

In the first scenario, the teachers make up their own tests. In the second, the teachers each use a different standardized test. In the third, each teacher uses a computer program that draws questions from the same online bank of precalibrated items to construct a unique test custom tailored to each student. Reading ability scenario one is likely the most commonly found in real life. Scenario three is the rarest, but nonetheless describes a situation that has been available to millions of students in the U.S., Australia, and elsewhere for several years. Scenarios one, two and three correspond with developmental levels one, three, and five described in a previous blog entry.

Buying Oranges

When you go into one grocery store and I go into another, we don’t have any oranges with us. When we leave, I have eight and you have four. I have twice as many oranges as you, but yours weigh a kilo, about a third more than mine (.75 kilos).

When we paid for the oranges, the transaction was finished in a few seconds. Neither one of us experienced any confusion, annoyance, or inconvenience in relation to the quality of information we had on the amount of orange fruit we were buying. I did not, however, pay twice as much as you did. In fact, you paid more for yours than I did for mine, in direct proportion to the difference in the measured amounts.

No negotiations were necessary to consummate the transactions, and there was no need for special inquiries about how much orange we were buying. We knew from experience in this and other stores that the prices we paid were comparable with those offered in other times and places. Our information was cheap, as it was printed on the bag of oranges or could be read off a scale, and it was very high quality, as the measures were directly comparable with measures from any other scale in any other store. So, in buying oranges, the impact of information quality on the overall cost of the transaction was so inexpensive as to be negligible.

Buying Reading Ability (Scenario 1)

So now you and I go through third grade as eight year olds. You’re in one school and I’m in another. We have different teachers. Each teacher makes up his or her own reading tests. When we started the school year, we each took a reading test (different ones), and we took another (again, different ones) as we ended the school year.

For each test, your teacher counted up your correct answers and divided by the total number of questions; so did mine. You got 72% correct on the first one, and 94% correct on the last one. I got 83% correct on the first one, and 86% correct on the last one. Your score went up 22%, much more than the 3% mine went up. But did you learn more? It is impossible to tell. What if both of your tests were easier—not just for you or for me but for everyone—than both of mine? What if my second test was a lot harder than my first one? On the other hand, what if your tests were harder than mine? Perhaps you did even better than your scores seem to indicate.
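One way to see why the raw gains cannot be compared: a rough Rasch-style ability estimate is the test's difficulty plus the log-odds of the proportion correct. With hypothetical test difficulties (the scenario provides none), the 3% raw gain can turn out to be the larger one:

```python
import math

def ability(pct_correct, test_difficulty):
    """Rough ability estimate in logits: test difficulty plus the
    log-odds of the proportion correct."""
    p = pct_correct / 100.0
    return test_difficulty + math.log(p / (1 - p))

# Suppose your two tests had the same difficulty, while my second test
# was two logits harder than my first. All difficulties are hypothetical.
your_gain = ability(94, 0.0) - ability(72, 0.0)
my_gain = ability(86, 2.0) - ability(83, 0.0)
print(round(your_gain, 2))  # your 22-point raw gain
print(round(my_gain, 2))    # my 3-point raw gain, on a harder test
```

Under these assumptions my smaller raw gain represents the larger growth in ability; with other difficulty assumptions the ranking flips, which is precisely why the raw percentages settle nothing.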

We’ll just exclude from consideration other factors that might come to bear, such as whether your tests were significantly longer or shorter than mine, or if one of us ran out of time and did not answer a lot of questions.

If our parents had to pay the reading teacher at the end of the school year for the gains that were made, how would they tell what they were getting for their money? What if your teacher gave a hard test at the start of the year and an easy one at the end of the year so that you’d have a big gain and your parents would have to pay more? What if my teacher gave an easy test at the start of the year and a hard one at the end, so that a really high price could be put on very small gains? If our parents were to compare their experiences in buying our improved reading ability, they would have a lot of questions about how much improvement was actually obtained. They would be confused and annoyed at how inconvenient the scores are, because they are difficult, if not impossible, to compare. A lot of time and effort might be invested in examining the words and sentences in each of the four reading tests to try to determine how easy or hard they are in relation to each other. Or, more likely, everyone would throw their hands up and pay as little as they possibly can for outcomes they don’t understand.

Buying Reading Ability (Scenario 2)

In this scenario, we are third graders again, in different schools with different reading teachers. Now, instead of our teachers making up their own tests, our reading abilities are measured at the beginning and the end of the school year using two different standardized tests sold by competing testing companies. You’re in a private suburban school that’s part of an independent schools association. I’m in a public school along with dozens of others in an urban school district.

For each test, our parents received a report in the mail showing our scores. As before, we know how many questions we each answered correctly, and, unlike before, we don’t know which particular questions we got right or wrong. Finally, we don’t know how easy or hard your tests were relative to mine, but we know that the two tests you took were equated, and so were the two I took. That means your tests will show how much reading ability you gained, and so will mine.

We have one new bit of information we didn’t have before, and that’s a percentile score. Now we know that at the beginning of the year, with a percentile ranking of 72, you performed better than 72% of the other private school third graders taking this test, and at the end of the year you performed better than 76% of them. In contrast, I had percentiles of 84 and 89.

The question we have to ask now is if our parents are going to pay for the percentile gain, or for the actual gain in reading ability. You and I each learned more than our peers did on average, since our percentile scores went up, but this would not work out as a satisfactory way to pay teachers. Averages being averages, if you and I learned more and faster, someone else learned less and slower, so that, in the end, it all balances out. Are we to have teachers paying parents when their children learn less, simply redistributing money in a zero sum game?

And so, additional individualized reports are sent to our parents by the testing companies. Your tests are equated with each other, and they measure in a comparable unit that ranges from 120 to 480. You had a starting score of 235 and finished the year with a score of 420, for a gain of 185.

The tests I took are comparable and measure in the same unit, too, but not the same unit as your tests measure in. Scores on my tests range from 400 to 1200. I started the year with a score of 790, and finished at 1080, for a gain of 290.

Now the confusion in the first scenario is overcome, in part. Our parents can see that we each made real gains in reading ability. The difficulty levels of the two tests you took are the same, as are the difficulties of the two tests I took. But our parents still don’t know what to pay the teacher because they can’t tell if you or I learned more. You had lower percentiles and test scores than I did, but you are being compared with what is likely a higher scoring group of suburban and higher socioeconomic status students than the urban group of disadvantaged students I’m compared against. And your scores aren’t comparable with mine, so you might have started and finished with more reading ability than I did, or maybe I had more than you. There isn’t enough information here to tell.

So, again, the information that is provided is insufficient to the task of settling on a reasonable price for the outcomes obtained. Our parents will again be annoyed and confused by the low quality information that makes it impossible to know what to pay the teacher.

Buying Reading Ability (Scenario 3)

In the third scenario, we are still third graders in different schools with different reading teachers. This time our reading abilities are measured by tests that are completely unique. Every student has a test custom tailored to their particular ability. Unlike the tests in the first and second scenarios, however, now all of the tests have been constructed carefully on the basis of extensive data analysis and experimental tests. Different testing companies are providing the service, but they have gone to the trouble to work together to create consensus standards defining the unit of measurement for any and all reading test items.

For each test, our parents received a report in the mail showing our measures. As before, we know how many questions we each answered correctly. Now, though we don’t know which particular questions we got right or wrong, we can see typical items ordered by difficulty lined up in a way that shows us what kind of items we got wrong, and which kind we got right. And now we also know your tests were equated relative to mine, so we can compare how much reading ability you gained relative to how much I gained. Now our parents can confidently determine how much they should pay the teacher, at least in proportion to their children’s relative measures. If our measured gains are equal, the same payment can be made. If one of us obtained more value, then proportionately more should be paid.

In this third scenario, we have a situation directly analogous to buying oranges. You have a measured amount of increased reading ability that is expressed in the same unit as my gain in reading ability, just as the weights of the oranges are comparable. Further, your test items were not identical with mine, and so the difficulties of the items we took surely differed, just as the sizes of the oranges we bought did.

This third scenario could be made yet more efficient by removing the need for creating and maintaining a calibrated item bank, as described by Stenner and Stone (2003) and in the sixth developmental level in a prior blog post here. Also, additional efficiencies could be gained by unifying the interpretation of the reading ability measures, so that progress through high school can be tracked with respect to the reading demands of adult life (Williamson, 2008).

Comparison of the Purchasing Experiences

In contrast with the grocery store experience, paying for increased reading ability in the first scenario is fraught with low quality information that greatly increases the cost of the transaction. The information is of such low quality that, of course, hardly anyone bothers to try to decipher it; the cost of the effort outweighs any benefit it might yield. So no one knows how much gain in reading ability is obtained, or what a unit gain might cost.

When a school district or educational researchers mount studies to try to find out what it costs to improve reading ability in third graders in some standardized unit, they find so much unexplained variation in the costs that they, too, raise more questions than answers.

In grocery stores and other markets, we don’t place the cost of making the value comparison on the consumer or the merchant. Instead, society as a whole picks up the cost by funding the creation and maintenance of consensus standard metrics. Until we take up the task of doing the same thing for intangible assets, we cannot expect human, social, and natural capital markets to obtain the efficiencies we take for granted in markets for tangible assets and property.

References

Cooper, G., & Humphry, S. M. (2010). The ontological distinction between units and entities. Synthese, DOI: 10.1007/s11229-010-9832-1.

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2003). Measurement and communities of inquiry. Rasch Measurement Transactions, 17(3), 936-8 [http://www.rasch.org/rmt/rmt173.pdf].

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2005). Daredevil barnstorming to the tipping point: New aspirations for the human sciences. Journal of Applied Measurement, 6(3), 173-9 [http://www.livingcapitalmetrics.com/images/FisherJAM05.pdf].

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2009a, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009b). NIST Critical national need idea White Paper: Metrological infrastructure for human, social, and natural capital (Tech. Rep., http://www.livingcapitalmetrics.com/images/FisherNISTWhitePaper2.pdf). New Orleans: LivingCapitalMetrics.com.

Fisher, W. P., Jr. (2011). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 12(1), in press.

NIST. (2009, 20 July). Outputs and outcomes of NIST laboratory research. Available: http://www.nist.gov/director/planning/studies.cfm (Accessed 1 March 2011).

Stenner, A. J., & Stone, M. (2003). Item specification vs. item banking. Rasch Measurement Transactions, 17(3), 929-30 [http://www.rasch.org/rmt/rmt173a.htm].

Williamson, G. L. (2008). A text readability continuum for postsecondary readiness. Journal of Advanced Academics, 19(4), 602-632.

Wright, B. D. (1989). Rasch model from counting right answers: Raw scores as sufficient statistics. Rasch Measurement Transactions, 3(2), 62 [http://www.rasch.org/rmt/rmt32e.htm].

Wright, B. D. (1992, Summer). Scores are not measures. Rasch Measurement Transactions, 6(1), 208 [http://www.rasch.org/rmt/rmt61n.htm].

Wright, B. D. (1993). Thinking with raw scores. Rasch Measurement Transactions, 7(2), 299-300 [http://www.rasch.org/rmt/rmt72r.htm].

Wright, B. D. (1999). Common sense for measurement. Rasch Measurement Transactions, 13(3), 704-5  [http://www.rasch.org/rmt/rmt133h.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

 

One of the ironies of life is that we often overlook the obvious in favor of the obscure. And so one hears of huge resources poured into finding and capitalizing on opportunities that provide infinitesimally small returns, while other opportunities—with equally certain odds of success but far more profitable returns—are completely neglected.

The National Institute of Standards and Technology (NIST) reports returns on investment ranging from 32% to over 400% in 32 metrological improvements made in semiconductors, construction, automation, computers, materials, manufacturing, chemicals, photonics, communications and pharmaceuticals (NIST, 2009). Previous posts in this blog offer more information on the economic value of metrology. The point is that the returns obtained from improvements in the measurement of tangible assets will likely also be achieved in the measurement of intangible assets.

How? With a little bit of imagination, each stage in the development of increasingly meaningful, efficient, and useful measures described in this previous post can be seen as implying a significant return on investment. As those returns are sought, investors will coordinate and align different technologies and resources relative to a roadmap of how these stages are likely to unfold in the future, as described in this previous post. But what would a concrete example of the new value created look like?

The examples I have in mind hinge on the difference between counting and measuring. Counting is a natural and obvious thing to do when we need some indication of how much of something there is. But counting is not measuring (Cooper & Humphry, 2010; Wright, 1989, 1992, 1993, 1999). This is not some minor academic distinction of no practical use or consequence. It is rather the source of the vast majority of the problems we have in comparing outcome and performance measures.

Imagine how things would be if we couldn’t weigh fruit in a grocery store, and all we could do was count pieces. We can tell when eight small oranges possess less overall mass of fruit than four large ones by weighing them; the eight small oranges might weigh .75 kilograms (about 1.6 pounds) while the four large ones come in at 1.0 kilo (2.2 pounds). If oranges were sold by count instead of weight, perceptive traders would buy small oranges and make more money selling them than they could if they bought large ones.

But we can’t currently arrive so easily at the comparisons we need when we’re buying and selling intangible assets, like those produced as the outcomes of educational, health care, or other services. So I want to walk through a couple of very down-to-earth examples to bring the point home. Today we’ll focus on the simplest version of the story, and tomorrow we’ll take up a little more complicated version, dealing with the counts, percentages, and scores used in balanced scorecard and dashboard metrics of various kinds.

What if you score eight on one reading test and I score four on a different reading test? Who has more reading ability? In the same way that we might be able to tell just by looking that eight small oranges are likely to have less actual orange fruit than four big ones, we might also be able to tell just by looking that eight easy (short, common) words can likely be read correctly with less reading ability than four difficult (long, rare) words can be.

So let’s analyze the difference between buying oranges and buying reading ability. We’ll set up three scenarios for buying reading ability. In all three, we’ll imagine we’re comparing how we buy oranges with the way we would have to go about buying reading ability today if teachers were paid for the gains made on the tests they administer at the beginning and end of the school year.

In the first scenario, the teachers make up their own tests. In the second, the teachers each use a different standardized test. In the third, each teacher uses a computer program that draws questions from the same online bank of precalibrated items to construct a unique test custom tailored to each student. Scenario one is likely the most common in real life. Scenario three is the rarest, but it nonetheless describes a situation that has been available to millions of students in the U.S., Australia, and elsewhere for several years. Scenarios one, two, and three correspond with developmental levels one, three, and five described in a previous blog entry.

Buying Oranges

When you go into one grocery store and I go into another, we don’t have any oranges with us. When we leave, I have eight and you have four. I have twice as many oranges as you, but yours weigh a kilo, about a third more than mine (.75 kilos).

When we paid for the oranges, the transaction was finished in a few seconds. Neither one of us experienced any confusion, annoyance, or inconvenience in relation to the quality of information we had on the amount of orange fruits we were buying. I did not, however, pay twice as much as you did. In fact, you paid more for yours than I did for mine, in direct proportion to the difference in the measured amounts.

No negotiations were necessary to consummate the transactions, and there was no need for special inquiries about how much orange we were buying. We knew from experience in this and other stores that the prices we paid were comparable with those offered in other times and places. Our information was cheap, as it was printed on the bag of oranges or could be read off a scale, and it was very high quality, as the measures were directly comparable with measures from any other scale in any other store. So, in buying oranges, the impact of information quality on the overall cost of the transaction was so inexpensive as to be negligible.
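The economics of that transaction can be sketched in a few lines. The price per kilogram below is a made-up figure; the point is only that payment tracks the measured amount of fruit, not the count of pieces.

```python
# Oranges are priced by measured weight, not by count: the shared unit (kg)
# makes amounts comparable across stores. The unit price is hypothetical.
PRICE_PER_KG = 2.00  # assumed price, the same in both stores

def orange_price(weight_kg: float, price_per_kg: float = PRICE_PER_KG) -> float:
    """Payment is proportional to the measured amount of fruit."""
    return round(weight_kg * price_per_kg, 2)

my_price = orange_price(0.75)    # eight small oranges weighing 0.75 kg
your_price = orange_price(1.00)  # four large oranges weighing 1.00 kg

# I have twice as many oranges, but you pay more, in direct
# proportion to the measured weights.
print(my_price, your_price)  # 1.5 2.0
```

Because the kilogram means the same thing on every scale in every store, the same two lines of arithmetic settle the price anywhere.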

Buying Reading Ability (Scenario 1)

So now you and I go through third grade as eight-year-olds. You’re in one school and I’m in another. We have different teachers, and each teacher makes up his or her own reading tests. When we started the school year, we each took a reading test (different ones), and we took another (again, different ones) as we ended the school year.

For each test, your teacher counted up your correct answers and divided by the total number of questions; so did mine. You got 72% correct on the first one and 94% correct on the last one. I got 83% correct on the first one and 86% on the last one. Your score went up 22 percentage points, much more than the 3 points mine went up. But did you learn more? It is impossible to tell. What if both of your tests were easier—not just for you or for me, but for everyone—than both of mine? What if my second test was a lot harder than my first one? On the other hand, what if your tests were harder than mine? Perhaps you did even better than your scores seem to indicate.

We’ll just exclude from consideration other factors that might come to bear, such as whether your tests were significantly longer or shorter than mine, or if one of us ran out of time and did not answer a lot of questions.

If our parents had to pay the reading teacher at the end of the school year for the gains that were made, how would they tell what they were getting for their money? What if your teacher gave a hard test at the start of the year and an easy one at the end of the year so that you’d have a big gain and your parents would have to pay more? What if my teacher gave an easy test at the start of the year and a hard one at the end, so that a really high price could be put on very small gains? If our parents were to compare their experiences in buying our improved reading ability, they would have a lot of questions about how much improvement was actually obtained. They would be confused and annoyed at how inconvenient the scores are, because they are difficult, if not impossible, to compare. A lot of time and effort might be invested in examining the words and sentences in each of the four reading tests to try to determine how easy or hard they are in relation to each other. Or, more likely, everyone would throw their hands up and pay as little as they possibly can for outcomes they don’t understand.
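How little raw percent-correct scores tell us can be made concrete with a toy simulation. Assuming a simple logistic (Rasch-type) response model, the expected percent correct depends on both the reader’s ability and the items’ difficulties, so the very same reader scores very differently on two teacher-made tests. All parameter values here are invented for illustration.

```python
import math

def expected_percent_correct(ability: float, item_difficulties: list[float]) -> float:
    """Expected percent correct under a Rasch-type model:
    P(right) = exp(b - d) / (1 + exp(b - d)) for ability b, difficulty d."""
    probs = [1 / (1 + math.exp(d - ability)) for d in item_difficulties]
    return round(100 * sum(probs) / len(probs), 1)

ability = 1.0  # one fixed level of reading ability (in logits)

easy_test = [-2.0, -1.5, -1.0, -0.5, 0.0]  # easier items
hard_test = [0.5, 1.0, 1.5, 2.0, 2.5]      # harder items

print(expected_percent_correct(ability, easy_test))  # 86.1
print(expected_percent_correct(ability, hard_test))  # 39.0
```

With no learning at all, switching from the hard test in the fall to the easy test in the spring would manufacture a 47-point “gain”; without calibrating the items, the scores alone cannot separate the teacher’s effect from the tests’.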

Buying Reading Ability (Scenario 2)

In this scenario, we are third graders again, in different schools with different reading teachers. Now, instead of our teachers making up their own tests, our reading abilities are measured at the beginning and the end of the school year using two different standardized tests sold by competing testing companies. You’re in a private suburban school that’s part of an independent schools association. I’m in a public school along with dozens of others in an urban school district.

For each test, our parents received a report in the mail showing our scores. As before, we know how many questions we each answered correctly, and, as before, we don’t know which particular questions we got right or wrong. Finally, we don’t know how easy or hard your tests were relative to mine, but we know that the two tests you took were equated, and so were the two I took. That means your tests will show how much reading ability you gained, and so will mine.

But we have one new bit of information we didn’t have before, and that’s a percentile score. Now we know that at the beginning of the year, with a percentile ranking of 72, you performed better than 72% of the other private school third graders taking this test, and at the end of the year you performed better than 76% of them. In contrast, I had percentiles of 84 and 89.

The question we have to ask now is whether our parents are going to pay for the percentile gain or for the actual gain in reading ability. You and I each learned more than our peers did on average, since our percentile scores went up, but this would not work out as a satisfactory way to pay teachers. Averages being averages, if you and I learned more and faster, someone else learned less and slower, so that, in the end, it all balances out. Are we to have teachers paying parents when their children learn less, simply redistributing money in a zero-sum game?
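The zero-sum character of percentile gains is easy to demonstrate. In the sketch below (all scores invented), every student’s raw score improves, yet the percentile gains across the group sum to zero, and the slowest learner’s percentile collapses.

```python
def percentile_ranks(scores: list[float]) -> list[float]:
    """Percent of the group scoring strictly below each student."""
    n = len(scores)
    return [100 * sum(s < x for s in scores) / n for x in scores]

before = [10, 20, 30, 40]  # fall scores for a four-student cohort
after = [33, 37, 31, 45]   # spring scores: every student improved

gains = [a - b for a, b in zip(percentile_ranks(after), percentile_ranks(before))]
print(gains)       # [25.0, 25.0, -50.0, 0.0]
print(sum(gains))  # 0.0: percentile "gains" only redistribute rank
```

The third student learned (30 to 31) but fell from the 50th to the 0th percentile; paying on percentiles would fine that student’s teacher for a real, if small, gain.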

And so, additional individualized reports are sent to our parents by the testing companies. Your tests are equated with each other, so they measure in a common unit on a scale that ranges from 120 to 480. You had a starting score of 235 and finished the year with a score of 420, for a gain of 185.

The tests I took are also equated and measure in a common unit of their own, but it is not the same unit your tests measure in. Scores on my tests range from 400 to 1200. I started the year with a score of 790 and finished at 1080, for a gain of 290.

Now the confusion in the first scenario is overcome, in part. Our parents can see that we each made real gains in reading ability. The difficulty levels of the two tests you took are the same, as are the difficulties of the two tests I took. But our parents still don’t know what to pay the teacher, because they can’t tell whether you or I learned more. You had lower percentiles and test scores than I did, but you are being compared with a group of suburban, higher socioeconomic status students that likely scores higher than the group of disadvantaged urban students I’m compared against. And your scores aren’t comparable with mine, so you might have started and finished with more reading ability than I did, or maybe I had more than you. There isn’t enough information here to tell.

So, again, the information that is provided is insufficient to the task of settling on a reasonable price for the outcomes obtained. Our parents will again be annoyed and confused by the low quality information that makes it impossible to know what to pay the teacher.
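The incommensurability is worth making concrete. Using the scores from the text, the sketch below shows that without a genuine link between the two scales, any attempt to compare the gains, such as naively expressing each gain as a fraction of its scale’s range, produces numbers that look comparable but justify nothing.

```python
# Gains reported on two different, unlinked score scales (numbers from the text).
your_scale = (120, 480)  # your tests' score range
my_scale = (400, 1200)   # my tests' score range
your_gain = 420 - 235    # 185
my_gain = 1080 - 790     # 290

def fraction_of_range(gain: int, scale: tuple[int, int]) -> float:
    """Naive normalization: gain as a fraction of the scale's range.
    Without a common unit linking the scales, this number is arbitrary."""
    low, high = scale
    return gain / (high - low)

print(your_gain, my_gain)                        # 185 290: raw gains favor me...
print(fraction_of_range(your_gain, your_scale))  # ~0.514
print(fraction_of_range(my_gain, my_scale))      # 0.3625: ...naive rescaling favors you
```

The two normalizations reverse the raw ordering, and neither answer is defensible: the scale ranges are publishing conventions, not measures of anything.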

Buying Reading Ability (Scenario 3)

In the third scenario, we are still third graders in different schools with different reading teachers. This time our reading abilities are measured by tests that are unique to each of us: every student takes a test custom tailored to his or her particular ability. Unlike the tests in the first and second scenarios, however, all of these tests have been constructed carefully on the basis of extensive data analysis and experimental trials. Different testing companies provide the service, but they have gone to the trouble of working together to create consensus standards defining the unit of measurement for any and all reading test items.

For each test, our parents received a report in the mail showing our measures. As before, we know how many questions we each answered correctly. And though we still don’t know which particular questions we got right or wrong, we can now see typical items lined up in order of difficulty, showing the kinds of items each of us answered correctly and the kinds we missed. We also know that your tests were equated relative to mine, so we can compare how much reading ability you gained relative to how much I gained. Now our parents can confidently determine how much they should pay the teacher, at least in proportion to their children’s relative measures. If our measured gains are equal, the same payment can be made. If one of us obtained more value, then proportionately more should be paid.

In this third scenario, we have a situation directly analogous to buying oranges. You have a measured amount of increased reading ability that is expressed in the same unit as my gain in reading ability, just as the weights of the oranges are comparable. Further, your test items were not identical with mine, and so the difficulties of the items we took surely differed, just as the sizes of the oranges we bought did.
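Once gains are expressed in one consensus unit, setting the payment becomes ordinary arithmetic, just like pricing oranges by the kilogram. The rate, the function name, and the gain values below are all hypothetical, chosen only to show the shape of the calculation.

```python
# Payment proportional to measured gain in a shared consensus unit.
# The rate and the gains are invented for illustration.
RATE_PER_UNIT = 0.50  # dollars per unit of measured reading-ability growth

def payment_for_gain(measured_gain: float, rate: float = RATE_PER_UNIT) -> float:
    """With a common unit, price per unit of growth is simple arithmetic."""
    return round(measured_gain * rate, 2)

print(payment_for_gain(150.0))  # 75.0: the larger measured gain...
print(payment_for_gain(120.0))  # 60.0: ...pays proportionately more
# Equal measured gains would yield equal payments.
```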

This third scenario could be made yet more efficient by removing the need for creating and maintaining a calibrated item bank, as described by Stenner and Stone (2003) and in the sixth developmental level in a prior blog post here. Also, additional efficiencies could be gained by unifying the interpretation of the reading ability measures, so that progress through high school can be tracked with respect to the reading demands of adult life (Williamson, 2008).

Comparison of the Purchasing Experiences

In contrast with the grocery store experience, paying for increased reading ability in the first scenario is fraught with low quality information that greatly increases the cost of the transaction. The information is of such low quality that, of course, hardly anyone bothers to try to decipher it; the cost of the effort outweighs any benefit it might yield. So no one knows how much gain in reading ability is obtained, or what a unit gain might cost.

When a school district or educational researchers mount studies to try to find out what it costs to improve reading ability in third graders in some standardized unit, they find so much unexplained variation in the costs that they, too, raise more questions than answers.

But we don’t place the cost of making the value comparison on the consumer or the merchant in the grocery store. Instead, society as a whole picks up the cost by funding the creation and maintenance of consensus standard metrics. Until we take up the task of doing the same thing for intangible assets, we cannot expect human, social, and natural capital markets to obtain the efficiencies we take for granted in markets for tangible assets and property.

References

Cooper, G., & Humphry, S. M. (2010). The ontological distinction between units and entities. Synthese, DOI: 10.1007/s11229-010-9832-1.

NIST. (2009, 20 July). Outputs and outcomes of NIST laboratory research. Available: http://www.nist.gov/director/planning/studies.cfm (Accessed 1 March 2011).

Stenner, A. J., & Stone, M. (2003). Item specification vs. item banking. Rasch Measurement Transactions, 17(3), 929-30 [http://www.rasch.org/rmt/rmt173a.htm].

Williamson, G. L. (2008). A text readability continuum for postsecondary readiness. Journal of Advanced Academics, 19(4), 602-632.

Wright, B. D. (1989). Rasch model from counting right answers: Raw scores as sufficient statistics. Rasch Measurement Transactions, 3(2), 62 [http://www.rasch.org/rmt/rmt32e.htm].

Wright, B. D. (1992, Summer). Scores are not measures. Rasch Measurement Transactions, 6(1), 208 [http://www.rasch.org/rmt/rmt61n.htm].

Wright, B. D. (1993). Thinking with raw scores. Rasch Measurement Transactions, 7(2), 299-300 [http://www.rasch.org/rmt/rmt72r.htm].

Wright, B. D. (1999). Common sense for measurement. Rasch Measurement Transactions, 13(3), 704-5  [http://www.rasch.org/rmt/rmt133h.htm].

Measurement, Metrology, and the Birth of Self-Organizing, Complex Adaptive Systems

February 28, 2011

On page 145 of his book, The Mathematics of Measurement: A Critical History, John Roche quotes Charles de La Condamine (1701-1774), who, in 1747, wrote:

‘It is quite evident that the diversity of weights and measures of different countries, and frequently in the same province, are a source of embarrassment in commerce, in the study of physics, in history, and even in politics itself; the unknown names of foreign measures, the laziness or difficulty in relating them to our own give rise to confusion in our ideas and leave us in ignorance of facts which could be useful to us.’

Roche (1998, p. 145) then explains what de La Condamine is driving at, saying:

“For reasons of international communication and of civic justice, for reasons of stability over time and for accuracy and reliability, the creation of exact, reproducible and well maintained international standards, especially of length and mass, became an increasing concern of the natural philosophers of the seventeenth and eighteenth centuries. This movement, cooperating with a corresponding impulse in governing circles for the reform of weights and measures for the benefit of society and trade, culminated in late eighteenth century France in the metric system. It established not only an exact, rational and international system of measuring length, area, volume and mass, but introduced a similar standard for temperature within the scientific community. It stimulated a wider concern within science to establish all scientific units with equal rigour, basing them wherever possible on the newly established metric units (and on the older exact units of time and angular measurement), because of their accuracy, stability and international availability. This process gradually brought about a profound change in the notation and interpretation of the mathematical formalism of physics: it brought about, for the first time in the history of the mathematical sciences, a true union of mathematics and measurement.”

As it was in the seventeenth and eighteenth centuries for physics, so it has also been in the twentieth and twenty-first for the psychosocial sciences. The creation of exact, reproducible and well maintained international standards is a matter of increasing concern today for the roles they will play in education, health care, the work place, business intelligence, and the economy at large.

As the economic crises persist and perhaps worsen, demand for common product definitions and for interpretable, meaningful measures of impacts and outcomes in education, health care, social services, environmental management, etc. will reach a crescendo. We need an exact, rational and international system of measuring literacy, numeracy, health, motivations, quality of life, community cohesion, and environmental quality, and we needed it fifty years ago. We need to reinvigorate and revive a wider concern across the sciences to establish all scientific units with equal rigor, and to have all measures used in research and practice based wherever possible on consensus standard metrics valued for their accuracy, stability and availability. We need to replicate in the psychosocial sciences the profound change in the notation and interpretation of the mathematical formalism of physics that occurred in the eighteenth and nineteenth centuries. We need to extend the true union of mathematics and measurement from physics to the psychosocial sciences.

Previous posts in this blog speak to the persistent invariance and objectivity exhibited by many of the constructs measured using ability tests, attitude surveys, performance assessments, etc. A question previously raised in this blog concerning the reproductive logic of living meaning deserves more attention, and can be productively explored in terms of complex adaptive functionality.

In a hierarchy of reasons why mathematically rigorous measurement is valuable, few are closer to the top of the list than facilitating the spontaneous self-organization of networks of agents and actors (Latour, 1987). The conception, gestation, birthing, and nurturing of complex adaptive systems constitute a reproductive logic for sociocultural traditions. Scientific traditions, in particular, form mature self-identities via a mutually implied subject-object relation absorbed into the flow of a dialectical give and take, just as economic systems do.

Complex adaptive systems establish the reproductive viability of their offspring and the coherence of an ecological web of meaningful relationships by means of this dialectic. Taylor (2003, pp. 166-8) describes the five moments in the formation and operation of complex adaptive systems, which must be able

  • to identify regularities and patterns in the flow of matter, energy, and information (MEI) in the environment (business, social, economic, natural, etc.);
  • to produce condensed schematic representations of these regularities so they can be identified as the same if they are repeated;
  • to form reproductively interchangeable variants of these representations;
  • to succeed reproductively by means of the accuracy and reliability of the representations’ predictions of regularities in the MEI data flow; and
  • to adaptively modify and reorganize representations by means of informational feedback from the environment.

All living systems, from bacteria and viruses to plants and animals to languages and cultures, are complex adaptive systems characterized by these five features.

In the history of science, technologically embodied measurement has facilitated complex adaptive systems of various kinds. That history can be used as a basis for a meta-theoretical perspective on what measurement must look like in the social and human sciences. Each of Taylor’s five moments in the formation and operation of complex adaptive systems describes a capacity of measurement systems, in that:

  • data flow regularities are captured in initial, provisional instrument calibrations;
  • condensed local schematic representations are formed when an instrument’s calibrations are anchored at repeatedly observed, invariant values;
  • interchangeable nonlocal versions of these invariances are created by means of instrument equating, item banking, metrological networks, and selective, tailored, adaptive instrument administration;
  • measures read off inaccurate and unreliable instruments will not support successful reproduction of the data flow regularity, but accurate and reliable instruments calibrated in a shared common unit provide a reference standard metric that enhances communication and reproduces the common voice and shared identity of the research community; and
  • consistently inconsistent anomalous observations provide feedback suggesting new possibilities for as yet unrecognized data flow regularities that might be captured in new calibrations.
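The third capacity above, creating interchangeable nonlocal versions of local calibrations through instrument equating, can be illustrated with the simplest common-item linking method: the mean difference between the calibrations of anchor items shared by two test forms estimates the constant that places both forms on one scale. The item names and difficulty values below are invented for illustration.

```python
# Common-item (mean-shift) equating sketch. The same three anchor items
# were calibrated separately on two test forms; their mean difference
# estimates the shift that places Form B's scale onto Form A's.
form_a = {"anchor1": -0.50, "anchor2": 0.10, "anchor3": 0.90}  # logit difficulties, Form A scale
form_b = {"anchor1": -0.20, "anchor2": 0.40, "anchor3": 1.20}  # same items, Form B scale

def linking_constant(a: dict[str, float], b: dict[str, float]) -> float:
    """Mean of (A - B) over the anchor items common to both forms."""
    common = sorted(set(a) & set(b))
    return sum(a[k] - b[k] for k in common) / len(common)

shift = linking_constant(form_a, form_b)
print(round(shift, 2))  # -0.3: add this to any Form B calibration

unique_b_item = 0.70                    # a Form B item that Form A never used
print(round(unique_b_item + shift, 2))  # 0.4, now expressed on Form A's scale
```

Once the linking constant is known, every item and person measure from Form B can be reported in Form A’s frame of reference, which is what makes item banking and metrological networks of instruments possible.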

Measurement in the social sciences is in the process of extending this functionality into practical applications in business, education, health care, government, and elsewhere. Over the course of the last 50 years, measurement research and practice has already iterated many times through these five moments. In the coming years, a new critical mass will be reached in this process, systematically bringing about order-of-magnitude improvements in the efficiency of intangible assets markets.

How? What does a “data flow regularity” look like? How is it condensed into a schematic and used to calibrate an instrument? How are local schematics combined into a pattern used to recognize new instances of themselves? More specifically, how might enterprise resource planning (ERP) software (such as SAP, Oracle, or PeopleSoft) simultaneously provide both the structure needed to support meaningful comparisons and the flexibility needed for good fit with the dynamic complexity of adaptive and generative self-organizing systems?

Prior work in this area proposes a dual-core, loosely coupled organization using ERP software to build social and intellectual capital, instead of using it as an IT solution addressing organizational inefficiencies (Lengnick-Hall, Lengnick-Hall, & Abdinnour-Helm, 2004). The adaptive and generative functionality (Stenner & Stone, 2003) provided by probabilistic measurement models (Rasch, 1960; Andrich, 2002, 2004; Bond & Fox, 2007; Wilson, 2005; Wright, 1977, 1999) makes it possible to model intra- and inter-organizational interoperability (Weichhart, Feiner, & Stary, 2010) at the same time that social and intellectual capital resources are augmented.

Actor/agent network theory has emerged from social and historical studies of the shared and competing moral, economic, political, and mathematical values disseminated by scientists and technicians in a variety of successful and failed areas of research (Latour, 2005). The resulting sociohistorical descriptions ought to be translated into a practical program for reproducing successful research programs. A metasystem for complex adaptive systems of research is implied in what Roche (1998) calls a “true union of mathematics and measurement.”

Complex adaptive systems are effectively constituted of such a union, even if, in nature, the mathematical character of the data flows and calibrations remains virtual. Probabilistic conjoint models for fundamental measurement are poised to extend this functionality into the human sciences. Though few, if any, have framed the situation in these terms, these and other questions are being explored, explicitly and implicitly, by hundreds of researchers in dozens of fields as they employ unidimensional models for measurement in their investigations.

If so, might we then be on the verge of yet another new reading and writing of Galileo’s “book of nature,” this time restoring the “loss of meaning for life” suffered in Galileo’s “fateful omission” of the means by which nature came to be understood mathematically (Husserl, 1970)? The elements of a comprehensive, mathematical, and experimental design science of living systems appear to be on the verge of providing a saturated solution—or better, a nonequilibrium thermodynamic solution—to some of the infamous shortcomings of modern, Enlightenment science. The unity of science may yet become a reality, though not via the reductionist program envisioned by the positivists.

Some 50 years ago, Marshall McLuhan popularized the expression, “The medium is the message.” The special value of quantitative measurement in the history of science does not stem from the mere use of number. Instruments are media on which nature, human or otherwise, inscribes legible messages. A renewal of the true union of mathematics and measurement in the context of intangible assets will lead to a new cultural, scientific, and economic renaissance. As Thomas Kuhn (1977, p. 221) wrote,

“The full and intimate quantification of any science is a consummation devoutly to be wished. Nevertheless, it is not a consummation that can effectively be sought by measuring. As in individual development, so in the scientific group, maturity comes most surely to those who know how to wait.”

Given that we have strong indications of how full and intimate quantification consummates a true union of mathematics and measurement, the time for waiting is now past, and the time to act has come. See prior blog posts here for suggestions on an Intangible Assets Metric System, for resources on methods and research, for other philosophical ruminations, and more. This post is based on work presented at Rasch meetings several years ago (Fisher, 2006a, 2006b).

References

Andrich, D. (2002). Understanding resistance to the data-model relationship in Rasch’s paradigm: A reflection for the next generation. Journal of Applied Measurement, 3(3), 325-59.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah, New Jersey: Lawrence Erlbaum Associates.

Fisher, W. P., Jr. (2006a, Friday, April 28). Complex adaptive functionality via measurement. Presented at the Midwest Objective Measurement Seminar, M. Lunz (Organizer), University of Illinois at Chicago.

Fisher, W. P., Jr. (2006b, June 27-9). Measurement and complex adaptive functionality. Presented at the Pacific Rim Objective Measurement Symposium, T. Bond & M. Wu (Organizers), The Hong Kong Institute of Education, Hong Kong.

Husserl, E. (1970). The crisis of European sciences and transcendental phenomenology: An introduction to phenomenological philosophy (D. Carr, Trans.). Evanston, Illinois: Northwestern University Press (Original work published 1954).

Kuhn, T. S. (1977). The function of measurement in modern physical science. In T. S. Kuhn, The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press. (Reprinted from Isis, 52(168), 161-193, 1961.)

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York: Cambridge University Press.

Latour, B. (2005). Reassembling the social: An introduction to actor-network-theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Lengnick-Hall, C. A., Lengnick-Hall, M. L., & Abdinnour-Helm, S. (2004). The role of social and intellectual capital in achieving competitive advantage through enterprise resource planning (ERP) systems. Journal of Engineering and Technology Management, 21, 307-330.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Stenner, A. J., & Stone, M. (2003). Item specification vs. item banking. Rasch Measurement Transactions, 17(3), 929-30 [http://www.rasch.org/rmt/rmt173a.htm].

Taylor, M. C. (2003). The moment of complexity: Emerging network culture. Chicago: University of Chicago Press.

Weichhart, G., Feiner, T., & Stary, C. (2010). Implementing organisational interoperability–The SUddEN approach. Computers in Industry, 61, 152-160.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

A Technology Road Map for Efficient Intangible Assets Markets

February 24, 2011

Scientific technologies, instruments and conceptual images have been found to play vitally important roles in economic success because of the way they enable accurate predictions of future industry and market states (Miller & O’Leary, 2007). The technology road map for the microprocessor industry, based in Moore’s Law, has successfully guided market expectations and coordinated research investment decisions for over 40 years. When the earlier electromechanical, relay, vacuum tube, and transistor computing technology paradigms are included, the same trajectory has dominated the computer industry for over 100 years (Kurzweil, 2005, pp. 66-67).

We need a similar technology road map to guide the creation and development of intangible asset markets for human, social, and natural (HSN) capital. This will involve intensive research to identify the primary constructs, to determine what is measurable and what is not, and to create consensus standards for uniform metrics along with the metrology networks through which those standards will function. Alignments with these developments will require comprehensively integrated economic models, accounting frameworks, and investment platforms, in addition to specific applications deploying the capital formations.

What I’m proposing is, in a sense, just an extension in a new direction of the metrology challenges and issues summarized in Table ITWG15 on page 48 in the 2010 update to the International Technology Roadmap for Semiconductors (http://www.itrs.net/about.html). Distributed electronic communication facilitated by computers and the Internet is well on the way to creating a globally uniform instantaneous information network. But much of what needs to be communicated through this network remains expressed in locally defined languages that lack common points of reference. Meaningful connectivity demands a shared language.

To those who say we already have the technology necessary and sufficient for the measurement and management of human, social, and natural capital, I say think again. The difference between what we have and what we need is the same as the difference between (a) an economy whose capital resources are not represented in transferable form, such as titles and deeds, and whose transactions are denominated in a flood of different circulating currencies, and (b) an economy whose capital resources are represented in transferable documents and are traded using a single currency with a restricted money supply. The measurement of intangible assets today is akin to the former economy, with little actual living capital and hundreds of incommensurable instruments and scoring systems, when what we need is the latter. (See previous entries in this blog for more on the difference between dead and living capital.)

Given the model of a road map detailing the significant features of the living capital terrain, industry-specific variations will inform the development of explicit market expectations, the alignment of HSN capital budgeting decisions, and the coordination of research investments. The concept of a technology road map for HSN capital is based in and expands on an integration of hierarchical complexity (Commons & Richards, 2002; Dawson, 2004), complex adaptive functionality (Taylor, 2003), Peirce’s semiotic developmental map of creative thought (Wright, 1999), and historical stages in the development of measuring systems (Stenner & Horabin, 1992; Stenner, Burdick, Sanford, & Burdick, 2006).

Technology road maps replace organizational amnesia with organizational learning by providing the structure of a memory that not only stores information, knowledge, understanding, and wisdom, but makes it available for use in new situations. Othman and Hashim (2004) describe organizational amnesia (OA) relative to organizational learning (OL) in a way that opens the door to a rich application of Miller and O’Leary’s (2007) detailed account of how technology road maps contribute to the creation of new markets and industries. Technology road maps function as the higher organizational principles needed for transforming individual and social expertise into economically useful products and services. Organizational learning and adaptability further need to be framed at the inter-organizational level where their various dimensions or facets are aligned not only within individual organizations but between them within the industry as a whole.

The mediation of the individual and organizational levels, and of the organizational and inter-organizational levels, is facilitated by measurement. In the microprocessor industry, Moore’s Law enabled the creation of technology road maps charting the structure, processes, and outcomes that had to be aligned at the individual, organizational, and inter-organizational levels to coordinate the entire microprocessor industry’s economic success. Such road maps need to be created for each major form of human, social, and natural capital, with the associated alignments and coordinations put in play at all levels of every firm, industry, and government.

It is a basic fact of contemporary life that the technologies we employ every day are so complex that hardly anyone understands how they do what they do. Technological miracles are commonplace events, from transportation to entertainment, from health care to manufacturing. And we usually suffer little in the way of adverse consequences from not knowing how an automatic transmission, a thermometer, or digital video reproduction works. It is enough to know how to use the tool.

This passive acceptance of technical details beyond our ken extends into areas in which standards, methods, and products are much less well defined. Managers, executives, researchers, teachers, clinicians, and others who need measurement but who are unaware of its technicalities are then put in the position of being passive consumers accepting the lowest common denominator in the quality of the services and products obtained.

And that’s not all. Just as the mass market of measurement consumers is typically passive and uninformed, in complementary fashion the supply side is fragmented and contentious. There is little agreement among measurement experts as to which quantitative methods set the standard as the state of the art. Virtually any method can be justified in terms of some body of research and practice, so the confused consumer accepts whatever is easily available or is most likely to support a preconceived agenda.

It may be possible, however, to separate the measurement wheat from the chaff. For instance, measurement consumers may value a way of distinguishing among methods that is based in a simple criterion of meaningful utility. What if all measurement consumers’ own interests in, and reasons for, measuring something in particular, such as literacy or community, were emphasized and embodied in a common framework? What if a path of small steps from currently popular methods of less value to more scientific ones of more value could be mapped? Such a continuum of methods could range from those doing the least to advance the users’ business interests to those doing the most to advance those interests.

The aesthetics, simplicity, meaningfulness, rigor, and practical consequences of strong theoretical requirements for instrument calibration provide such criteria for choices as to models and methods (Andrich, 2002, 2004; Busemeyer & Wang, 2000; Myung, 2000; Pitt, Kim, & Myung, 2003; Wright, 1997, 1999). These criteria could be used to develop and guide explicit considerations of data quality, construct theory, instrument calibration, quantitative comparisons, measurement standard metrics, etc., along a continuum from the most passive and least objective to the most actively involved and most objective.

The passive approach to measurement typically starts from and prioritizes content validity. The questions asked on tests, surveys, and assessments are considered relevant primarily on the basis of the words they use and the concepts they appear to address. Evidence that the questions actually cohere together and measure the same thing is not needed. If there is any awareness of the existence of axiomatically prescribed measurement requirements, these are not considered to be essential. That is, if failures of invariance are observed, they usually provoke a turn to less stringent data treatments instead of a push to remove or prevent them. Little or no measurement or construct theory is implemented, meaning that all results remain dependent on local samples of items and people. Passively approaching measurement in this way is then encumbered by the need for repeated data gathering and analysis, and by the local dependency of the results. Researchers working in this mode are akin to the woodcutters who say they are too busy cutting trees to sharpen their saws.

An alternative, active approach to measurement starts from and prioritizes construct validity and the satisfaction of the axiomatic measurement requirements. Failures of invariance provoke further questioning, and there is significant practical use of measurement and construct theory. Results are then independent of local samples, sometimes to the point that researchers and practical applications are not encumbered with usual test- or survey-based data gathering and analysis.

As is often the case, this black and white portrayal tells far from the whole story. There are multiple shades of grey in the contrast between passive and active approaches to measurement. The actual range of implementations is much more diverse than the simple binary contrast would suggest (see the previous post in this blog for a description of a hierarchy of increasingly complex stages in measurement). Spelling out the variation that exists could be helpful for making deliberate, conscious choices in measurement practice.

It is inevitable that we would start from the materials we have at hand, and that we would then move through a hierarchy of increasing efficiency and predictive control as understanding of any given variable grows. Previous considerations of the problem have offered different categorizations for the transformations characterizing development on this continuum. Stenner and Horabin (1992) distinguish between 1) impressionistic and qualitative, nominal gradations found in the earliest conceptualizations of temperature, 2) local, data-based quantitative measures of temperature, and 3) generalized, universally uniform, theory-based quantitative measures of temperature.

The latter is prized for the way that thermodynamic theory enables the calibration of individual thermometers with no need for testing each one in empirical studies of its performance. Theory makes it possible to know in advance what the results of such tests would be with enough precision to greatly reduce the burden and expenses of instrument calibration.

Reflecting on the history of psychosocial measurement in this context, it becomes apparent that these three stages can be further broken down. The previous post in this blog lists the distinguishing features of each of six stages in the evolution of measurement systems, building on the five stages described by Stenner, Burdick, Sanford, and Burdick (2006).

And so what analogue of Moore’s Law might be projected? What kind of timetable might govern the unfolding of what could be called Stenner’s Law? Guidance for reasonable expectations is found in Kurzweil’s (2005) charting of historical and projected future exponential increases in the volume of information and in computer processing speed. The accelerating growth in knowledge taking place in the world today speaks directly to a systematic integration of criteria for what shall count as meaningful new learning. Maps of the roads we’re traveling will provide some needed guidance and make the trip more enjoyable, efficient, and productive. Perhaps somewhere not far down the road we’ll be able to project doubling rates for the growth in the volume of fungible literacy capital globally, or halving rates in the cost of health capital stocks. We manage what we measure, so when we begin measuring well what we want to manage well, we’ll all be better off.
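The doubling and halving rates imagined here follow the same arithmetic as any Moore's Law-style road map. A toy sketch in Python (the quantities, periods, and function names are purely hypothetical, for illustration only):

```python
def project_growth(initial: float, doubling_period_years: float, years: float) -> float:
    """Project a quantity forward under a constant doubling period,
    the arithmetic behind Moore's Law-style road maps."""
    return initial * 2 ** (years / doubling_period_years)

def project_halving(initial: float, halving_period_years: float, years: float) -> float:
    """The mirror image: a cost that halves at a constant rate."""
    return initial * 2 ** (-years / halving_period_years)

# Hypothetical: a capital stock doubling every 2 years, projected 10 years out
print(project_growth(1.0, 2.0, 10.0))    # 32.0
# Hypothetical: a cost halving every 5 years, projected 10 years out
print(project_halving(100.0, 5.0, 10.0)) # 25.0
```

The road map's value lies less in the arithmetic than in the shared expectations the projected curve coordinates across firms and research programs.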

References

Andrich, D. (2002). Understanding resistance to the data-model relationship in Rasch’s paradigm: A reflection for the next generation. Journal of Applied Measurement, 3(3), 325-59.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Busemeyer, J. R., & Wang, Y.-M. (2000, March). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44(1), 171-189 [http://quantrm2.psy.ohio-state.edu/injae/jmpsp.htm].

Commons, M. L., & Richards, F. A. (2002, Jul). Organizing components into combinations: How stage transition works. Journal of Adult Development, 9(3), 159-177.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Kurzweil, R. (2005). The singularity is near: When humans transcend biology. New York: Viking Penguin.

Miller, P., & O’Leary, T. (2007, October/November). Mediating instruments and making markets: Capital budgeting, science and the economy. Accounting, Organizations and Society, 32(7-8), 701-34.

Myung, I. J. (2000). Importance of complexity in model selection. Journal of Mathematical Psychology, 44(1), 190-204.

Othman, R., & Hashim, N. A. (2004). Typologizing organizational amnesia. The Learning Organization, 11(3), 273-84.

Pitt, M. A., Kim, W., & Myung, I. J. (2003). Flexibility versus generalizability in model selection. Psychonomic Bulletin & Review, 10, 29-44.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].

Taylor, M. C. (2003). The moment of complexity: Emerging network culture. Chicago: University of Chicago Press.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.


Stages in the Development of Meaningful, Efficient, and Useful Measures

February 21, 2011

In all learning, we use what we already know as a means of identifying what we do not yet know. When someone can read a written language, knows an alphabet and has a vocabulary, understands grammar and syntax, then that knowledge can be used to learn about the world. Then, knowing what birds are, for instance, one might learn about different kinds of birds or the typical behaviors of one bird species.

And so with measurement, we start from where we find ourselves, as with anything else. There is no need or possibility for everyone to master all the technical details of every different area of life that’s important. But it is essential that we know what is technically possible, so that we can seek out and find the tools that help us achieve our goals. We can’t get what we can’t or don’t ask for. In the domain of measurement, it seems that hardly anyone is looking for what’s actually readily available.

So it seems pertinent to offer a description of a continuum of increasingly meaningful, efficient and useful ways of measuring. Previous considerations of the problem have offered different categorizations for the transformations characterizing development on this continuum. Stenner and Horabin (1992) distinguish between 1) impressionistic and qualitative, nominal gradations found in the earliest conceptualizations of temperature, 2) local, data-based quantitative measures of temperature, and 3) generalized, universally uniform, theory-based quantitative measures of temperature.

Theory-based temperature measurement is prized for the way that thermodynamic theory enables the calibration of individual thermometers with no need for testing each one in empirical studies of its performance. As Lewin (1951, p. 169) put it, “There is nothing so practical as a good theory.” Thus we have electromagnetic theory making it possible to know the conduction and resistance characteristics of electrical cable from the properties of the metal alloys and insulators used, with no need to test more than a small fraction of that cable as a quality check.

Theory makes it possible to know in advance what the results of such tests would be with enough precision to greatly reduce the burden and expenses of instrument calibration. There likely would be no electrical industry at all if the properties of every centimeter of cable and every appliance had to be experimentally tested. This principle has been employed in measuring human, social, and natural capital for some time, but, for a variety of reasons, it has not yet been adopted on a wide scale.
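The cable example can be made concrete. Electromagnetic theory gives a conductor's resistance as R = ρL/A, so any length of cable can be calibrated from the known resistivity of its alloy, with no empirical test required. A brief sketch in Python, using the standard handbook resistivity of copper (the function name is mine, for illustration):

```python
# Handbook resistivity of copper at 20 C, in ohm-meters
COPPER_RESISTIVITY = 1.68e-8

def cable_resistance(resistivity_ohm_m: float, length_m: float,
                     cross_section_m2: float) -> float:
    """Theory-based calibration: R = rho * L / A predicts a cable's
    resistance from material properties, with no empirical testing."""
    return resistivity_ohm_m * length_m / cross_section_m2

# 100 m of copper cable with a 1.5 mm^2 cross section
print(round(cable_resistance(COPPER_RESISTIVITY, 100.0, 1.5e-6), 2))  # 1.12 ohms
```

A quality check on a small sample of cable then suffices; this is the economy of effort that theory-based calibration would bring to measures of human, social, and natural capital.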

Reflecting on the history of psychosocial measurement in this context, it becomes apparent that Stenner and Horabin’s (1992) three stages can be further broken down. Listed below are the distinguishing features of each of six stages in the evolution of measurement systems, building on the five stages described by Stenner, Burdick, Sanford, and Burdick (2006). This progression of increasing complexity, meaning, efficiency, and utility can be used as a basis for a technology roadmap that will enable the coordination and alignment of various services and products in the domain of intangible assets, as I will take up in a forthcoming post.

Stage 1. Least meaning, utility, efficiency, and value

Purely passive, receptive

Statistics describe data: What you see is what you get

Content defines measure

Additivity, invariance, etc. not tested, so numbers do not stand for something that adds up like they do

Measurement defined statistically in terms of group-level intervariable relations

Meaning of numbers changes with questions asked and persons answering

No theory

Data must be gathered and analyzed to have results

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 2

Slightly less passive, receptive but still descriptively oriented

Additivity, invariance, etc. tested, so numbers might stand for something that adds up like they do

Measurement still defined statistically in terms of group-level intervariable relations

Falsification of additive hypothesis effectively derails measurement effort

Descriptive models with interaction effects accepted as viable alternatives

Typically little or no attention to theory of item hierarchy and construct definition

Empirical (data-based) calibrations only

Data must be gathered and analyzed to have results

Initial awareness of measurement theory

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 3

Even less purely passive & receptive, more active

Instrument still designed relative to content specifications

Additivity, invariance, etc. tested, so numbers might stand for something that adds up like they do

Falsification of additive hypothesis provokes questions as to why

Descriptive models with interaction effects not accepted as viable alternatives

Measurement defined prescriptively in terms of individual-level intravariable invariance

Significant attention to theory of item hierarchy and construct definition

Empirical calibrations only

Data has to be gathered and analyzed to have results

More significant use of measurement theory in prescribing acceptable data quality

Limited construct theory (no predictive power)

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 4

First stage that is more active than passive

Initial efforts to (re-)design instrument relative to construct specifications and theory

Additivity, invariance, etc. tested in thoroughly prescriptive focus on calibrating instrument

Numbers not accepted unless they stand for something that adds up like they do

Falsification of additive hypothesis provokes questions as to why and corrective action

Models with interaction effects not accepted as viable alternatives

Measurement defined prescriptively in terms of individual-level intravariable invariance

Significant attention to theory of item hierarchy and construct definition relative to instrument design

Empirical calibrations only but model prescribes data quality

Data usually has to be gathered and analyzed to have results

Point of use self-scoring forms might provide immediate measurement results to end user

Some construct theory (limited predictive power)

Some commercial applications are not instrument-dependent (as in CAT item bank implementations)

Standards based in ensuring fair methods and processes

Stage 5

Significantly active approach to measurement

Item hierarchy translated into construct theory

Construct specification equation predicts item difficulties

Theory-predicted (not empirical) calibrations used in applications

Item banks superseded by single-use items created on the fly

Calibrations checked against empirical results but data gathering and analysis not necessary

Point of use self-scoring forms or computer apps provide immediate measurement results to end user

Used routinely in commercial applications

Awareness that standards might be based in metrological traceability to consensus standard uniform metric

Stage 6. Most meaning, utility, efficiency, and value

Most purely active approach to measurement

Item hierarchy translated into construct theory

Construct specification equation predicts item ensemble difficulties

Theory-predicted calibrations enable single-use items created from context

Checked against empirical results for quality assessment but data gathering and analysis not necessary

Point of use self-scoring forms or computer apps provide immediate measurement results to end user

Used routinely in commercial applications

Standards based in metrological traceability to consensus standard uniform metric
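The construct specification equations of Stages 5 and 6 are, in practice, equations that predict item difficulties from theoretically motivated item features, as the Lexile framework does for reading items from text characteristics (Stenner, Burdick, Sanford, & Burdick, 2006). A minimal sketch in Python, with entirely hypothetical features and coefficients, of how theory-predicted calibrations can replace empirical ones:

```python
# Entirely hypothetical specification equation: item difficulty (in logits)
# modeled as a weighted sum of theoretically motivated item features.
FEATURE_WEIGHTS = {"word_rarity": 1.2, "sentence_length": 0.4}
INTERCEPT = -0.5

def predict_difficulty(features: dict) -> float:
    """Theory-predicted calibration: difficulty follows from item features,
    so items generated on the fly need no empirical pre-testing."""
    return INTERCEPT + sum(FEATURE_WEIGHTS[name] * value
                           for name, value in features.items())

# A hypothetical item with moderately rare words and longish sentences
print(round(predict_difficulty({"word_rarity": 1.0, "sentence_length": 2.0}), 2))  # 1.5
```

In Stage 5 and 6 systems, predictions like these are periodically checked against empirical calibrations for quality assurance, but routine data gathering and analysis are no longer required for every new item or instrument.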


References

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York: Harper & Row.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].


Build it and they will come

February 8, 2011

“It” in the popular Kevin Costner movie, “Field of Dreams,” was a baseball diamond. He put it in a corn field. Not only did a ghost team conjure itself from the corn, so did a line of headlights on the road. There would seem to have been a stunning lack of preparation for crowds of fans, as parking, food, and toilet facilities were nowhere in sight.

Those things would be taken care of in due course, but that’s another story. The point has nothing to do with being realistic and everything to do with making dreams come true. Believing in yourself and your dreams is hard. Dreams are inherently unrealistic. As George Bernard Shaw said, reasonable people adapt to life and the world. It’s unreasonable people who think the world should adapt to them. And, accordingly, change comes about only because unreasonable and unrealistic people act to make things different.

I dream of a playing field, too. I can’t just go clear a few acres in a field to build it, though. The kind of clearing I’m dreaming of is more abstract. But the same idea applies. I, too, am certain that, if we build it, they will come.

What is it? Who are they? “It” is a better way for each of us to represent who we are to the world, and to see where we stand in it. It is a new language for speaking the truth of what we are each capable of. It is a way of tuning the instruments of a new science that will enable us to harmonize relationships of all kinds: personal, occupational, social, and economic.

Which brings us to who “they” are. They are us. Humanity. We are the players on this field that we will clear. We are the ones who care and who desire meaning. We are the ones who have been robbed of the trust, loyalty, and commitment we’ve invested in governments, corporations, and decades of failed institutions. We are the ones who know what has been lost, and what yet could still be gained. We are the ones who possess our individual skills, motivations, and health, but yet have no easy, transparent way to represent how much of any one of them we have, what quality it is, or how much it can be traded for. We are the ones who all share in the bounty of the earth’s fecund capacity for self-renewal, but who among us can show exactly how much the work we do every day adds or subtracts from the quality of the environment?

So why do I say, build it and they will come? Because this sort of thing is not something that can be created piecemeal. What if Costner’s character in the movie had not just built the field but had instead tried to find venture capital, recruit his dream team, set up a ticket sales vendor, hire management and staff, order uniforms and equipment, etc.? It never would have happened. It doesn’t work that way.

And so, finally, just what do we need to build? Just this: a new metric system. The task is to construct a system of measures for managing what’s most important in life: our relationships, our health, our capacity for productive and creative employment. We need a system that enables us to track our investments in intangible assets like education, health care, community, and quality of life. We need instruments tuned to the same scales, ones that take advantage of recently developed technical capacities for qualitatively meaningful quantification; for information synthesis across indicators/items/questions; for networked, collective thinking; for adaptive innovation support; and for creating fungible currencies in which human, social, and natural capital can be traded in efficient markets.

But this is not a system that can be built piecemeal. Infrastructure on this scale is too complex and too costly for any single individual, firm, or industry to create by itself. And building one part of it at a time will not work. We need to create the environment in which these new forms of life, these new species, these new markets for living capital, can take root and grow, organically. If we create that environment, with incentives and rewards capable of functioning like fertile soil, warm sun, and replenishing rain, it will be impossible to stop the growth.

You see, there are thousands of people around the world using new measurement methods to calibrate tests, surveys and assessments as valid and reliable instruments. But they are operating in an environment in which the fully viable seeds they have to plant are wasted. There’s no place for them to take root. There’s no sun, no water.

Why is the environment for the meaningful, uniform measurement of intangible assets so inhospitable? The primary answer to this question is cultural. We have ingrained and highly counterproductive attitudes toward what are often supposed to be the inherent properties of numbers. One very important attitude of this kind is the common assumption that all numbers are quantitative. But lots of scoring systems and percentage reporting schemes involve numbers that do not stand for something that adds up. There is nothing automatic or simple about the way any given unit of calibrated measurement remains the same all up and down a scale. Arriving at a way to construct and maintain such a unit requires as much intensive research and imaginative investigation in the social sciences as it does in the natural sciences. But where the natural sciences and engineering have grown up around a focus on meaningful measurement, the social sciences have not.
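The point that raw scores and percentages do not “add up” the way a calibrated unit does can be made concrete. A minimal sketch, using the log-odds (logit) transform that underlies Rasch measurement (discussed later in this blog), shows that equal raw-score gains are not equal amounts of the underlying variable; the numbers here are purely illustrative:

```python
import math

def logit(p):
    """Log-odds transform: maps a success proportion onto the logit scale."""
    return math.log(p / (1 - p))

# Two equal 10-point gains in percent-correct represent unequal amounts
# of the measured variable: the same raw gain is "worth" more near the
# extremes of the scale than in the middle.
for low, high in [(0.50, 0.60), (0.80, 0.90)]:
    print(f"{low:.0%} -> {high:.0%}: {logit(high) - logit(low):.2f} logits")
```

The first gain works out to about 0.41 logits, the second to about 0.81, even though both look like the same “10 points” on the raw percentage scale.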

One result of mistaken preconceptions about number is that even when tests, surveys, and assessments measure the same thing, they are disconnected from one another, tuned to different scales. There is no natural environment, no shared ecology, in which the growth of learning can take place in field-wide terms. There’s no common language in which to share what’s been learned. Even when research results are exactly the same, they look different.

But if there were a system of consensus-based reference standard metrics, one for each major construct–reading, writing, and math abilities; health status; physical and psychosocial functioning; quality of life; social and natural capital–there would be the expectation that instruments measuring the same thing should measure in the same unit. Researchers could be contributing to building larger systems when they calibrate new instruments and recalibrate old ones. They would more obviously be adding to the stock of human knowledge, understanding, and wisdom. Divergent results would demand explanations, and convergent ones would give us more confidence as we move forward.
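Tuning two instruments to the same unit is, at its simplest, a matter of estimating the shift between their scales from items calibrated on both, as in common-item equating. A minimal sketch, with hypothetical item difficulties invented for illustration:

```python
def linking_constant(anchor_a, anchor_b):
    """Shift that places instrument B's measures on instrument A's scale,
    estimated from the mean difference in difficulty of shared anchor items."""
    diffs = [a - b for a, b in zip(anchor_a, anchor_b)]
    return sum(diffs) / len(diffs)

# Hypothetical difficulties (in logits) for five items shared by two reading tests
anchor_a = [-1.2, -0.4, 0.1, 0.8, 1.5]
anchor_b = [-1.7, -0.9, -0.4, 0.3, 1.0]

shift = linking_constant(anchor_a, anchor_b)
# A person measure of 0.2 logits from test B, re-expressed on test A's scale:
measure_on_a = 0.2 + shift
```

Once such links are established and maintained as reference standards, every new calibration extends a shared scale instead of founding yet another isolated one.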

Most importantly, quality improvement and consumer purchasing decisions and behaviors would be fluidly coordinated with no need for communicating and negotiating the details of each individual comparison. Education and health care lack common product definitions because their outcomes are measured in fragmented, incommensurable metrics. But if we had consensus-based reference standard metrics for every major form of capital employed in the economy, we could develop reasonable expectations expressed in a common language for how much change should typically be obtained in fifth-grade mathematics or from a hip replacement.

As is well-known in the business world, innovation is highly dependent on standards. We cannot empower the front line with the authority to make changes when decisions have to be based on information that is unavailable or impossible to interpret. Most of the previous entries in this blog take up various aspects of this situation.

All of this demands a very different way of thinking about what’s possible in the realm of measurement. The issues are complex. They are usually presented in difficult mathematical terms within specialized research reports. But the biggest problem has to do with thinking laterally, with moving ideas out of the vertical hierarchies of the silos where they are trapped and into a new field we can dream in. And the first seeds to be planted in such a field are the ones that say the dream is worth dreaming. When we hear that message, we are already on the way not just to building this dream, but to creating a world in which everyone can dream and envision more specific possibilities for their lives, their families, their creativity.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Consequences of Standardized Technical Effects for Scientific Advancement

January 24, 2011

Note. This is modified from:

Fisher, W. P., Jr. (2004, Wednesday, January 21). Consequences of standardized technical effects for scientific advancement. In A. Leplège (Chair), Session 2.5A. Rasch Models: History and Philosophy. Second International Conference on Measurement in Health, Education, Psychology, and Marketing: Developments with Rasch Models, The International Laboratory for Measurement in the Social Sciences, School of Education, Murdoch University, Perth, Western Australia.

—————————

Over the last several decades, historians of science have repeatedly produced evidence contradicting the widespread assumption that technology is a product of experimentation and/or theory (Kuhn 1961; Latour 1987; Rabkin 1992; Schaffer 1992; Hankins & Silverman 1999; Baird 2002). Theory and experiment typically advance only within the constraints set by a key technology that is widely available to end users in applied and/or research contexts. Thus, “it is not just a clever historical aphorism, but a general truth, that ‘thermodynamics owes much more to the steam engine than ever the steam engine owed to thermodynamics’” (Price 1986, p. 240).

The prior existence of the relevant technology comes to bear on theory and experiment again in the common, but mistaken, assumption that measures are made and experimentally compared in order to discover scientific laws. History and the logic of measurement show that measures are rarely made until the relevant law is effectively embodied in an instrument (Kuhn 1961; Michell 1999). This points to the difficulty experienced in metrologically fusing (Schaffer 1992, p. 27; Lapré & van Wassenhove 2002) instrumentalists’ often inarticulate, but materially effective, knowledge (know-how) with theoreticians’ often immaterial, but well articulated, knowledge (know-why) (Galison 1999; Baird 2002).

Because technology often dictates what, if any, phenomena can be consistently produced, it constrains experimentation and theorizing by focusing attention selectively on reproducible, potentially interpretable effects, even when those effects are not well understood (Ackermann 1985; Daston & Galison 1992; Ihde 1998; Hankins & Silverman 1999; Maasen & Weingart 2001). Criteria for theory choice in this context stem from competing explanatory frameworks’ experimental capacities to facilitate instrument improvements, prediction of experimental results, and gains in the efficiency with which a phenomenon is produced.

In this context, the relatively recent introduction of measurement models requiring additive, invariant parameterizations (Rasch 1960) provokes speculation as to the effect on the human sciences that might be wrought by the widespread availability of consistently reproducible effects expressed in common quantitative languages. Paraphrasing Price’s comment on steam engines and thermodynamics, might it one day be said that as yet unforeseeable advances in reading theory will owe far more to the Lexile analyzer (Burdick & Stenner 1996) than ever the Lexile analyzer owed to reading theory?

Kuhn (1961) speculated that the second scientific revolution of the mid-nineteenth century followed in large part from the full mathematization of physics, i.e., the emergence of metrology as a professional discipline focused on providing universally accessible uniform units of measurement (Roche 1998). Might a similar revolution and new advances in the human sciences follow from the introduction of rigorously mathematical uniform measures?

Measurement technologies capable of supporting the calibration of additive units that remain invariant over instruments and samples (Rasch 1960) have been introduced relatively recently in the human sciences. The invariances produced appear 1) very similar to those produced in the natural sciences (Fisher 1997) and 2) based in the same mathematical metaphysics as that informing the natural sciences (Fisher 2003). Might it then be possible that the human sciences are on the cusp of a revolution analogous to that of nineteenth century physics? Other factors involved in answering this question, such as the professional status of the field, the enculturation of students, and the scale of the relevant enterprises, define the structure of circumstances that might be capable of supporting the kind of theoretical consensus and research productivity that came to characterize, for instance, work in electrical resistance through the early 1880s (Schaffer 1992).
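The invariance the dichotomous Rasch (1960) model produces can be shown in a few lines. A minimal sketch: success probability depends only on the difference between person ability and item difficulty, so the comparison of two persons (their odds ratio) comes out the same no matter which item is used:

```python
import math

def rasch_p(ability, difficulty):
    """Dichotomous Rasch model: probability of success as a function of the
    difference, in logits, between person ability and item difficulty."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# Parameter separation: the odds ratio comparing two persons is identical
# across items of any difficulty -- the source of the model's invariance.
for difficulty in [-1.0, 0.0, 2.0]:
    p1, p2 = rasch_p(1.5, difficulty), rasch_p(0.5, difficulty)
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    print(round(odds_ratio, 3))  # exp(1.5 - 0.5) = e, for every item
```

It is this separability of person and item parameters that makes sample-free item calibration, and hence shared units across instruments, possible.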

Much could be learned from Rasch’s use of Maxwell’s method of analogy (Nersessian, 2002; Turner, 1955), not just in the modeling of scientific laws but from the social and economic factors that made the regularities of natural phenomena function as scientific capital (Latour, 1987). Quantification must be understood in the fully mathematical sense of commanding a comprehensive grasp of the real root of mathematical thinking. Far from being simply a means of producing numbers, to be useful, quantification has to result in qualitatively transparent figure-meaning relations at any point of use for any one of every different kind of user. Connections between numbers and unit amounts of the variable must remain constant across samples, instruments, time, space, and measurers. Quantification that does not support invariant linear comparisons expressed in a uniform metric available universally to all end users at the point of need is inadequate and incomplete. Such standardization is widely respected in the natural sciences but is virtually unknown in the human sciences, largely due to untested hypotheses and unexamined prejudices concerning the viability of universal uniform measures for the variables measured via tests, surveys, and performance assessments.

Quantity is an effective medium for science to the extent that it comprises an instance of the kind of common language necessary for distributed, collective thinking; for widespread agreement on what makes research results compelling; and for the formation of social capital’s group-level effects. It may be that the primary relevant difference between the case of 19th century physics and today’s human sciences concerns the awareness, widespread among scientists in the 1800s and virtually nonexistent in today’s human sciences, that universal uniform metrics for the variables of interest are both feasible and of great human, scientific, and economic value.

In the creative dynamics of scientific instrument making, as in the making of art, the combination of inspiration and perspiration can sometimes result in cultural gifts of the first order. It nonetheless often happens that some of these superlative gifts, no matter how well executed, are unable to negotiate the conflict between commodity and gift economics characteristic of the marketplace (Baird, 1997; Hagstrom, 1965; Hyde, 1979), and so remain unknown, lost to the audiences they deserve, and unable to render their potential effects historically. Value is not an intrinsic characteristic of the gift; rather, value is ascribed as a function of interests. If interests are not cultivated via the clear definition of positive opportunities for self-advancement, common languages, socio-economic relations, and recruitment, gifts of even the greatest potential value may die with their creators. On the other hand, who has not seen mediocrity disproportionately rewarded merely as a result of intensive marketing?

A central problem is then how to strike a balance between individual or group interests and the public good. Society and individuals are interdependent in that children are enculturated into the specific forms of linguistic and behavioral competence that are valued in communities at the same time that those communities are created, maintained, and reproduced through communicative actions (Habermas, 1995, pp. 199-200). The identities of individuals and societies then co-evolve, as each defines itself through the other via the medium of language. Language is understood broadly in this context to include all perceptual reading of the environment, bodily gestures, social action, etc., as well as the use of spoken or written symbols and signs (Harman, 2005; Heelan, 1983; Ihde, 1998; Nicholson, 1984; Ricoeur, 1981).

Technologies extend language by providing media for the inscription of new kinds of signs (Heelan, 1983a, 1998; Ihde, 1991, 1998; Ihde & Selinger, 2003). Thus, mobility desires and practices are inscribed and projected into the world using the automobile; shelter and life style, via housing and clothing; and communications, via alphabets, scripts, phonemes, pens and paper, telephones, and computers. Similarly, technologies in the form of test, survey, and assessment instruments provide the devices on which we inscribe desires for social mobility, career advancement, health maintenance and improvement, etc.

References

Ackermann, J. R. (1985). Data, instruments, and theory: A dialectical approach to understanding science. Princeton, New Jersey: Princeton University Press.

Baird, D. (1997, Spring-Summer). Scientific instrument making, epistemology, and the conflict between gift and commodity economics. Techné: Journal of the Society for Philosophy and Technology, 2(3-4), 25-46. Retrieved 08/28/2009, from http://scholar.lib.vt.edu/ejournals/SPT/v2n3n4/baird.html.

Baird, D. (2002, Winter). Thing knowledge – function and truth. Techné: Journal of the Society for Philosophy and Technology, 6(2). Retrieved 19/08/2003, from http://scholar.lib.vt.edu/ejournals/SPT/v6n2/baird.html.

Burdick, H., & Stenner, A. J. (1996). Theoretical prediction of test items. Rasch Measurement Transactions, 10(1), 475 [http://www.rasch.org/rmt/rmt101b.htm].

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Habermas, J. (1995). Moral consciousness and communicative action. Cambridge, Massachusetts: MIT Press.

Hagstrom, W. O. (1965). Gift-giving as an organizing principle in science. In The scientific community (pp. 12-22). New York: Basic Books. (Rpt. in B. Barnes (Ed.). (1972). Sociology of science: Selected readings (pp. 105-120). Baltimore, Maryland: Penguin Books.)

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Harman, G. (2005). Guerrilla metaphysics: Phenomenology and the carpentry of things. Chicago: Open Court.

Hyde, L. (1979). The gift: Imagination and the erotic life of property. New York: Vintage Books.

Ihde, D. (1998). Expanding hermeneutics: Visualism in science. (Northwestern University Studies in Phenomenology and Existential Philosophy). Evanston, Illinois: Northwestern University Press.

Kuhn, T. S. (1961). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago, Illinois: University of Chicago Press, 1977.)

Lapré, M. A., & Van Wassenhove, L. N. (2002, October). Learning across lines: The secret to more efficient factories. Harvard Business Review, 80(10), 107-11.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York, New York: Cambridge University Press.

Maasen, S., & Weingart, P. (2001). Metaphors and the dynamics of knowledge. (Vol. 26. Routledge Studies in Social and Political Thought). London: Routledge.

Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge: Cambridge University Press.

Nersessian, N. J. (2002). Maxwell and “the Method of Physical Analogy”: Model-based reasoning, generic abstraction, and conceptual change. In D. Malament (Ed.), Essays in the history and philosophy of science and mathematics (pp. 129-166). Lasalle, Illinois: Open Court.

Price, D. J. d. S. (1986). Of sealing wax and string. In Little Science, Big Science–and Beyond (pp. 237-253). New York, New York: Columbia University Press.

Rabkin, Y. M. (1992). Rediscovering the instrument: Research, industry, and education. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 57-82). Bellingham, Washington: SPIE Optical Engineering Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Turner, J. (1955, November). Maxwell on the method of physical analogy. British Journal for the Philosophy of Science, 6, 226-238.
