Archive for the ‘Quality Assessment & Improvement’ Category

Comment on Kerrey and Leeds in WSJ

November 20, 2013

Writing in today’s Wall Street Journal, Bob Kerrey and Jeffrey T. Leeds note the unintended consequences likely to follow from new higher education regulations proposed by the U.S. Department of Education. Cutting to the chase, Kerrey and Leeds’ key points (emphases added) are that:

  • “Absent innovative, competitive—and, yes, disruptive—pressure to raise quality and lower costs, all the well-intentioned federal regulation in the world will not make college more accessible.”
  • “He [Secretary of Education, Arne Duncan] should insist on real and significant disclosure. Colleges should be required to post their graduation rates, job-placement rates, the average debt of their students upon graduation, their tax status and any and all information that will enable Americans to make informed decisions when choosing a school.”
  • “The department should also work with schools and colleges to address the fundamental causes of rising tuition, and hold schools accountable for student outcomes instead of their debt.”

These are, of course, exactly the themes repeatedly raised in this blog. Measurement quality is unavoidably implicated in holding schools accountable for student outcomes, in enabling consumers to make informed purchasing decisions, and in raising quality and lowering costs.

To meet the challenges we face, measurement quality must be far more than just a matter of precision and rigor. Quality must also speak to relevance, efficiency, and meaningfulness. Recent history has brought home the lesson that annual tests used solely for accountability purposes will not enable rebalanced quality/cost equations, informed consumer decisions, or fair accountability results. But how might these disparate purposes be efficiently and meaningfully realized?

If teachers are to be responsible for student outcomes and for raising the overall quality of education, it is essential that formative measuring tools provide the qualitative and quantitative information they need to act responsibly. The irony is, of course, that the way to overcome the problems of a purely summative focus for educational measurement is to measure more! Measuring more need not involve devoting more time exclusively to taking tests. Instead, computerized and online assessments are increasingly integrated into instruction so that measures are made in the course of studying (Cheng & Mok, 2007; Wilson, 2004). Measures are thereby continuously updated and plotted in growth charts relative to long-range outcome goals.

Furthermore, the qualitative information provided by the measurement process is used to inform teachers and students about what comes next in the individualized curriculum, as well as about special strengths and weaknesses. This information has been shown to be unparalleled in its value for advancing learning in the classroom (Black and Wiliam, 1998, 2009; Hattie, 2008).

But formative assessment alone will not be sufficient to the larger tasks of raising quality and lowering costs. For that, systematic quality improvement methods in schools will need to be joined with comparable outcome measures that parents and students can use to inform school choice decisions (Fisher, 2013; Lunenburg, 2010).

Kerrey and Leeds rightly seek an infrastructure capable of disruptive effects, of transforming the inflationary economy of education (and health care). To state again a recurring theme in this blog, the command and control hierarchies of regulatory systems can and should be replaced with a metrological infrastructure of common metrics with the scientific, legal, and financial status of common currencies for the exchange of value. Only when such currencies are in place will we be able to set out clear paths for the informed decisions, improved quality, lower costs, and accountability for outcomes that we seek.

References

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.

Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21, 5-31.

Cheng, Y. C., & Mok, M. M. C. (2007). School-based management and paradigm shift in education: An empirical study. International Journal of Educational Management, 21(6), 517-542.

Fisher, W. P., Jr. (2013). Imagining education tailored to assessment as, for, and of learning: Theory, standards, and quality improvement. Assessment and Learning, 2, in press.

Hattie, J. (2008). Visible learning. New York: Routledge.

Lunenburg, F. C. (2010). Total Quality Management applied to schools. Schooling, 1(1), 1-6.

Wilson, M. (Ed.). (2004). Towards coherence between classroom assessment and accountability. (Vol. 103, Part II, National Society for the Study of Education Yearbooks). Chicago, Illinois: University of Chicago Press.

Debt, Revenue, and Changing the Way Washington Works: The Greatest Entrepreneurial Opportunity of Our Time

July 30, 2011

“Holding the line” on spending and taxes does not make for a fundamental transformation of the way Washington works. Simply doing less of one thing is just a small quantitative change that does nothing to build positive results or set a new direction. What we need is a qualitative metamorphosis akin to a caterpillar becoming a butterfly. In contrast with this beautiful image of natural processes, the arguments and so-called principles being invoked in the sham debate that’s going on are nothing more than fights over where to put deck chairs on the Titanic.

What sort of transformation is possible? What kind of a metamorphosis will start from who and where we are, but redefine us sustainably and responsibly? As I have repeatedly explained in this blog, my conference presentations, and my publications, with numerous citations of authoritative references, we already possess all of the elements of the transformation. We have only to organize and deploy them. Of course, discerning what the resources are and how to put them together is not obvious. And though I believe we will do what needs to be done when we are ready, it never hurts to prepare for that moment. So here’s another take on the situation.

Infrastructure that supports lean thinking is the name of the game. Lean thinking focuses on identifying and removing waste. Anything that consumes resources but does not contribute to the quality of the end product is waste. We have enormous amounts of wasteful inefficiency in many areas of our economy. These inefficiencies are concentrated in areas in which management is hobbled by low quality information, where we lack the infrastructure we need.

Providing and capitalizing on this infrastructure is The Greatest Entrepreneurial Opportunity of Our Time. Changing the way Washington (ha! I just typed “Wastington”!) works is the same thing as mitigating the sources of risk that caused the current economic situation. Making government behave more like a business requires making the human, social, and natural capital markets more efficient. Making those markets more efficient requires reducing the costs of transactions. Those costs are determined in large part by information quality, which is a function of measurement.

It is often said that the best way to reduce the size of government is to move the functions of government into the marketplace. But this proposal has never been associated with any sense of the infrastructural components needed to really make the idea work. Simply reducing government without an alternative way of performing its functions is irresponsible and destructive. And many of those who rail on and on about how bad or inefficient government is fail to recognize that the government is us. We get the government we deserve. The government we get follows directly from the kind of people we are. Government embodies our image of ourselves as a people. In the US, this is what having a representative form of government means. “We the people” participate in our society’s self-governance not just by voting, writing letters to congress, or demonstrating, but in the way we spend our money, where we choose to live, work, and go to school, and in every decision we make. No one can take a breath of air, a drink of water, or a bite of food without trusting everyone else to not carelessly or maliciously poison them. No one can buy anything or drive down the street without expecting others to behave in predictable ways that ensure order and safety.

But we don’t just trust blindly. We have systems in place to guard against those who would ruthlessly seek to gain at everyone else’s expense. And systems are the point. No individual person or firm, no matter how rich, could afford to set up and maintain the systems needed for checking and enforcing air, water, food, and workplace safety measures. Society as a whole invests in the infrastructure of measures created, maintained, and regulated by the government’s Department of Commerce and the National Institute of Standards and Technology (NIST). The moral importance and the economic value of measurement standards have been stressed historically over many millennia, from the Bible and the Quran to the Magna Carta and the French Revolution to the US Constitution. Uniform weights and measures are universally recognized and accepted as essential to fair trade.

So how is it that we nonetheless apparently expect individuals and local organizations like schools, businesses, and hospitals to measure and monitor students’ abilities; employees’ skills and engagement; patients’ health status, functioning, and quality of care; etc.? Why do we not demand common currencies for the exchange of value in human, social, and natural capital markets? Why don’t we as a society compel our representatives in government to institute the will of the people and create new standards for fair trade in education, health care, social services, and environmental management?

Measuring better is not just a local issue! It is a systemic issue! When measurement is objective and when we all think together in the common language of a shared metric (like hours, volts, inches or centimeters, ounces or grams, degrees Fahrenheit or Celsius, etc.), then and only then do we have the means we need to implement lean strategies and create new efficiencies systematically. We need an Intangible Assets Metric System.

The current recession was caused in large part by failures in measuring and managing trust, responsibility, loyalty, and commitment. Similar problems in measuring and managing human, social, and natural capital have led to endlessly spiraling costs in education, health care, social services, and environmental management. The problems we’re experiencing in these areas are intimately tied up with the way we formulate and implement group-level decision-making processes and policies based in statistics, when what we need is to empower individuals with the tools and information they need to make their own decisions and policies. We will not and cannot metamorphose from caterpillar to butterfly until we create the infrastructure through which we each can take full ownership and control of our individual shares of the human, social, and natural capital stock that is rightfully ours.

We well know that we manage what we measure. What counts gets counted. Attention tends to be focused on what we’re accountable for. But–and this is vitally important–many of the numbers called measures do not provide the information we need for management. And not only are lots of numbers giving us low quality information, there are far too many of them! We could have better and more information from far fewer numbers.

Previous postings in this blog document the fact that we have the intellectual, political, scientific, and economic resources we need to measure and manage human, social, and natural capital for authentic wealth. And the issue is not a matter of marshaling the will. It is hard to imagine how there could be more demand for better management of intangible assets than there is right now. The problem in meeting that demand is a matter of imagining how to start the ball rolling. What configuration of investments and resources will start the process of bursting open the chrysalis? How will the demand for meaningful mediating instruments be met in a way that leads to the spreading of the butterfly’s wings? It is an exciting time to be alive.


Translating Gingrich’s Astute Observations on Health Care

June 30, 2011

“At the very heart of transforming health and healthcare is one simple fact: it will require a commitment by the federal government to invest in science and discovery. The period between investment and profit for basic research is too long for most companies to ever consider making the investment. Furthermore, truly basic research often produces new knowledge that everyone can use, so there is no advantage to a particular company to make the investment. The result is that truly fundamental research is almost always a function of government and foundations because the marketplace discourages focusing research in that direction” (p. 169 in Gingrich, 2003).

Gingrich says this while recognizing (p. 185) that:

“Money needs to be available for highly innovative ‘out of the box’ science. Peer review is ultimately a culturally conservative and risk-averse model. Each institution’s director should have a small amount of discretionary money, possibly 3% to 5% of their budget, to spend on outliers.”

He continues (p. 170), with some important elaborations on the theme:

“America’s economic future is a direct function of our ability to take new scientific research and translate it into entrepreneurial development.”

“The [Hart/Rudman] Commission’s second conclusion was that the failure to invest in scientific research and the failure to reform math and science education was the second largest threat to American security [behind terrorism].”

“Our goal [in the Hart/Rudman Commission] was to communicate the centrality of the scientific endeavor to American life and the depth of crisis we believe threatens the math and science education system. The United States’ ability to lead today is a function of past investments in scientific research and math and science education. There is no reason today to believe we will automatically maintain that lead especially given our current investments in scientific research and the staggering levels of our failures in math and science education.”

“Our ability to lead in 2025 will be a function of current decisions. Increasing our investment in science and discovery is a sound and responsible national security policy. No other federal expenditure will do more to create jobs, grow wealth, strengthen our world leadership, protect our environment, promote better education, or ensure better health for the country. We must make this increase now.”

On p. 171, this essential point is made:

“In health and healthcare, it is particularly important to increase our investment in research.”

This is all good. I agree completely. What NG says is probably more true than he realizes, in four ways.

First, the scientific capital created via metrology, controlled via theory, and embodied in technological instruments is the fundamental driver of any economy. The returns on investments in metrological improvements range from 40% to over 400% (NIST, 1996). We usually think of technology and technical standards in terms of computers, telecommunications, and electronics, but virtually nothing in our lives is untouched by metrology, since the air, water, food, clothing, roads, buildings, cars, appliances, etc. are all monitored, maintained, and/or manufactured relative to various kinds of universally uniform standards. NG is, as most people are, completely unaware that such standards are feasible and already under development for health, functionality, quality of life, quality of care, math and science education, etc. Given the huge ROIs associated with metrological improvements, there ought to be proportionately huge investments being made in metrology for human, social, and natural capital.

Second, NG’s point concerning national security is right on the mark, though for reasons that go beyond the ones he gives. There are very good reasons for thinking that investments in, and meaningful returns from, the basic science of human, social, and natural capital metrology could be expected to undercut the motivations for terrorism and the retreats into fundamentalisms of various kinds that emerge in the face of the failures of liberal democracy (Marty, 2001). Making all forms of capital measured, managed, and accountable within a common framework accessible to everyone everywhere could be an important contributing factor, emulating the property titling rationale of De Soto (1989, 2000) and the support for distributed cognition at the social level provided by metrological networks (Latour, 1987, 2005; Magnus, 2007). The costs of measurement can be so high as to stifle whole economies (Barzel, 1982), which is, broadly speaking, the primary problem with the economies of education, health care, social services, philanthropy, and environmental management (see, for instance, regarding philanthropy, Goldberg, 2009). Building the legal and financial infrastructure for low-friction titling and property exchange has become a basic feature of World Bank and IMF projects. My point, ever since I read De Soto, has been that we ought to be doing the same thing for human, social, and natural capital, facilitating explicit ownership of the skills, motivations, health, trust, and environmental resources that are rightfully the property of each of us, and that similar effects on national security ought to follow.

Third, NG makes an excellent point when he stresses the need for health and healthcare to be individual-centered, saying that, in contrast with the 20th-century healthcare system, “In the 21st Century System of Health and Healthcare, you will own your medical record, control your healthcare dollars, and be able to make informed choices about healthcare providers.” This is basically equivalent to saying that health capital needs to be fungible, and it can’t be fungible, of course, without a metrological infrastructure that makes every measure of outcomes, quality of life, etc. traceable to a reference standard. Individual-centeredness is also, of course, what distinguishes proper measurement from statistics. Measurement supports inductive inference, from the individual to the population, whereas statistics are deductive, going from the population to the individual (Fisher & Burton, 2010; Fisher, 2010). Individual-centered healthcare will never go anywhere without properly calibrated instrumentation and the traceability to reference standards that makes measures meaningful.

Fourth, NG repeatedly indicates how appalled he is at the slow pace of change in healthcare, citing research showing that it can take up to 17 years for doctors to adopt new procedures. I contend that this is an effect of our micromanagement of dead, concrete forms of capital. In a fluid living capital market, not only will consumers be able to reward quality in their purchasing decisions by having the information they need when they need it and in a form they can understand, but the quality improvements will be driven from the provider side in much the same way. As Brent James has shown, readily available, meaningful, and comparable information on natural variation in outcomes makes it much easier for providers to improve results and reduce the variation in them. Despite its central importance and the many years that have passed, however, the state of measurement in health care remains in dire need of dramatic improvement. Fryback (1993, p. 271; also see Kindig, 1999) succinctly put the point, observing that the U.S.

“health care industry is a $900 + billion [over $2.5 trillion in 2009 (CMS, 2011)] endeavor that does not know how to measure its main product: health. Without a good measure of output we cannot truly optimize efficiency across the many different demands on resources.”

Quantification in health care is almost universally approached using methods inadequate to the task, resulting in ordinal and scale-dependent scores that cannot take advantage of the objective comparisons provided by invariant, individual-level measures (Andrich, 2004). Though data-based statistical studies informing policy have their place, virtually no effort or resources have been invested in developing individual-level instruments traceable to universally uniform metrics that define the outcome products of health care. These metrics are key to efficiently harmonizing quality improvement, diagnostic, and purchasing decisions and behaviors in the manner described by Berwick, James, and Coye (2003) without having to cumbersomely communicate the concrete particulars of locally-dependent scores (Heinemann, Fisher, & Gershon, 2006). Metrologically-based common product definitions will finally make it possible for quality improvement experts to implement analogues of the Toyota Production System in healthcare, long presented as a model but never approached in practice (Coye, 2001).

So, what does all of this add up to? A new division for human, social, and natural capital in NIST is in order, with extensive involvement from NIH, CMS, AHRQ, and other relevant agencies. Innovative measurement methods and standards are the “out of the box” science NG refers to. Providing these tools is the definitive embodiment of an appropriate role for government. These are the kinds of things that we could have a productive conversation with NG about, it seems to me….

References

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Barzel, Y. (1982). Measurement costs and the organization of markets. Journal of Law and Economics, 25, 27-48.

Berwick, D. M., James, B., & Coye, M. J. (2003, January). Connections between quality measurement and improvement. Medical Care, 41(1 (Suppl)), I30-38.

Centers for Medicare and Medicaid Services. (2011). National health expenditure data: NHE fact sheet. Retrieved 30 June 2011, from https://www.cms.gov/NationalHealthExpendData/25_NHE_Fact_Sheet.asp.

Coye, M. J. (2001, November/December). No Toyotas in health care: Why medical care has not evolved to meet patients’ needs. Health Affairs, 20(6), 44-56.

De Soto, H. (1989). The other path: The economic answer to terrorism. New York: Basic Books.

De Soto, H. (2000). The mystery of capital: Why capitalism triumphs in the West and fails everywhere else. New York: Basic Books.

Fisher, W. P., Jr. (2010). Statistics and measurement: Clarifying the differences. Rasch Measurement Transactions, 23(4), 1229-1230 [http://www.rasch.org/rmt/rmt234.pdf].

Fisher, W. P., Jr., & Burton, E. (2010). Embedding measurement within existing computerized data systems: Scaling clinical laboratory and medical records heart failure data to predict ICU admission. Journal of Applied Measurement, 11(2), 271-287.

Fryback, D. (1993). QALYs, HYEs, and the loss of innocence. Medical Decision Making, 13(4), 271-2.

Gingrich, N. (2008). Real change: From the world that fails to the world that works. Washington, DC: Regnery Publishing.

Goldberg, S. H. (2009). Billions of drops in millions of buckets: Why philanthropy doesn’t advance social progress. New York: Wiley.

Heinemann, A. W., Fisher, W. P., Jr., & Gershon, R. (2006). Improving health care quality with outcomes management. Journal of Prosthetics and Orthotics, 18(1), 46-50 [http://www.oandp.org/jpo/library/2006_01S_046.asp].

Kindig, D. A. (1997). Purchasing population health. Ann Arbor, Michigan: University of Michigan Press.

Kindig, D. A. (1999). Purchasing population health: Aligning financial incentives to improve health outcomes. Nursing Outlook, 47, 15-22.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York: Cambridge University Press.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Magnus, P. D. (2007). Distributed cognition and the task of science. Social Studies of Science, 37(2), 297-310.

Marty, M. (2001). Why the talk of spirituality today? Some partial answers. Second Opinion, 6, 53-64.

Marty, M., & Appleby, R. S. (Eds.). (1993). Fundamentalisms and society: Reclaiming the sciences, the family, and education. The fundamentalisms project, vol. 2. Chicago: University of Chicago Press.

National Institute of Standards and Technology. (1996). Appendix C: Assessment examples. Economic impacts of research in metrology. In Committee on Fundamental Science, Subcommittee on Research (Ed.), Assessing fundamental science: A report from the Subcommittee on Research, Committee on Fundamental Science. Washington, DC: National Science and Technology Council [http://www.nsf.gov/statistics/ostp/assess/nstcafsk.htm#Topic%207; last accessed 30 June 2011].


Parameterizing Perfection: Practical Applications of a Mathematical Model of the Lean Ideal

April 2, 2010

To properly pursue perfection, we need to parameterize it. That is, taking perfection as the ideal, unattainable standard against which we judge our performance is equivalent to thinking of it as a mathematical model. Organizations are intended to realize their missions independent of the particular employees, customers, suppliers, challenges, products, etc. they happen to engage with at any particular time. Organizational performance measurement (Spitzer, 2007) ought to then be designed in terms of a model that posits, tests for, and capitalizes on the always imperfectly realized independence of those parameters.

Lean thinking (Womack & Jones, 1996) focuses on minimizing waste and maximizing value. At every point at which resources are invested in processes, services, or products, the question is asked, “What value is added here?” Resources are wasted when no value is added, when they can be removed with no detrimental effect on the value of the end product. In their book, Natural Capitalism: Creating the Next Industrial Revolution, Hawken, Lovins, and Lovins (1999, p. 133) say

“Lean thinking … changes the standard for measuring corporate success. … As they [Womack and Jones] express it: ‘Our earnest advice to lean firms today is simple. To hell with your competitors; compete against perfection by identifying all activities that are muda [the Japanese term for waste used in Toyota’s landmark quality programs] and eliminating them. This is an absolute rather than a relative standard which can provide the essential North Star for any organization.’”

Further, every input should “be presumed waste until shown otherwise.” A constant, ongoing, persistent pressure for removing waste is the basic characteristic of lean thinking. Perfection is never achieved, but it aptly serves as the ideal against which progress is measured.

Lean thinking sounds a lot like a mathematical model, though it does not seem to have been written out in a mathematical form, or used as the basis for calibrating instruments, estimating measures, evaluating data quality, or for practical assessments of lean organizational performance. The closest anyone seems to have come to parameterizing perfection is in the work of Genichi Taguchi (Ealey, 1988), which has several close parallels with Rasch measurement (Linacre, 1993). But meaningful and objective quantification, as required and achieved in the theory and practice of fundamental measurement (Andrich, 2004; Bezruczko, 2005; Bond & Fox, 2007; Smith & Smith, 2004; Wilson, 2005; Wright, 1999), in fact asserts abstract ideals of perfection as models of organizational, social, and psychological processes in education, health care, marketing, etc. These models test the extent to which outcomes remain invariant across examination or survey questions, across teachers, students, schools, and curricula, or across treatment methods, business processes, or policies.
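
To make the parallel concrete, here is a minimal sketch of what parameterizing perfection can look like in these models. The dichotomous Rasch model, for instance, posits that the probability of a successful outcome depends only on the difference between a person measure and an item calibration:

    \[
    P(x_{ni} = 1 \mid \beta_n, \delta_i) = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)},
    \qquad
    \ln\!\left[\frac{P(x_{ni} = 1)}{P(x_{ni} = 0)}\right] = \beta_n - \delta_i .
    \]

Because the log-odds depend only on the difference between the two parameters, the comparison of any two persons is, in principle, independent of which items happen to be asked, and the comparison of any two items is independent of which persons happen to respond. The model is the unattainable ideal; on this reading, departures from it in the data are the measurement analogue of the waste that lean efforts work to remove.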

Though as yet implemented only to a limited extent in business (Drehmer, Belohlav, & Coye, 2000; Drehmer & Deklava, 2001; Lunz & Linacre, 1998; Salzberger, 2009), advanced measurement’s potential rewards are great. Fundamental measurement theory has been successfully applied in research and practice thousands of times over the last 40 years and more, including in very large scale assessments and licensure/certification applications (Adams, Wu, & Macaskill, 1997; Masters, 2007; Smith, Julian, Lunz, et al., 1994). These successes speak to an opportunity for making broad improvements in outcome measurement that could provide more coherent product definition, along with significant associated opportunities for improving product quality and the efficiency with which it is produced, in the manner that has followed from the use of fundamental measures in other industries.

Of course, processes and outcomes are never implemented or obtained with perfect consistency; that would happen only in a perfect world. But to pursue perfection, we need to parameterize it. In other words, to raise the bar in any area of performance assessment, we have to know not only which direction is up, but also when we have raised the bar far enough. We cannot tell up from down, we do not know how much to raise the bar, and we cannot properly evaluate the effects of lean experiments when we have no way of locating measures on a number line that embodies the lean ideal.

To think together collectively in ways that lead to significant new innovations, to rise above what Jaron Lanier calls the “global mush” of confused and self-confirming hive thinking, we need the common languages of widely accepted fundamental measures of the relevant processes and outcomes, measures that remain constant across samples of customers, patients, employees, students, etc., and across products, sales techniques, curricula, treatment processes, assessment methods, and brands of instrument.

We are all well aware that the consequences of not knowing where the bar is, of not having product definitions, can be disastrous. In many respects, as I’ve said previously in this blog, the success or failure of health care reform hinges on getting measurement right. The Institute of Medicine report To Err Is Human stressed several years ago that system failures pose the greatest threat to safety in health care because they lead to human errors. When a system as complex as health care lacks a standard product definition, and product delivery is fragmented across multiple providers with different amounts and kinds of information in different settings, the system becomes dangerously cumbersome and over-complicated, with unacceptably wide variations and errors in its processes and outcomes, not to speak of its economic inefficiency.

In contrast with the widespread use of fundamental measures in the product definitions of other industries, health care researchers typically implement neither the longstanding, repeatedly proven, and mathematically rigorous models of fundamental measurement theory nor the metrological networks through which reference standard metrics are engineered. Most industries carefully define, isolate, and estimate the parameters of their products, doing so in ways 1) that ensure industry-wide comparability and standardization, and 2) that facilitate continuous product improvement by revealing multiple opportunities for enhancement. Where organizations in other industries manage by metrics and thereby keep their eyes on the ball of product quality, health care organizations often manage only their own internal processes and cannot in fact bring the product quality ball into view.

In his message concerning the Institute for Healthcare Improvement’s Pursuing Perfection project a few years ago, Don Berwick, like others (Coye, 2001; Coye & Detmer, 1998), observed that health care does not yet have an organization setting new standards in the way that Toyota did for the auto industry in the 1970s. It still doesn’t, of course. Given the differences between the auto and health care industries’ uses of fundamental measures of product quality, and their associated abilities to keep their eyes on the quality ball, is it any wonder, then, that no one in health care has yet hit a home run? It may well be that no one will hit a home run in health care until reference standard measures of product quality are devised.

The need for reference standard measures in uniform data systems is crucial, and the methods for obtaining them are widely available and well-known. So what is preventing the health care industry from adopting and deploying them? Part of the answer is the cost of the initial investment required. In 1980, metrology comprised about six percent of the U.S. gross national product (Hunter, 1980). In the period from 1981 to 1994, annual expenditures on research and development in the U.S. were less than three percent of the GNP, and non-defense R&D was about two percent (NIST Subcommittee on Research, National Science and Technology Council, 1996). These costs, however, must be viewed as investments from which high rates of return can be obtained (Barber, 1987; Gallaher, Rowe, Rogozhin, et al., 2007; Swann, 2005).

For instance, the U.S. National Institute of Standards and Technology estimated the economic impact of 12 areas of research in metrology, in four broad areas including semiconductors, electrical calibration and testing, optical industries, and computer systems (NIST, 1996, Appendix C; also see NIST, 2003). The median rate of return in these 12 areas was 147 percent, and returns ranged from 41 to 428 percent. The report notes that these results compare favorably with those obtained in similar studies of return rates from other public and private research and development efforts. Even if health care metrology produces only a small fraction of the return rate produced in physical metrology, its economic impact could still amount to billions of dollars annually. The proposed pilot projects therefore focus on determining what an effective health care outcomes metrology system should look like. What should its primary functions be? What should it cost? What rates of return could be expected from it?

Metrology, the science of measurement (Pennella, 1997), requires 1) that instruments be calibrated within individual laboratories so as to isolate and estimate the values of the required parameters (Wernimont, 1978); and 2) that individual instruments’ capacities to provide the same measure for the same amount, and so be traceable to a reference standard, be established and monitored via interlaboratory round-robin trials (Mandel, 1978).
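
As a rough illustration of the second, interlaboratory phase, the sketch below computes each laboratory’s bias against a consensus value in a round-robin in which every lab measures the same artifact. It is written in Python; the labs, the readings, and any tolerance applied to the biases are invented for the example.

    import statistics

    # Hypothetical round-robin: each lab reports repeated measurements of one artifact.
    lab_results = {
        "Lab A": [10.02, 10.05, 10.03],
        "Lab B": [9.97, 9.99, 9.98],
        "Lab C": [10.11, 10.09, 10.12],
    }

    lab_means = {lab: statistics.mean(xs) for lab, xs in lab_results.items()}
    consensus = statistics.median(lab_means.values())  # robust consensus value

    for lab, m in lab_means.items():
        bias = m - consensus                                  # interlaboratory agreement
        repeatability = statistics.stdev(lab_results[lab])    # intralaboratory consistency
        print(f"{lab}: mean={m:.3f}  bias={bias:+.3f}  within-lab sd={repeatability:.3f}")

    # A lab whose bias exceeds an agreed tolerance would recalibrate before its
    # measures could be treated as traceable to the shared reference standard.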

Fundamental measurement has already succeeded in demonstrating the viability of reference standard measures of health outcomes, measures whose meaningfulness does not depend on the particular samples of items employed or patients measured. Though this work succeeds as far as it goes, it has been done in a context that lacks any sense of the need for metrological infrastructure. Health care needs networks of scientists and technicians collaborating not only in the first, intralaboratory phase of metrological work, but also in the interlaboratory trials through which different brands or configurations of instruments intended to measure the same variable would be tuned to harmoniously produce the same measure for the same amount.

Implementation of the two phases of metrological innovation in health care would then begin with the intralaboratory calibration of existing and new instruments for measuring overall organizational performance, quality of care, and patients’ health status, quality of life, functionality, etc.  The second phase takes up the interlaboratory equating of these instruments, and the concomitant deployment of reference standard units of measurement throughout a health care system and the industry as a whole. To answer questions concerning health care metrology’s potential returns on investment, the costs for, and the savings accrued from, accomplishing each phase of each pilot will be tracked or estimated.

When instruments measuring in universally uniform, meaningful units are put in the hands of clinicians, a new scientific revolution will occur in medicine. It will be analogous to previous ones associated with the introduction of the thermometer and the instruments of optometry and the clinical laboratory. Such tools will multiply many times over the impact of the quality improvement methods used by Brent James, touted as holding the key to health care reform in a recent New York Times profile. Instead of implicitly hypothesizing models of perfection and assessing performance relative to them informally, what we need is a new science that systematically implements the lean ideal on industry-wide scales. The future belongs to those who master these techniques.

References

Adams, R. J., Wu, M. L., & Macaskill, G. (1997). Scaling methodology and procedures for the mathematics and science scales. In M. O. Martin & D. L. Kelly (Eds.), Third International Mathematics and Science Study Technical Report: Vol. 2: Implementation and Analysis – Primary and Middle School Years (pp. 111-145). Chestnut Hill, MA: Boston College.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Barber, J. M. (1987). Economic rationale for government funding of work on measurement standards. In R. Dobbie, J. Darrell, K. Poulter & R. Hobbs (Eds.), Review of DTI work on measurement standards (p. Annex 5). London: Department of Trade and Industry.

Berwick, D. M., James, B., & Coye, M. J. (2003, January). Connections between quality measurement and improvement. Medical Care, 41(1 (Suppl)), I30-38.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Coye, M. J. (2001, November/December). No Toyotas in health care: Why medical care has not evolved to meet patients’ needs. Health Affairs, 20(6), 44-56.

Coye, M. J., & Detmer, D. E. (1998). Quality at a crossroads. The Milbank Quarterly, 76(4), 759-68.

Drehmer, D. E., Belohlav, J. A., & Coye, R. W. (2000, Dec). An exploration of employee participation using a scaling approach. Group & Organization Management, 25(4), 397-418.

Drehmer, D. E., & Deklava, S. M. (2001, April). A note on the evolution of software engineering practices. Journal of Systems and Software, 57(1), 1-7.

Ealey, L. A. (1988). Quality by design: Taguchi methods and U.S. industry. Dearborn MI: ASI Press.

Gallaher, M. P., Rowe, B. R., Rogozhin, A. V., Houghton, S. A., Davis, J. L., Lamvik, M. K., et al. (2007). Economic impact of measurement in the semiconductor industry (Tech. Rep. No. 07-2). Gaithersburg, MD: National Institute for Standards and Technology.

Hawken, P., Lovins, A., & Lovins, L. H. (1999). Natural capitalism: Creating the next industrial revolution. New York: Little, Brown, and Co.

Hunter, J. S. (1980, November). The national system of scientific measurement. Science, 210(21), 869-874.

Linacre, J. M. (1993). Quality by design: Taguchi and Rasch. Rasch Measurement Transactions, 7(2), 292.

Lunz, M. E., & Linacre, J. M. (1998). Measurement designs using multifacet Rasch modeling. In G. A. Marcoulides (Ed.), Modern methods for business research. Methodology for business and management (pp. 47-77). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Mandel, J. (1978, December). Interlaboratory testing. ASTM Standardization News, 6, 11-12.

Masters, G. N. (2007). Special issue: Programme for International Student Assessment (PISA). Journal of Applied Measurement, 8(3), 235-335.

National Institute of Standards and Technology (NIST). (1996). Appendix C: Assessment examples. Economic impacts of research in metrology. In Committee on Fundamental Science, Subcommittee on Research (Ed.), Assessing fundamental science: A report from the Subcommittee on Research, Committee on Fundamental Science. Washington, DC: National Science and Technology Council [http://www.nsf.gov/statistics/ostp/assess/nstcafsk.htm#Topic%207; last accessed 18 February 2008].

National Institute for Standards and Technology (NIST). (2003, 15 January). Outputs and outcomes of NIST laboratory research. Retrieved 12 July 2009, from http://www.nist.gov/director/planning/studies.htm#measures.

Pennella, C. R. (1997). Managing the metrology system. Milwaukee, WI: ASQ Quality Press.

Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Northampton, MA: Edward Elgar.

Smith, R. M., Julian, E., Lunz, M., Stahl, J., Schulz, M., & Wright, B. D. (1994). Applications of conjoint measurement in admission and professional certification programs. International Journal of Educational Research, 21(6), 653-664.

Smith, E. V., Jr., & Smith, R. M. (2004). Introduction to Rasch measurement. Maple Grove, MN: JAM Press.

Spitzer, D. (2007). Transforming performance measurement: Rethinking the way we measure and drive organizational success. New York: AMACOM.

Swann, G. M. P. (2005, 2 December). John Barber’s pioneering work on the economics of measurement standards [Electronic version]. Retrieved http://www.cric.ac.uk/cric/events/jbarber/swann.pdf from Notes for Workshop in Honor of John Barber held at University of Manchester.

Wernimont, G. (1978, December). Careful intralaboratory study must come first. ASTM Standardization News, 6, 11-12.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Womack, J. P., & Jones, D. T. (1996, Sept./Oct.). Beyond Toyota: How to root out waste and pursue perfection. Harvard Business Review, 74, 140-58.

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.


How Evidence-Based Decision Making Suffers in the Absence of Theory and Instrument: The Power of a More Balanced Approach

January 28, 2010

The Basis of Evidence in Theory and Instrument

The ostensible point of basing decisions in evidence is to have reasons for proceeding in one direction versus any other. We want to be able to say why we are proceeding as we are. When we give evidence-based reasons for our decisions, we typically couch them in terms of what worked in past experience. That experience might have been accrued over time in practical applications, or it might have been deliberately arranged in one or more experimental comparisons and tests of concisely stated hypotheses.

At its best, generalizing from past experience to as yet unmet future experiences enables us to navigate life and succeed in ways that would not be possible if we could not learn and had no memories. The application of a lesson learned from particular past events to particular future events involves a very specific inferential process. To be able to recognize repeated iterations of the same things requires the accumulation of patterns of evidence. Experience in observing such patterns allows us to develop confidence in our understanding of what a given pattern represents in terms of pleasant or painful consequences. When we are able to conceptualize and articulate such a pattern, and then to recognize new occurrences of it, we have an idea of it.

Evidence-based decision making is then a matter of formulating expectations from repeatedly demonstrated and routinely reproducible patterns of observations that lend themselves to conceptual representations, as ideas expressed in words. Linguistic and cultural frameworks selectively focus attention by projecting expectations and filtering observations into meaningful patterns represented by words, numbers, and other symbols. The point of efforts aimed at basing decisions in evidence is to try to go with the flow of this inferential process more deliberately and effectively than might otherwise be the case.

None of this is new or controversial. However, the inferential step from evidence to decision always involves unexamined and unjustified assumptions. That is, there is always an element of metaphysical faith behind the expectation that any given symbol or word is going to work as a representation of something in the same way that it has in the past. We can never completely eliminate this leap of faith, since we cannot predict the future with 100% confidence. We can, however, do a lot to reduce the size of the leap, and the risks that go with it, by questioning our assumptions in experimental research that tests hypotheses as to the invariant stability and predictive utility of the representations we make.

Theoretical and Instrumental Assumptions Hidden Behind the Evidence

For instance, evidence as to the effectiveness of an intervention or treatment is often expressed in terms of measures commonly described as quantitative. But it is unusual for any evidence to be produced justifying that description in terms of something that really adds up in the way numbers do. So we often find ourselves in situations in which our evidence is much less meaningful, reliable, and valid than we suppose it to be.

Quantitative measures are often valued as the hallmark of rational science. But their capacity to live up to this billing depends on the quality of the inferences that can be supported. Very few researchers thoroughly investigate the quality of their measures and justify the inferences they make relative to that quality.

Measurement presumes a reproducible pattern of evidence that can serve as the basis for a decision concerning how much of something has been observed. It naturally follows that we often base measurement in counts of some kind—successes, failures, ratings, frequencies, etc. The counts, scores, or sums are then often transformed into percentages by dividing them by the maximum possible that could be obtained. Sometimes the scores are averaged for each person measured, and/or for each item or question on the test, assessment, or survey. These scores and percentages are then almost universally fed directly into decision processes or statistical analyses with no further consideration.
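
A small worked example shows why this routine practice is not as innocuous as it seems. The sketch below, in Python with invented scores, compares equal gains in percent correct with the corresponding gains on a log-odds (logit) scale; the same ten-point gain turns out to represent very different amounts of the underlying variable depending on where on the scale it occurs.

    import math

    def logit(p):
        """Log-odds of a proportion correct (undefined at exactly 0 or 1)."""
        return math.log(p / (1 - p))

    # Hypothetical before/after percent-correct scores for two students,
    # each gaining ten percentage points.
    pairs = [(0.50, 0.60), (0.85, 0.95)]

    for before, after in pairs:
        gain_points = (after - before) * 100
        gain_logits = logit(after) - logit(before)
        print(f"{before:.0%} -> {after:.0%}: {gain_points:.0f} points = {gain_logits:.2f} logits")

    # The gain near the middle of the scale is about 0.41 logits, while the same
    # ten-point gain near the top is about 1.21 logits: raw percentages compress
    # and stretch the variable, so they cannot be assumed to add up like measures.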

The reproducible pattern of evidence on which decisions are based is presumed to exist between the measures, not within them. In other words, the focus is on the group or population statistics, not on the individual measures. Attention is typically focused on the tip of the iceberg, the score or percentage, not on the much larger, but hidden, mass of information beneath it. Evidence is presumed to be sufficient to the task when the differences between groups of scores are of a consistent size or magnitude, but is this sufficient?

Going Past Assumptions to Testable Hypotheses

In other words, does not science require that evidence be explained by theory, and embodied in instrumentation that provides a shared medium of observation? As shown in the blue lines in the Figure below,

  • theory, whether or not it is explicitly articulated, inevitably influences both what counts as valid data and the configuration of the medium of its representation, the instrument;
  • data, whether or not it is systematically gathered and evaluated, inevitably influences both the medium of its representation, the instrument, and the implicit or explicit theory that explains its properties and justifies its applications; and
  • instruments, whether or not they are actually calibrated from a mapping of symbols and substantive amounts, inevitably influence data gathering and the image of the object explained by theory.

The rhetoric of evidence-based decision making skips over the roles of theory and instrumentation, drawing a direct line from data to decision. In leaving theory laxly formulated, we allow any story that makes a bit of sense and is communicated by someone with a bit of charm or power to carry the day. In not requiring calibrated instrumentation, we allow any data that cross the threshold into our awareness to serve as an acceptable basis for decisions.

What we want, however, is to require meaningful measures that really provide the evidence needed for instruments that exhibit invariant calibrations and for theories that provide predictive explanatory control over the variable. As shown in the Figure, we want data that push theory away from the instrument, theory that separates the data and instrument, and instruments that get in between the theory and data.

We all know to distrust too close a correspondence between theory and data, but we too rarely understand or capitalize on the role of the instrument in mediating the theory-data relation. Similarly, when the questions used as a medium for making observations are obviously biased to produce responses conforming overly closely with a predetermined result, we see that the theory and the instrument are too close for the data to serve as an effective mediator.

Finally, the situation predominating in the social sciences is one in which both construct and measurement theories are nearly nonexistent, which leaves data completely dependent on the instrument it came from. In other words, because counts of correct answers or sums of ratings are mistakenly treated as measures, instruments fully determine and restrict the range of measurement to that defined by the numbers of items and rating categories. Once the instrument is put in play, changes to it would make new data incommensurable with old, so, to retain at least the appearance of comparability, the data structure then fully determines and restricts the instrument.

What we want, though, is a situation in which construct and measurement theories work together to make the data autonomous of the particular instrument it came from. We want a theory that explains what is measured well enough for us to be able to modify existing instruments, or create entirely new ones, that give the same measures for the same amounts as the old instruments. We want to be able to predict item calibrations from the properties of the items, we want to obtain the same item calibrations across data sets, and we want to be able to predict measures on the basis of the observed responses (data) no matter which items or instrument was used to produce them.
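
One concrete form such a check can take is sketched below, in Python with simulated responses (the numbers of items and persons, and the spread of difficulties, are arbitrary): approximate item calibrations are estimated separately in two halves of a data set and compared item by item. Invariance shows up as differences no larger than expected from sampling error.

    import math
    import random

    random.seed(1)

    # Simulate dichotomous responses from a Rasch-like process, for illustration only.
    n_persons, n_items = 400, 10
    abilities = [random.gauss(0, 1) for _ in range(n_persons)]
    difficulties = [d / 2.0 for d in range(-5, 5)]

    def respond(theta, delta):
        p = 1 / (1 + math.exp(-(theta - delta)))
        return 1 if random.random() < p else 0

    data = [[respond(theta, delta) for delta in difficulties] for theta in abilities]

    def approx_calibrations(rows):
        """Rough log-odds item difficulties, centered at zero (assumes no item has
        a proportion correct of exactly 0 or 1 in the sample)."""
        cals = []
        for i in range(n_items):
            p = sum(row[i] for row in rows) / len(rows)
            cals.append(math.log((1 - p) / p))
        mean = sum(cals) / len(cals)
        return [c - mean for c in cals]

    half_a = approx_calibrations(data[:n_persons // 2])
    half_b = approx_calibrations(data[n_persons // 2:])

    for i, (a, b) in enumerate(zip(half_a, half_b)):
        print(f"item {i}: sample A {a:+.2f}  sample B {b:+.2f}  difference {a - b:+.2f}")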

Most importantly, we want a theory and practice of measurement that allows us to take missing data into account by providing us with the structural invariances we need as media for predicting the future from the past. As Ben Wright (1997, p. 34) said, any data analysis method that requires complete data to produce results disqualifies itself automatically as a viable basis for inference because we never have complete data—any practical system of measurement has to be positioned so as to be ready to receive, process, and incorporate all of the data we have yet to gather. This goal is accomplished to varying degrees in Rasch measurement (Rasch, 1960; Burdick, Stone, & Stenner, 2006; Dawson, 2004). Stenner and colleagues (Stenner, Burdick, Sanford, & Burdick, 2006) provide a trajectory of increasing degrees to which predictive theory is employed in contemporary measurement practice.
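
To make Wright’s point concrete, here is a minimal sketch, in Python, of how a person measure can be estimated from whatever subset of calibrated items happens to have been answered, so that missing responses simply drop out of the estimation instead of blocking it. The item calibrations and the response string are hypothetical.

    import math

    # Hypothetical item calibrations in logits, anchored from a previous calibration.
    item_difficulties = [-1.5, -0.8, -0.2, 0.4, 1.1, 1.9]

    # One person's responses: 1 = correct, 0 = incorrect, None = not administered.
    responses = [1, 1, None, 1, 0, None]

    def estimate_measure(responses, difficulties, tol=1e-6):
        """Maximum-likelihood Rasch person measure from the observed responses only.
        Assumes the observed score is neither zero nor perfect on the answered items."""
        observed = [(x, d) for x, d in zip(responses, difficulties) if x is not None]
        theta = 0.0
        for _ in range(100):                      # Newton-Raphson iterations
            expected = [1 / (1 + math.exp(-(theta - d))) for _, d in observed]
            residual = sum(x for x, _ in observed) - sum(expected)
            information = sum(p * (1 - p) for p in expected)
            step = residual / information
            theta += step
            if abs(step) < tol:
                break
        return theta

    print(f"Estimated measure: {estimate_measure(responses, item_difficulties):.2f} logits")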

The explanatory and predictive power of theory is embodied in instruments that focus attention on recording observations of salient phenomena. These observations become data that inform the calibration of instruments, which then are used to gather further data that can be used in practical applications and in checks on the calibrations and the theory.

“Nothing is so practical as a good theory” (Lewin, 1951, p. 169). Good theory makes it possible to create symbolic representations of things that are easy to think with. To facilitate clear thinking, our words, numbers, and instruments must be transparent. We have to be able to look right through them at the thing itself, with no concern as to distortions introduced by the instrument, the sample, the observer, the time, the place, etc. This happens only when the structure of the instrument corresponds with invariant features of the world. And where words effect this transparency to an extent, it is realized most completely when we can measure in ways that repeatedly give the same results for the same amounts in the same conditions no matter which instrument, sample, operator, etc. is involved.

Where Might Full Mathematization Lead?

The attainment of mathematical transparency in measurement is remarkable for the way it focuses attention and constrains the imagination. It is essential to appreciate the context in which this focusing occurs, as popular opinion is at odds with historical research in this regard. Over the last 60 years, historians of science have come to vigorously challenge the widespread assumption that technology is a product of experimentation and/or theory (Kuhn, 1961/1977; Latour, 1987, 2005; Maas, 2001; Mendelsohn, 1992; Rabkin, 1992; Schaffer, 1992; Heilbron, 1993; Hankins & Silverman, 1999; Baird, 2002). Neither theory nor experiment typically advances until a key technology is widely available to end users in applied and/or research contexts. Rabkin (1992) documents multiple roles played by instruments in the professionalization of scientific fields. Thus, “it is not just a clever historical aphorism, but a general truth, that ‘thermodynamics owes much more to the steam engine than ever the steam engine owed to thermodynamics’” (Price, 1986, p. 240).

The prior existence of the relevant technology comes to bear on theory and experiment again in the common, but mistaken, assumption that measures are made and experimentally compared in order to discover scientific laws. History shows that measures are rarely made until the relevant law is effectively embodied in an instrument (Kuhn, 1961/1977, pp. 218-9): “…historically the arrow of causality is largely from the technology to the science” (Price, 1986, p. 240). Instruments do not provide just measures; rather they produce the phenomenon itself in a way that can be controlled, varied, played with, and learned from (Heilbron, 1993, p. 3; Hankins & Silverman, 1999; Rabkin, 1992). The term “technoscience” has emerged as an expression denoting recognition of this priority of the instrument (Baird, 1997; Ihde & Selinger, 2003; Latour, 1987).

Because technology often dictates what, if any, phenomena can be consistently produced, it constrains experimentation and theorizing by focusing attention selectively on reproducible, potentially interpretable effects, even when those effects are not well understood (Ackermann, 1985; Daston & Galison, 1992; Ihde, 1998; Hankins & Silverman, 1999; Maasen & Weingart, 2001). Criteria for theory choice in this context stem from competing explanatory frameworks’ experimental capacities to facilitate instrument improvements, prediction of experimental results, and gains in the efficiency with which a phenomenon is produced.

In this context, the relatively recent introduction of measurement models requiring additive, invariant parameterizations (Rasch, 1960) provokes speculation as to the effect on the human sciences that might be wrought by the widespread availability of consistently reproducible effects expressed in common quantitative languages. Paraphrasing Price’s comment on steam engines and thermodynamics, might it one day be said that as yet unforeseeable advances in reading theory will owe far more to the Lexile analyzer (Stenner et al., 2006) than ever the Lexile analyzer owed to reading theory?

Kuhn (1961/1977) speculated that the second scientific revolution of the early- to mid-nineteenth century followed in large part from the full mathematization of physics, i.e., the emergence of metrology as a professional discipline focused on providing universally accessible, theoretically predictable, and evidence-supported uniform units of measurement (Roche, 1998). Kuhn (1961/1977, p. 220) specifically suggests that a number of vitally important developments converged about 1840 (also see Hacking, 1983, p. 234). This was the year in which the metric system was formally instituted in France after 50 years of development (it had already been obligatory in other nations for 20 years at that point), and metrology emerged as a professional discipline (Alder, 2002, pp. 328, 330; Heilbron, 1993, p. 274; Kula, 1986, p. 263). Daston (1992) independently suggests that the concept of objectivity came of age in the period from 1821 to 1856, and gives examples illustrating the way in which the emergence of strong theory, shared metric standards, and experimental data converged in a context of particular social mores to winnow out unsubstantiated and unsupportable ideas and contentions.

Might a similar revolution and new advances in the human sciences follow from the introduction of evidence-based, theoretically predictive, instrumentally mediated, and mathematical uniform measures? We won’t know until we try.

Figure. The Dialectical Interactions and Mutual Mediations of Theory, Data, and Instruments

Acknowledgment. These ideas have been drawn in part from long consideration of many works in the history and philosophy of science, primarily Ackermann (1985), Ihde (1991), and various works of Martin Heidegger, as well as key works in measurement theory and practice. A few obvious points of departure are listed in the references.

References

Ackermann, J. R. (1985). Data, instruments, and theory: A dialectical approach to understanding science. Princeton, New Jersey: Princeton University Press.

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Aldrich, J. (1989). Autonomy. Oxford Economic Papers, 41, 15-34.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Baird, D. (1997, Spring-Summer). Scientific instrument making, epistemology, and the conflict between gift and commodity economics. Techné: Journal of the Society for Philosophy and Technology, 3-4, 25-46. Retrieved 08/28/2009, from http://scholar.lib.vt.edu/ejournals/SPT/v2n3n4/baird.html.

Baird, D. (2002, Winter). Thing knowledge – function and truth. Techné: Journal of the Society for Philosophy and Technology, 6(2). Retrieved 19/08/2003, from http://scholar.lib.vt.edu/ejournals/SPT/v6n2/baird.html.

Burdick, D. S., Stone, M. H., & Stenner, A. J. (2006). The Combined Gas Law and a Rasch Reading Law. Rasch Measurement Transactions, 20(2), 1059-60 [http://www.rasch.org/rmt/rmt202.pdf].

Carroll-Burke, P. (2001). Tools, instruments and engines: Getting a handle on the specificity of engine science. Social Studies of Science, 31(4), 593-625.

Daston, L. (1992). Baconian facts, academic civility, and the prehistory of objectivity. Annals of Scholarship, 8, 337-363. (Rpt. in L. Daston, (Ed.). (1994). Rethinking objectivity (pp. 37-64). Durham, North Carolina: Duke University Press.)

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Hacking, I. (1983). Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge: Cambridge University Press.

Hankins, T. L., & Silverman, R. J. (1999). Instruments and the imagination. Princeton, New Jersey: Princeton University Press.

Heelan, P. A. (1983, June). Natural science as a hermeneutic of instrumentation. Philosophy of Science, 50, 181-204.

Heelan, P. A. (1998, June). The scope of hermeneutics in natural science. Studies in History and Philosophy of Science Part A, 29(2), 273-98.

Heidegger, M. (1977). Modern science, metaphysics, and mathematics. In D. F. Krell (Ed.), Basic writings [reprinted from M. Heidegger, What is a thing? South Bend, Regnery, 1967, pp. 66-108] (pp. 243-282). New York: Harper & Row.

Heidegger, M. (1977). The question concerning technology. In D. F. Krell (Ed.), Basic writings (pp. 283-317). New York: Harper & Row.

Heilbron, J. L. (1993). Weighing imponderables and other quantitative science around 1800. Historical Studies in the Physical and Biological Sciences, 24(Supplement), Part I, pp. 1-337.

Hessenbruch, A. (2000). Calibration and work in the X-ray economy, 1896-1928. Social Studies of Science, 30(3), 397-420.

Ihde, D. (1983). The historical and ontological priority of technology over science. In D. Ihde, Existential technics (pp. 25-46). Albany, New York: State University of New York Press.

Ihde, D. (1991). Instrumental realism: The interface between philosophy of science and philosophy of technology. (The Indiana Series in the Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Ihde, D. (1998). Expanding hermeneutics: Visualism in science (Northwestern University Studies in Phenomenology and Existential Philosophy). Evanston, Illinois: Northwestern University Press.

Ihde, D., & Selinger, E. (Eds.). (2003). Chasing technoscience: Matrix for materiality. (Indiana Series in Philosophy of Technology). Bloomington, Indiana: Indiana University Press.

Kuhn, T. S. (1961/1977). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. In T. S. Kuhn, The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press, 1977).

Kula, W. (1986). Measures and men (R. Szreter, Trans.). Princeton, New Jersey: Princeton University Press (Original work published 1970).

Lapre, M. A., & Van Wassenhove, L. N. (2002, October). Learning across lines: The secret to more efficient factories. Harvard Business Review, 80(10), 107-11.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. New York, New York: Cambridge University Press.

Latour, B. (2005). Reassembling the social: An introduction to Actor-Network-Theory. (Clarendon Lectures in Management Studies). Oxford, England: Oxford University Press.

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York: Harper & Row.

Maas, H. (2001). An instrument can make a science: Jevons’s balancing acts in economics. In M. S. Morgan & J. Klein (Eds.), The age of economic measurement (pp. 277-302). Durham, North Carolina: Duke University Press.

Maasen, S., & Weingart, P. (2001). Metaphors and the dynamics of knowledge. (Vol. 26. Routledge Studies in Social and Political Thought). London: Routledge.

Mendelsohn, E. (1992). The social locus of scientific instruments. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 5-22). Bellingham, WA: SPIE Optical Engineering Press.

Polanyi, M. (1964/1946). Science, faith and society. Chicago: University of Chicago Press.

Price, D. J. d. S. (1986). Of sealing wax and string. In Little Science, Big Science–and Beyond (pp. 237-253). New York, New York: Columbia University Press.

Rabkin, Y. M. (1992). Rediscovering the instrument: Research, industry, and education. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 57-82). Bellingham, Washington: SPIE Optical Engineering Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Roche, J. (1998). The mathematics of measurement: A critical history. London: The Athlone Press.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Thurstone, L. L. (1959). The measurement of values. Chicago: University of Chicago Press, Midway Reprint Series.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Questions about measurement: If it is so important, why…?

January 28, 2010

If measurement is so important, why is measurement quality so uniformly low?

If we manage what we measure, why is measurement leadership virtually nonexistent?

If we can’t tell if things are getting better, staying the same, or getting worse without good metrics, why is measurement so rarely context-sensitive, focused, integrated, and interactive, as Dean Spitzer recommends it should be?

If quantification is valued for its rigor and convenience, why is no one demanding meaningful mappings of substantive, additive amounts of things measured on number lines?

If everyone is drowning in unmanageable floods of data, why isn’t measurement used to reduce data volumes dramatically—and not only with no loss of information but with the addition of otherwise unavailable forms of information?

If learning and improvement are the order of the day, why isn’t anyone interested in the organizational and individual learning trajectories that are defined by hierarchies of calibrated items?

If resilient lean thinking is the way to go, why aren’t more measures constructed to retain their meaning and values across changes in item content?

If flexibility is a core value, why aren’t we adapting instruments to people and organizations, instead of vice versa?

If fair, just, and meaningful measurement is often lacking in judge-assigned performance assessments, why isn’t anyone estimating the consistency, and the leniency or harshness, of ratings—and removing those effects from the measures made?

If efficiency is valued, why does no one at all seem to care about adjusting measurement precision to the needs of the task at hand, so that time and resources are not wasted in gathering too much or too little data?

If it’s common knowledge that we can do more together than we can as individuals, why isn’t anyone providing the high quality and uniform information needed for the networked collective thinking that is able to keep pace with the demand for innovation?

Since the metric system and uniform product standards are widely recognized as essential to science and commerce, why are longstanding capacities for common metrics for human, social, and natural capital not being used?

If efficient markets are such great things, why isn’t anyone at all concerned about lubricating the flow of human, social, and natural capital by investing in the highest quality measurement obtainable?

If everyone loves a good profit, why aren’t we setting up human, social, and natural capital metric systems to inform competitive pricing of intangible assets, products, and services?

If companies are supposed to be organic entities that mature in a manner akin to human development over the lifespan, why is so little being done to conceive, gestate, midwife, and nurture living capital?

In short, if measurement is really as essential to management as it is so often said to be, why doesn’t anyone seek out the state of the art technology, methods, and experts before going to the trouble of developing and implementing metrics?

I suspect the answers to these questions are all the same. These disconnects between word and deed happen because so few people are aware of the technical advances made in measurement theory and practice over the last several decades.

For the deep background, see previous entries in this blog, various web sites (www.rasch.org, www.rummlab.com, www.winsteps.com, http://bearcenter.berkeley.edu/, etc.), and an extensive body of published work (Rasch, 1960; Wright, 1977, 1997a, 1997b, 1999a, 1999b; Andrich, 1988, 2004, 2005; Bond & Fox, 2007; Fisher, 2009, 2010; Smith & Smith, 2004; Wilson, 2005; Wright & Stone, 1999, 2004).

There is a wealth of published applied research in education, psychology, and health care (Bezruczko, 2005; Fisher & Wright, 1994; Masters, 2007; Masters & Keeves, 1999). To find more, search on Rasch together with the substantive area of interest.

For applications in business contexts, there is a more limited number of published resources (ATP, 2001; Drehmer, Belohlav, & Coye, 2000; Drehmer & Deklava, 2001; Ludlow & Lunz, 1998; Lunz & Linacre, 1998; Mohamed, et al., 2008; Salzberger, 2000; Salzberger & Sinkovics, 2006; Zakaria, et al., 2008). I have, however, just become aware of the November, 2009, publication of what could be a landmark business measurement text (Salzberger, 2009). Hopefully, this book will be just one of many to come, and the questions I’ve raised will no longer need to be asked.

References

Andrich, D. (1988). Rasch models for measurement (Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-068). Beverly Hills, California: Sage Publications.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Andrich, D. (2005). Georg Rasch: Mathematician and statistician. In K. Kempf-Leonard (Ed.), Encyclopedia of Social Measurement (Vol. 3, pp. 299-306). Amsterdam: Academic Press, Inc.

Association of Test Publishers. (2001, Fall). Benjamin D. Wright, Ph.D. honored with the Career Achievement Award in Computer-Based Testing. Test Publisher, 8(2). Retrieved 20 May 2009, from http://www.testpublishers.org/newsletter7.htm#Wright.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Dawson, T. L., & Gabrielian, S. (2003, June). Developing conceptions of authority and contract across the life-span: Two perspectives. Developmental Review, 23(2), 162-218.

Drehmer, D. E., Belohlav, J. A., & Coye, R. W. (2000, Dec). An exploration of employee participation using a scaling approach. Group & Organization Management, 25(4), 397-418.

Drehmer, D. E., & Deklava, S. M. (2001, April). A note on the evolution of software engineering practices. Journal of Systems and Software, 57(1), 1-7.

Fisher, W. P., Jr. (2009, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Fisher, W. P., Jr. (2010). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 11, in press [Pre-press version available at http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf].

Ludlow, L. H., & Lunz, M. E. (1998). The Job Responsibilities Scale: Invariance in a longitudinal prospective study. Journal of Outcome Measurement, 2(4), 326-37.

Lunz, M. E., & Linacre, J. M. (1998). Measurement designs using multifacet Rasch modeling. In G. A. Marcoulides (Ed.), Modern methods for business research. Methodology for business and management (pp. 47-77). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Masters, G. N. (2007). Special issue: Programme for International Student Assessment (PISA). Journal of Applied Measurement, 8(3), 235-335.

Masters, G. N., & Keeves, J. P. (Eds.). (1999). Advances in measurement in educational research and assessment. New York: Pergamon.

Mohamed, A., Aziz, A., Zakaria, S., & Masodi, M. S. (2008). Appraisal of course learning outcomes using Rasch measurement: A case study in information technology education. In L. Kazovsky, P. Borne, N. Mastorakis, A. Kuri-Morales & I. Sakellaris (Eds.), Proceedings of the 7th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems (Electrical And Computer Engineering Series) (pp. 222-238). Cambridge, UK: WSEAS.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Salzberger, T. (2000). An extended Rasch analysis of the CETSCALE – implications for scale development and data construction. Department of Marketing, University of Economics and Business Administration, Vienna (WU-Wien) (http://www2.wu-wien.ac.at/marketing/user/salzberger/research/wp_dataconstruction.pdf).

Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Northampton, MA: Edward Elgar.

Salzberger, T., & Sinkovics, R. R. (2006). Reconsidering the problem of data equivalence in international marketing research: Contrasting approaches based on CFA and the Rasch model for measurement. International Marketing Review, 23(4), 390-417.

Smith, E. V., Jr., & Smith, R. M. (2004). Introduction to Rasch measurement. Maple Grove, MN: JAM Press.

Spitzer, D. (2007). Transforming performance measurement: Rethinking the way we measure and drive organizational success. New York: AMACOM.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1997a, June). Fundamental measurement for outcome evaluation. Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 261-88.

Wright, B. D. (1997b, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Wright, B. D. (1999a). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1999b). Rasch measurement models. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 85-97). New York: Pergamon.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/memos.htm#measess].

Wright, B. D., & Stone, M. H. (2004). Making measures. Chicago: Phaneron Press.

Zakaria, S., Aziz, A. A., Mohamed, A., Arshad, N. H., Ghulman, H. A., & Masodi, M. S. (2008, November 11-13). Assessment of information managers’ competency using Rasch measurement. iccit: Third International Conference on Convergence and Hybrid Information Technology, 1, 190-196 [http://www.computer.org/portal/web/csdl/doi/10.1109/ICCIT.2008.387].


Draft Legislation on Development and Adoption of an Intangible Assets Metric System

November 19, 2009

In my opinion, more could be done to effect meaningful and effective health care reform with legislation like that proposed below, which has fewer than 3,800 words, than will ever be possible with the 2,074 pages in Congress’s current health care reform bill. What’s more, creating the infrastructure for human, social, and natural capital markets in this way would not only cost a tiny fraction of the projected $847 billion bill being debated, it would be an investment that would pay returns many times larger than the initial investment. See previous posts in this blog for more info on how and why this is so.

The draft legislation below is adapted from The Metric Conversion Act (Title 15 U.S.C. Chapter 6 §(204) 205a – 205k). The viability of a metric system for human, social, and natural capital is indicated by the realized state of scientific rigor in the measurement of human, social, and natural capital (Fisher, 2009b). The need for such a system is indicated by the current crisis’s pointed economic demands that all forms of capital be unified within a common econometric and financial framework (Fisher, 2009a). It is equally demanded by the moral and philosophical requirements of fair play and meaningfulness (Fisher, 2004). The day is fast approaching when a metric system for intangible assets will be recognized as the urgent need that it is (Fisher, 2009c).

At some point in the near future, it can be expected that a table showing how to interpret the units of the Intangible Assets Metric System will be published in the Federal Register, just as the International System units have been.

For those unfamiliar with the state of the art in measurement, these may seem like wildly unrealistic goals. Those wondering how a reasonable person might arrive at such opinions are urged to consult other posts in this blog, and the references cited in them. The advantages of an intangible assets metric system for sustainable and socially responsible economic policies and practices are nothing short of profound. As Georg Rasch (1980, p. xx) said in reference to the stringent demands of his measurement models, “this is a huge challenge, but once the problem has been formulated it does seem possible to meet it.” We are less likely to attain goals that we do not actively formulate. In the spirit of John Dewey’s student, Chiang Mon-Lin, what we need are “wild hypotheses and careful tests.” There is no wilder idea with greater potential impact for redefining profit as the reduction of waste, and for thereby mitigating human suffering, sociopolitical discontent, and environmental degradation.

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2009a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In M. Wilson, K. Draney, N. Brown, & B. Duckor (Eds.), Advances in Rasch Measurement, Vol. Two (in press). Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2009b, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Fisher, W. P., Jr. (2009c). NIST critical national need idea white paper: Metrological infrastructure for human, social, and natural capital (Tech. Rep.). New Orleans: LivingCapitalMetrics.com.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Title xx U.S.C. Chapter x §(100) 101a – 101k
METRIC SYSTEM FOR INTANGIBLE ASSETS DEVELOPMENT LAW
(Pub. L. 10-xxx, §x, Intangible Assets Metrics Development Act, July 25, 2010)

§ 100. New metric system development authorized. – A new national effort is hereby initiated throughout the United States of America focusing on building and realizing the benefits of a metric system for the intangible assets known as human, social, and natural capital.

§ 101a. Congressional statement of findings. – The Congress finds as follows:

(1) The United States was an original signatory party to the 1875 Treaty of the Meter (20 Stat. 709), which established the General Conference of Weights and Measures, the International Committee of Weights and Measures and the International Bureau of Weights and Measures.

(2) The use of metric measurement standards in the United States was authorized by law in 1866; with the Metric Conversion Act of 1975 this Nation established a national policy of committing itself and taking steps to facilitate conversion to the metric system.

(3) World trade is dependent on the metric system of measurement; continuing trends toward globalization demand expansion of the metric system to include vital economic resources shown scientifically measurable in research conducted over the last 80 years.

(4) Industries and consumers in the United States are often at competitive disadvantages when dealing in domestic and international markets because no existing systems for measuring intangible assets (human, social, and natural capital) are expressed in standardized, universally uniform metrics. The end result is that education, health care, human resource, and other markets are unable to reward quality; supply and demand are unmatched; consumers make decisions with no or insufficient information; and quality cannot be systematically improved.

(5) The inherent simplicity of the metric system of measurement and standardization of weights and measures has led to major cost savings in certain industries which have converted to that system; similar savings are expected to follow from the development and implementation of a metric system for intangible assets.

(6) The Federal Government has a responsibility to develop procedures and techniques to assist industry, especially small business, as it voluntarily seeks to adopt a new metric system of measurement for intangible assets that have always required management but which have not yet been uniformly and systematically measured.

(7) A new metric system of measurement for human, social, and natural capital can provide substantial advantages to the Federal Government in its own operations.

§ 101b. Declaration of policy. – It is therefore the declared policy of the United States-

(1) to support the development and implementation of a new metric system of intangible assets measurement as the preferred system of weights and measures for United States trade and commerce involving human, social, and natural capital;

(2) to require that each Federal agency, by a date certain and to the extent economically feasible by the end of the fiscal year 2011, use the new metric system of intangibles measurement in its procurements, grants, and other business-related activities, except to the extent that such use is impractical or is likely to cause significant inefficiencies or loss of markets to United States firms, such as when foreign competitors are producing competing products in non-metric units; and

(3) to seek out ways to increase understanding of the new metric system of intangibles measurement through educational information and guidance and in Government publications.

§ 101c. Definitions

As used in this subchapter, the term-

(1) ‘Board’ means the United States Intangible Assets Metrics Board, established under section 101d of this Title;

(2) ‘engineering standard’ means a standard which prescribes (A) a concise set of conditions and requirements that must be satisfied by a material, product, process, procedure, convention, or test method; and (B) the physical, functional, performance and/or conformance characteristics thereof;

(3) ‘international standard or recommendation’ means an engineering standard or recommendation which is (A) formulated and promulgated by an international organization and (B) recommended for adoption by individual nations as a national standard;

(4) ‘metric system of measurement’ means the International System of Units as established by the General Conference of Weights and Measures in 1960 and as interpreted or modified for the United States by the Secretary of Commerce;

(5) ‘full and open competition’ has the same meaning as defined in section 403 of title 41;

(6) ‘total installed price’ means the price of purchasing a product or material, trimming or otherwise altering some or all of that product or material, if necessary to fit with other building components, and then installing that product or material into a Federal facility;

(7) ‘hard-metric’ means measurement, design, and manufacture using the metric system of measurement, but does not include measurement, design, and manufacture using English system measurement units which are subsequently reexpressed in the metric system of measurement;

(8) ‘cost or pricing data or price analysis’ has the meaning given such terms in section 254b of title 41; and

(9) ‘Federal facility’ means any public building (as defined under section 612 of title 40) and shall include any Federal building or construction project: (A) on lands in the public domain; (B) on lands used in connection with Federal programs for agriculture research, recreation, and conservation programs; (C) on or used in connection with river, harbor, flood control, reclamation, or power projects; (D) on or used in connection with housing and residential projects; (E) on military installations (including any fort, camp, post, naval training station, airfield, proving ground, military supply depot, military school, any similar facility of the Department of Defense); (F) on installations of the Department of Veterans Affairs used for hospital or domiciliary purposes; or (G) on lands used in connection with Federal prisons, but does not include (i) any Federal building or construction project the exclusion of which the President deems to be justified in the public interest, or (ii) any construction project or building owned or controlled by a State government, local government, Indian tribe, or any private entity.

§101d. United States Intangible Assets Metrics Board

(a) Establishment. – There is established, in accordance with this section, an independent instrumentality to be known as a United States Intangible Assets Metrics Board.

(b) Membership; Chairman; appointment of members; term of office; vacancies. – The Board shall consist of 17 individuals, as follows:

(1) the Chairman, a qualified individual who shall be appointed by the President, by and with the advice and consent of the Senate;

(2) seventeen members who shall be appointed by the President, by and with the advice and consent of the Senate, on the following basis-

(A) one to be selected from lists of qualified individuals recommended by psychometricians and organizations representative of psychometric interests;

(B) one to be selected from lists of qualified individuals recommended by social scientists, the scientific and technical community, and organizations representative of social scientists and technicians;

(C) one to be selected from lists of qualified individuals recommended by environmental scientists, the scientific and technical community, and organizations representative of environmental scientists and technicians;

(D) one to be selected from a list of qualified individuals recommended by the National Association of Manufacturers or its successor;

(E) one to be selected from lists of qualified individuals recommended by the United States Chamber of Commerce, or its successor, retailers, and other commercial organizations;

(F) two to be selected from lists of qualified individuals recommended by the American Federation of Labor and Congress of Industrial Organizations or its successor, who are representative of workers directly affected by human capital metrics for health, skills, motivations, and productivity, and by other organizations representing labor;

(G) one to be selected from a list of qualified individuals recommended by the National Governors Conference, the National Council of State Legislatures, and organizations representative of State and local government;

(H) two to be selected from lists of qualified individuals recommended by organizations representative of small business;

(I) one to be selected from lists of qualified individuals representative of the human resource management industry;

(J) one to be selected from a list of qualified individuals recommended by the National Conference on Weights and Measures and standards making organizations;

(K) one to be selected from lists of qualified individuals recommended by educators, the educational community, and organizations representative of educational interests; and

(L) four at-large members to represent consumers and other interests deemed suitable by the President and who shall be qualified individuals.

As used in this subsection, each ‘list’ shall include the names of at least three individuals for each applicable vacancy. The terms of office of the members of the Board first taking office shall expire as designated by the President at the time of nomination: five at the end of the second year; five at the end of the fourth year; and six at the end of the sixth year. The term of office of the Chairman of such Board shall be six years. Members, including the Chairman, may be appointed to an additional term of six years, in the same manner as the original appointment. Successors to members of such Board shall be appointed in the same manner as the original members and shall have terms of office expiring six years from the date of expiration of the terms for which their predecessors were appointed. Any individual appointed to fill a vacancy occurring prior to the expiration of any term of office shall be appointed for the remainder of that term. Beginning 45 days after the date of incorporation of the Board, six members of such Board shall constitute a quorum for the transaction of any function of the Board.

(c) Compulsory powers. – Unless otherwise provided by the Congress, the Board shall have no compulsory powers.

(d) Termination. – The Board shall cease to exist when the Congress, by law, determines that its mission has been accomplished.

§101e. – Functions and powers of Board. – It shall be the function of the Board to devise and carry out a broad program of planning, coordination, and public education, consistent with other national policy and interests, with the aim of implementing the policy set forth in this subchapter. In carrying out this program, the Board shall-

(1) consult with and take into account the interests, views, and costs relevant to the inefficiencies that have long plagued the management of unmeasured forms of capital in United States commerce and industry, including small business; science; engineering; labor; education; consumers; government agencies at the Federal, State, and local level; nationally recognized standards developing and coordinating organizations; intangibles metrics development, planning and coordinating groups; and such other individuals or groups as are considered appropriate by the Board to the carrying out of the purposes of this subchapter. The Board shall take into account activities underway in the private and public sectors, so as not to duplicate unnecessarily such activities;

(2) provide for appropriate procedures whereby various groups, under the auspices of the Board, may formulate, and recommend or suggest, to the Board specific programs for coordinating intangibles metrics development in each industry and segment thereof and specific dimensions and configurations in the new metric system and in other measurements for general use. Such programs, dimensions, and configurations shall be consistent with (A) the needs, interests, and capabilities of manufacturers (large and small), suppliers, labor, consumers, educators, and other interested groups, and (B) the national interest;

(3) publicize, in an appropriate manner, proposed programs and provide an opportunity for interested groups or individuals to submit comments on such programs. At the request of interested parties, the Board, in its discretion, may hold hearings with regard to such programs. Such comments and hearings may be considered by the Board;

(4) encourage activities of standardization organizations to develop or revise, as rapidly as practicable, policy and IT standards based on the new intangibles metrics, and to take advantage of opportunities to promote (A) rationalization or simplification of relationships, (B) improvements of design, (C) reduction of size variations, (D) increases in economy, and (E) where feasible, the efficient use of energy and the conservation of natural resources;

(5) encourage the retention, in the new metric language of human, social, and natural capital standards, of those United States policy and IT designs, practices, and conventions that are internationally accepted or that embody superior technology;

(6) consult and cooperate with foreign governments, and intergovernmental organizations, in collaboration with the Department of State, and, through appropriate member bodies, with private international organizations, which are or become concerned with the encouragement and coordination of increased use of intangible assets metrics measurement units or policy and IT standards based on such units, or both. Such consultation shall include efforts, where appropriate, to gain international recognition for intangible assets metrics standards proposed by the United States;

(7) assist the public through information and education programs, to become familiar with the meaning and applicability of metric terms and measures in daily life. Such programs shall include –

(A) public information programs conducted by the Board, through the use of newspapers, magazines, radio, television, the Internet, social networking, and other media, and through talks before appropriate citizens’ groups, and trade and public organizations;

(B) counseling and consultation by the Secretary of Education; the Secretary of Labor; the Administrator of the Small Business Administration; and the Director of the National Science Foundation, with educational associations, State and local educational agencies, labor education committees, apprentice training committees, and other interested groups, in order to assure (i) that the new intangible assets metric system of measurement is included in the curriculum of the Nation’s educational institutions, and (ii) that teachers and other appropriate personnel are properly trained to teach the intangible assets metric system of measurement;

(C) consultation by the Secretary of Commerce with the National Conference of Weights and Measures in order to assure that State and local weights and measures officials are (i) appropriately involved in intangible assets metric development and adoption activities and (ii) assisted in their efforts to bring about timely amendments to weights and measures laws; and

(D) such other public information activities, by any Federal agency in support of this subchapter, as relate to the mission of such agency;

(8) collect, analyze, and publish information about the extent of usage of intangible assets metric measurements; evaluate the costs and benefits of that usage; and make efforts to minimize any adverse effects resulting from increasing intangible assets metric usage;

(9) conduct research, including appropriate surveys; publish the results of such research; and recommend to the Congress and to the President such action as may be appropriate to deal with any unresolved problems, issues, and questions associated with intangible assets metric development, adoption, or usage. Such problems, issues, and questions may include, but are not limited to, the impact on different occupations and industries, possible increased costs to consumers, the impact on society and the economy, effects on small business, the impact on the international trade position of the United States, the appropriateness of and methods for using procurement by the Federal Government as a means to effect development and adoption of the intangible assets metric system, the proper conversion or transition period in particular sectors of society, and consequences for national defense;

(10) submit annually to the Congress and to the President a report on its activities. Each such report shall include a status report on the development and adoption process as well as projections for continued progress in that process. Such report may include recommendations covering any legislation or executive action needed to implement the programs of development and adoption accepted by the Board. The Board may also submit such other reports and recommendations as it deems necessary; and

(11) submit to the President, not later than 1 year after the date of enactment of the Act making appropriations for carrying out this subchapter, a report on the need to provide an effective structural mechanism for adopting intangible assets metric units in statutes, regulations, and other laws at all levels of government, on a coordinated and timely basis, in response to voluntary programs adopted and implemented by various sectors of society under the auspices and with the approval of the Board. If the Board determines that such a need exists, such report shall include recommendations as to appropriate and effective means for establishing and implementing such a mechanism.

§101f. – Duties of Board. – In carrying out its duties under this subchapter, the Board may –

(1) establish an Executive Committee, and such other committees as it deems desirable;

(2) establish such committees and advisory panels as it deems necessary to work with the various sectors of the Nation’s economy and with Federal and State governmental agencies in the development and implementation of detailed development and adoption plans for those sectors. The Board may reimburse, to the extent authorized by law, the members of such committees;

(3) conduct hearings at such times and places as it deems appropriate;

(4) enter into contracts, in accordance with the Federal Property and Administrative Services Act of 1949, as amended (40 U.S.C. 471 et seq.), with Federal or State agencies, private firms, institutions, and individuals for the conduct of research or surveys, the preparation of reports, and other activities necessary to the discharge of its duties;

(5) delegate to the Executive Director such authority as it deems advisable; and

(6) perform such other acts as may be necessary to carry out the duties prescribed by this subchapter.

§101g. – Gifts, donations and bequests to Board

(a) Authorization; deposit into Treasury and disbursement. – The Board may accept, hold, administer, and utilize gifts, donations, and bequests of property, both real and personal, and personal services, for the purpose of aiding or facilitating the work of the Board. Gifts and bequests of money, and the proceeds from the sale of any other property received as gifts or bequests, shall be deposited in the Treasury in a separate fund and shall be disbursed upon order of the Board.

(b) Federal income, estate, and gift taxation of property. – For purpose of Federal income, estate, and gift taxation, property accepted under subsection (a) of this section shall be considered as a gift or bequest to or for the use of the United States.

(c) Investment of moneys; disbursement of accrued income. – Upon the request of the Board, the Secretary of the Treasury may invest and reinvest, in securities of the United States, any moneys contained in the fund authorized in subsection (a) of this section. Income accruing from such securities, and from any other property accepted to the credit of such fund, shall be disbursed upon the order of the Board.

(d) Reversion to Treasury of unexpended funds. – Funds not expended by the Board as of the date when it ceases to exist, in accordance with section 101d(d) of this title, shall revert to the Treasury of the United States as of such date.

§101h. – Compensation of Board members; travel expenses. – Members of the Board who are not in the regular full-time employ of the United States shall, while attending meetings or conferences of the Board or while otherwise engaged in the business of the Board, be entitled to receive compensation at a rate not to exceed the daily rate currently being paid grade 18 of the General Schedule (under section 5332 of title 5), including travel time. While so serving on the business of the Board away from their homes or regular places of business, members of the Board may be allowed travel expenses, including per diem in lieu of subsistence, as authorized by section 5703 of title 5, for persons employed intermittently in the Government service. Payments under this section shall not render members of the Board employees or officials of the United States for any purpose. Members of the Board who are in the employ of the United States shall be entitled to travel expenses when traveling on the business of the Board.

§101i. – Personnel

(a) Executive Director; appointment; tenure; duties. – The Board shall appoint a qualified individual to serve as the Executive Director of the Board at the pleasure of the Board. The Executive Director, subject to the direction of the Board, shall be responsible to the Board and shall carry out the intangible assets metric development and adoption program, pursuant to the provisions of this subchapter and the policies established by the Board.

(b) Executive Director; salary. – The Executive Director of the Board shall serve full time and be subject to the provisions of chapter 51 and subchapter III of chapter 53 of title 5. The annual salary of the Executive Director shall not exceed level III of the Executive Schedule under section 5314 of such title.

(c) Staff personnel; appointment and compensation. – The Board may appoint and fix the compensation of such staff personnel as may be necessary to carry out the provisions of this subchapter in accordance with the provisions of chapter 51 and subchapter III of chapter 53 of title 5.

(d) Experts and consultants; employment and compensation; annual review of contracts. – The Board may (1) employ experts and consultants or organizations thereof, as authorized by section 3109 of title 5; (2) compensate individuals so employed at rates not in excess of the rate currently being paid grade 18 of the General Schedule under section 5332 of such title, including travel time; and (3) allow such individuals, while away from their homes or regular places of business, travel expenses (including per diem in lieu of subsistence) as authorized by section 5703 of such title 5 for persons in the Government service employed intermittently: Provided, however, that contracts for such temporary employment may be renewed annually.

§101j. – Financial and administrative services; source and reimbursement. – Financial and administrative services, including those related to budgeting, accounting, financial reporting, personnel, and procurement, and such other staff services as may be needed by the Board, may be obtained by the Board from the Secretary of Commerce or other appropriate sources in the Federal Government. Payment for such services shall be made by the Board, in advance or by reimbursement, from funds of the Board in such amounts as may be agreed upon by the Chairman of the Board and by the source of the services being rendered.

§101k. – Authorization of appropriations; availability. – There are authorized to be appropriated such sums as may be necessary to carry out the provisions of this subchapter. Appropriations to carry out the provisions of this subchapter may remain available for obligation and expenditure for such period or periods as may be specified in the Acts making such appropriations.


Reliability Coefficients: Starting from the Beginning

August 31, 2009

[This posting was prompted by questions concerning a previous blog entry, Reliability Revisited, and provides background on reliability that only Rasch measurement practitioners are likely to possess.] Most measurement applications based in ordinal data do not implement rigorous checks of the internal consistency of the observations, nor do they typically use the log-odds transformation to convert the nonlinear scores into linear measures. Measurement is usually defined in statistical terms, applying population-level models to obtain group-level summary scores, means, and percentages. Measurement, however, ought to involve individual-level models and case-specific location estimates. (See one of my earlier blogs for more on this distinction between statistics and measurement.)
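To make the log-odds transformation concrete (this is the standard formula, stated here for convenience rather than quoted from the post): an ordinal proportion or percent-correct score p is converted to a linear logit measure by

$$ \operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right), $$

so that 50% maps to 0 logits, about 73% to +1 logit, and about 88% to +2 logits, stretching the compressed ends of the percentage scale into equal-interval units.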

Given the appropriate measurement focus on the individual, the instrument is initially calibrated and measures are estimated in a simultaneous conjoint process. Once the instrument is calibrated, the item estimates can be anchored, measures can be routinely produced from them, and new items can be calibrated into the system, and others dropped, over time. This method has been the norm in admissions, certification, licensure, and high stakes testing for decades (Fisher & Wright, 1994; Bezruczko, 2005).

Measurement modelling of individual response processes has to be stochastic, or else we run into the attenuation paradox (Engelhard, 1993, 1994). This is the situation in which a deterministic progression of observations from one end of the instrument to the other produces apparently error-free data strings that look like this (1 being a correct answer, a higher rating, or the presence of an attribute, and 0 being incorrect, a lower rating, or the absence of the attribute):

00000000000

10000000000

11000000000

11100000000

11110000000

11111000000

11111100000

11111110000

11111111000

11111111100

11111111110

11111111111

In this situation, strings with all 0s and all 1s give no information useful for estimating measures (rows) or calibrations (columns). It is as though some of the people are shorter than the first unit on the ruler, and others are taller than the top unit. We don’t really have any way of knowing how short or tall they are, so their rows drop out. But eliminating the top and bottom rows makes the leftmost and rightmost columns all 0s and 1s, and eliminating them then gives new rows with all 0s and 1s, etc., until there’s no data left. (See my Revisiting Reliability blog for evaluations of five different probabilistically-structured data sets of this kind simulated to contrast various approaches to assessing reliability and internal consistency.)

The problem for estimation (Linacre, 1991, 1999, 2000) in data like those shown above is that the lack of informational overlaps between the columns, on the one hand, and between the rows, on the other, gives us no basis for knowing how much more of the variable is represented by any one item relative to any other, or by any one person measured relative to any other. In addition, whenever we actually construct measures of abilities, attitudes, or behaviors that conform with this kind of Guttman (1950) structure (Andrich, 1985; Douglas & Wright, 1989; Engelhard, 2008), the items have to be of such markedly different difficulties or agreeabilities that the results tend to involve large numbers of indistinguishable groups of respondents. But when that information is present in a probabilistically consistent way, we have an example of the phenomenon of stochastic resonance (Fisher, 1992b), so called because of the way noise amplifies weak deterministic signals (Andò & Graziani, 2000; Benzi, Sutera, & Vulpiani, 1981; Bulsara & Gammaitoni, 1996; Dykman & McClintock, 1998; Schimansky-Geier, Freund, Neiman, & Shulgin, 1998).
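The contrast between the deterministic strings above and probabilistically consistent data is easy to simulate. The following is a minimal sketch of my own, not code from the post; the person measures, item calibrations, and random seed are arbitrary assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical person measures and item calibrations, in logits
persons = np.linspace(-2.5, 2.5, 11)   # 11 people, one per row as above
items = np.linspace(-2.0, 2.0, 11)     # 11 items, one per column as above

# Rasch model probability of a 1 (correct/affirmative) response
logits = persons[:, None] - items[None, :]
p = 1.0 / (1.0 + np.exp(-logits))

# Deterministic (Guttman-type) data: 1 whenever the person exceeds the item
deterministic = (logits > 0).astype(int)

# Probabilistically consistent data: same ordering on average, plus useful "noise"
stochastic = rng.binomial(1, p)

print(deterministic)
print(stochastic)
```

In the stochastic matrix, neighboring rows and columns overlap informatively, which is exactly what makes the conjoint estimation of measures and calibrations possible.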

We need the noise, but we can’t let it overwhelm the system. We have to be able to know how much error there is relative to actual signal. Reliability is traditionally defined (Guilford 1965, pp. 439-40) as an estimate of this relation of signal and noise:

“The reliability of any set of measurements is logically defined as the proportion of their variance that is true variance…. We think of the total variance of a set of measures as being made up of two sources of variance: true variance and error variance… The true measure is assumed to be the genuine value of whatever is being measured… The error components occur independently and at random.”

Traditional reliability coefficients, like Cronbach’s alpha, are correlational, implementing a statistical model of group-level information. Error is taken to be the unexplained portion of the variance:

“In his description of alpha Cronbach (1951) proved (1) that alpha is the mean of all possible split-half coefficients, (2) that alpha is the value expected when two random samples of items from a pool like those in the given test are correlated, and (3) that alpha is a lower bound to the proportion of test variance attributable to common factors among the items” (Hattie, 1985, pp. 143-4).

But measurement models of individual-level response processes (Rasch, 1960; Andrich, 1988; Wright, 1977; Fisher & Wright, 1994; Bond & Fox, 2007; Wilson, 2005; Bezruczko, 2005) employ individual-level error estimates (Wright, 1977; Wright & Stone, 1979; Wright & Masters, 1982), not correlational group-level variance estimates. The individual measurement errors are statistically equivalent to sampling confidence intervals, as is evident in both Wright’s equations and in plots of errors and confidence intervals (see Figure 4 in Fisher, 2008). That is, error and confidence intervals both decline at the same rate with larger numbers of item responses per person, or larger numbers of person responses per item.
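The rate of that decline can be stated compactly. In conventional notation (mine, not the post’s), the standard error of a person measure is approximately the inverse square root of the statistical information in that person’s responses:

$$ SE(\hat{\theta}_n) \approx \left[ \sum_i p_{ni}\,(1 - p_{ni}) \right]^{-1/2}, $$

where p_{ni} is the modeled probability of a correct or affirmative response from person n to item i. For L reasonably well-targeted dichotomous items this works out to roughly 2/√L logits, which is why the individual errors and the corresponding confidence intervals shrink at the same rate as the number of responses grows.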

This phenomenon has a constructive application in instrument design. If a reasonable expectation for the measurement standard deviation can be formulated and related to the error expected on the basis of the number of items and response categories, a good estimate of the measurement reliability can be read off a nomograph (Linacre, 1993).
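The logic behind such a nomograph can be sketched with a hypothetical example of my own devising (the numbers are illustrative assumptions, not Linacre’s). If the true, error-corrected standard deviation of the measures is expected to be SD and the test design implies a typical model standard error of SEM, the anticipated reliability is approximately

$$ R \approx \frac{SD^2}{SD^2 + SEM^2}. $$

A 25-item dichotomous instrument with a typical error near 0.4 logits, administered to a sample expected to spread with a true standard deviation of about 2 logits, should therefore yield a reliability in the neighborhood of 4/(4 + 0.16), or about 0.96, before any data are collected.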

Wright (Wright & Masters, 1982, pp. 92, 106; Wright, 1996) introduced several vitally important measurement precision concepts and tools that follow from access to individual person and item error estimates. They improve on the traditional KR-20 or Cronbach reliability coefficients because the individualized error estimates better account for the imprecisions of mistargeted instruments, and for missing data, and so more accurately and conservatively estimate reliability.

Wright and Masters introduce a new reliability statistic, G, the measurement separation reliability index. The availability of individual error estimates makes it possible to estimate the true variance of the measures more directly, by subtracting the mean square error from the total variance. The standard deviation based on this estimate of true variance is then made the numerator of a ratio, G, having the root mean square error as its denominator.
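In symbols (the notation is conventional, not Wright and Masters’ typography), with the mean of the squared individual error estimates written as the mean square error:

$$ SD_{\text{true}}^2 = SD_{\text{observed}}^2 - \overline{SE^{2}}, \qquad G = \frac{SD_{\text{true}}}{RMSE} = \frac{\sqrt{SD_{\text{observed}}^2 - \overline{SE^{2}}}}{\sqrt{\overline{SE^{2}}}}. $$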

Each unit increase in this G index then represents another multiple of the error unit in the amount of quantitative variation present in the measures. This multiple is nonlinearly represented in the traditional reliability coefficients expressed in the 0.00 – 1.00 range, such that the same separation index unit difference is found in the 0.00 to 0.50, 0.50 to 0.80, 0.80 to 0.90, 0.90 to 0.94, 0.94 to 0.96, and 0.96 to 0.97 reliability ranges (see Fisher, 1992a, for a table of values; available online: see references).

G can also be estimated as the square root of the reliability divided by one minus the reliability. Conversely, a reliability coefficient roughly equivalent to Cronbach’s alpha is estimated as the true variance divided by the sum of the true and error variances, which is G squared divided by the quantity G squared plus one. Because individual error estimates are inflated in the presence of missing data, and when an instrument is mistargeted and measures tend toward the extremes, the Rasch-based reliability coefficients tend to be more conservative than Cronbach’s alpha, since those sources of error remain hidden within the variances and correlations on which alpha is based. For a comparison of the G separation index, the G reliability coefficient, and Cronbach’s alpha over five simulated data sets, see the Reliability Revisited blog entry.
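The two conversions just described can be written compactly as

$$ G = \sqrt{\frac{R}{1 - R}}, \qquad R = \frac{G^2}{1 + G^2}, $$

so that, for example, a reliability of 0.80 corresponds to a separation of 2.0 and a reliability of 0.90 to a separation of 3.0, consistent with the ranges listed above.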

Error estimates can be made more conservative yet by multiplying each individual error term by the larger of 1.0 or the square root of the associated mean square fit statistic for that case (Wright, 1995). (The mean square fit statistics are chi-squares divided by their degrees of freedom, and so have an expected value of 1.00; see Smith (2000) for more on fit, and see my recent blog, Reliability Revisited, for more on the conceptualization and evaluation of reliability relative to fit.)
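A minimal sketch of that adjustment, with hypothetical errors and fit statistics:

```python
import math

def adjusted_errors(errors, mean_square_fits):
    """Inflate each standard error by sqrt(mean square fit) whenever that
    fit statistic exceeds its expected value of 1.0."""
    return [se * max(1.0, math.sqrt(f)) for se, f in zip(errors, mean_square_fits)]

# Hypothetical standard errors and mean square fit statistics for six persons:
errors = [0.35, 0.30, 0.28, 0.29, 0.33, 0.40]
fits = [0.8, 1.0, 1.6, 1.1, 0.9, 2.3]
print([round(e, 2) for e in adjusted_errors(errors, fits)])
# Only the misfitting cases (mean squares above 1.0) have their errors inflated.
```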

Wright and Masters (1982, pp. 92, 105-6) also introduce the concept of strata: ranges on the measurement continuum with centers separated by three errors. Strata are in effect a more forgiving expression of the separation reliability index, G, since an estimate of strata defined with centers four errors apart is very nearly identical to the separation index itself. If three errors define a 95% confidence interval, four are equivalent to 99% confidence.
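Strata are commonly computed from G as (4G + 1) / 3, a convention widely used in the Rasch literature; the sketch below (mine) illustrates the relation under that assumption.

```python
def strata(g):
    """Number of measurement ranges with centers three errors apart,
    computed from the separation index G as (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

for g in (1.0, 2.0, 3.0):
    print(g, round(strata(g), 2))
# A separation of 2, for instance, corresponds to about 3 distinct strata.
```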

All of this has particular relevance for practical applications that involve combining or aggregating physical, chemical, and other previously calibrated measures, as illustrated by the use of chemical indicators in assessing disease severity, environmental pollution, and the like. Though any individual measure of the amount of a chemical or compound is valid within the limits of its intended purpose, arriving at measures that delineate disease severity, overall pollution levels, etc., requires that the aggregating instruments be designed, tested, calibrated, and maintained, just as any instruments are (Alvarez, 2005; Cipriani, Fox, Khuder, et al., 2005; Fisher, Bernstein, et al., 2002; Fisher, Priest, Gilder, et al., 2008; Hughes, Perkins, Wright, et al., 2003; Perkins, Wright, & Dorsey, 2005; Wright, 2000).

The same methodology applied in this work, in which the quality of the outcomes or impacts is rated or assessed whether they are counted, expressed as percentages, or given in an indicator's native metric (parts per million, acres, number served, etc.), is needed in managing all forms of human, social, and natural capital. (Watch this space for a forthcoming blog applying this methodology to the scaling of the UN Millennium Development Goals data.) The practical advantages of working from calibrated instrumentation in these contexts include data quality evaluations, the replacement of nonlinear percentages with linear measures, data volume reduction with no loss of information, and the integration of meaningful, substantive qualities with additive quantities on annotated metrics.
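As a small illustration of one advantage in that list, the replacement of nonlinear percentages with linear measures, the sketch below (mine, illustrative only) shows how similar-looking percentage differences correspond to increasingly large differences in log-odds units near the extremes.

```python
import math

def logit(p):
    """Log-odds of a proportion: the linear counterpart of a success percentage."""
    return math.log(p / (1.0 - p))

for pct in (50, 60, 70, 80, 90, 99):
    print(pct, round(logit(pct / 100.0), 2))
# 0.0, 0.41, 0.85, 1.39, 2.2, 4.6: comparable-looking percentage gains represent
# progressively larger amounts of linear change as performance nears 100%.
```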

References

Alvarez, P. (2005). Several noncategorical measures define air pollution. In N. Bezruczko (Ed.), Rasch measurement in health sciences (pp. 277-93). Maple Grove, MN: JAM Press.

Andò, B., & Graziani, S. (2000). Stochastic resonance theory and applications. New York: Kluwer Academic Publishers.

Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 33-80). San Francisco, California: Jossey-Bass.

Andrich, D. (1988). Rasch models for measurement. Sage University Paper Series on Quantitative Applications in the Social Sciences, vol. series no. 07-068. Beverly Hills, California: Sage Publications.

Benzi, R., Sutera, A., & Vulpiani, A. (1981). The mechanism of stochastic resonance. Journal of Physics. A. Mathematical and General, 14, L453-L457.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Bulsara, A. R., & Gammaitoni, L. (1996, March). Tuning in to noise. Physics Today, 49, 39-45.

Cipriani, D., Fox, C., Khuder, S., & Boudreau, N. (2005). Comparing Rasch analyses probability estimates to sensitivity, specificity and likelihood ratios when examining the utility of medical diagnostic tests. Journal of Applied Measurement, 6(2), 180-201.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.

Douglas, G. A., & Wright, B. D. (1989). Response patterns and their probabilities. Rasch Measurement Transactions, 3(4), 75-77 [http://www.rasch.org/rmt/rmt34.htm].

Dykman, M. I., & Mcclintock, P. V. E. (1998, January 22). What can stochastic resonance do? Nature, 391(6665), 344.

Engelhard, G., Jr. (1993). What is the attenuation paradox? Rasch Measurement Transactions, 6(4), 257 [http://www.rasch.org/rmt/rmt64.htm].

Engelhard, G., Jr. (1994). Resolving the attenuation paradox. Rasch Measurement Transactions, 8(3), 379.

Engelhard, G. (2008, July). Historical perspectives on invariant measurement: Guttman, Rasch, and Mokken. Measurement: Interdisciplinary Research & Perspectives, 6(3), 155-189.

Fisher, W. P., Jr. (1992a). Reliability statistics. Rasch Measurement Transactions, 6(3), 238 [http://www.rasch.org/rmt/rmt63i.htm].

Fisher, W. P., Jr. (1992b, Spring). Stochastic resonance and Rasch measurement. Rasch Measurement Transactions, 5(4), 186-187 [http://www.rasch.org/rmt/rmt54k.htm].

Fisher, W. P., Jr. (2008, Summer). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-3 [http://www.rasch.org/rmt/rmt221.pdf].

Fisher, W. P., Jr., Bernstein, L. H., Qamar, A., Babb, J., Rypka, E. W., & Yasick, D. (2002, February). At the bedside: Measuring patient outcomes. Advance for Administrators of the Laboratory, 11(2), 8, 10 [http://laboratory-manager.advanceweb.com/Article/At-the-Bedside-7.aspx].

Fisher, W. P., Jr., Priest, E., Gilder, R., Blankenship, D., & Burton, E. C. (2008, July 3-6). Development of a novel heart failure measure to identify hospitalized patients at risk for intensive care unit admission. Presented at the World Congress on Controversies in Cardiovascular Diseases [http://www.comtecmed.com/ccare/2008/authors_abstract.aspx#Author15], Intercontinental Hotel, Berlin, Germany.

Fisher, W. P., Jr., & Wright, B. D. (Eds.). (1994). Applications of probabilistic conjoint measurement. International Journal of Educational Research, 21(6), 557-664.

Guilford, J. P. (1965). Fundamental statistics in psychology and education. 4th Edn. New York: McGraw-Hill.

Guttman, L. (1950). The basis for scalogram analysis. In S. A. Stouffer & et al. (Eds.), Studies in social psychology in World War II. volume 4: Measurement and prediction (pp. 60-90). New York: Wiley.

Hattie, J. (1985, June). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-64.

Hughes, L., Perkins, K., Wright, B. D., & Westrick, H. (2003). Using a Rasch scale to characterize the clinical features of patients with a clinical diagnosis of uncertain, probable or possible Alzheimer disease at intake. Journal of Alzheimer’s Disease, 5(5), 367-373.

Linacre, J. M. (1991, Spring). Stochastic Guttman order. Rasch Measurement Transactions, 5(4), 189 [http://www.rasch.org/rmt/rmt54p.htm].

Linacre, J. M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284; [http://www.rasch.org/rmt/rmt71h.htm].

Linacre, J. M. (1999). Understanding Rasch measurement: Estimation methods for Rasch measures. Journal of Outcome Measurement, 3(4), 382-405.

Linacre, J. M. (2000, Autumn). Guttman coefficients and Rasch data. Rasch Measurement Transactions, 14(2), 746-7 [http://www.rasch.org/rmt/rmt142e.htm].

Perkins, K., Wright, B. D., & Dorsey, J. K. (2005). Using Rasch measurement with medical data. In N. Bezruczko (Ed.), Rasch measurement in health sciences (pp. 221-34). Maple Grove, MN: JAM Press.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Schimansky-Geier, L., Freund, J. A., Neiman, A. B., & Shulgin, B. (1998). Noise induced order: Stochastic resonance. International Journal of Bifurcation and Chaos, 8(5), 869-79.

Smith, R. M. (2000). Fit analysis in latent trait measurement models. Journal of Applied Measurement, 1(2), 199-218.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1995, Summer). Which standard error? Rasch Measurement Transactions, 9(2), 436-437 [http://www.rasch.org/rmt/rmt92n.htm].

Wright, B. D. (1996, Winter). Reliability and separation. Rasch Measurement Transactions, 9(4), 472 [http://www.rasch.org/rmt/rmt94n.htm].

Wright, B. D. (2000). Rasch regression: My recipe. Rasch Measurement Transactions, 14(3), 758-9 [http://www.rasch.org/rmt/rmt143u.htm].

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago, Illinois: MESA Press.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

A Tale of Two Industries: Contrasting Quality Assessment and Improvement Frameworks

July 8, 2009

Imagine the chaos that would result if industrial engineers each had their own tool sets calibrated in idiosyncratic metrics, with unit sizes that changed depending on the size of whatever they measured, and if they conducted quality improvement studies focused on statistical significance tests of effect sizes. Imagine, further, that these engineers ignore the statistical power of their designs and never know whether a statistically significant result has turned up by pure chance. Finally, imagine that they also ignore the substantive meaning of the numbers, never considering the differences they study in terms of varying probabilities of response to the questions they ask.

So when one engineer tries to generalize a result across applications, it works sometimes, fails at other times, and is often simply ignored; it commands no compelling response because everyone else is invested in their own metrics, samples, and results, all different from everyone else's. Any discussion of the relative merits of the research easily descends into acrimonious, heated arguments that cannot be resolved, given the lack of consensus on what constitutes valid data, instrumentation, and theory.

Thus, the engineers maintain an appearance of polite decorum. They smile and nod at each other's local, sample-dependent, and irreproducible results, while building mini-empires of funding, students, quoting circles, and professional associations on the basis of their personal authority and charisma. As they do so, costs in their industry spiral out of control, profits are almost nonexistent, fewer and fewer people can afford their products, smart people go into other fields, and overall product quality declines.

Of course, this is the state of affairs in education and health care, not in industrial engineering. In the latter field, the situation is much different. There, everyone takes care to measure the same thing as everyone else, in the same unit. Unexpected individual measurements stand out instantly and are immediately repeated. Innovations are more easily generated and disseminated because everyone thinks in the same language and sees effects expressed in the same images. One person's ideas and results can readily be fitted into another's experience, and the viability of a new way of doing things can be evaluated against one's own experience and skills.

Arguments can be quite productive, as consensus on basic values drives the demand for evidence. Associations and successes are defined more in terms of merit earned from productivity and creativity demonstrated through the accumulation of generalized results. Costs in these industries are constantly dropping, profits are steady or increasing, more and more people can afford their products, smart people are coming into the field, and overall product quality is improving.

There is absolutely no reason why education and health care cannot thrive and grow like other industries. It is up to us to show how.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

Publications Documenting Score, Rating, Percentage Contrasts with Real Measures

July 7, 2009

A few brief and easy introductions to the contrast between scores, ratings, and percentages, on the one hand, and measures, on the other, include:

Linacre, J. M. (1992, Autumn). Why fuss about statistical sufficiency? Rasch Measurement Transactions, 6(3), 230 [http://www.rasch.org/rmt/rmt63c.htm].

Linacre, J. M. (1994, Summer). Likert or Rasch? Rasch Measurement Transactions, 8(2), 356 [http://www.rasch.org/rmt/rmt82d.htm].

Wright, B. D. (1992, Summer). Scores are not measures. Rasch Measurement Transactions, 6(1), 208 [http://www.rasch.org/rmt/rmt61n.htm].

Wright, B. D. (1989). Rasch model from counting right answers: Raw scores as sufficient statistics. Rasch Measurement Transactions, 3(2), 62 [http://www.rasch.org/rmt/rmt32e.htm].

Wright, B. D. (1993). Thinking with raw scores. Rasch Measurement Transactions, 7(2), 299-300 [http://www.rasch.org/rmt/rmt72r.htm].

Wright, B. D. (1999). Common sense for measurement. Rasch Measurement Transactions, 13(3), 704-5  [http://www.rasch.org/rmt/rmt133h.htm].

Longer and more technical comparisons include:

Andrich, D. (1989). Distinctions between assumptions and requirements in measurement in the social sciences. In J. A. Keats, R. Taft, R. A. Heath & S. H. Lovibond (Eds.), Mathematical and Theoretical Systems: Proceedings of the 24th International Congress of Psychology of the International Union of Psychological Science, Vol. 4 (pp. 7-16). North-Holland: Elsevier Science Publishers.

van Alphen, A., Halfens, R., Hasman, A., & Imbos, T. (1994). Likert or Rasch? Nothing is more applicable than good theory. Journal of Advanced Nursing, 20, 196-201.

Wright, B. D., & Linacre, J. M. (1989). Observations are always ordinal; measurements, however, must be interval. Archives of Physical Medicine and Rehabilitation, 70(12), 857-867 [http://www.rasch.org/memo44.htm].

Zhu, W. (1996). Should total scores from a rating scale be used directly? Research Quarterly for Exercise and Sport, 67(3), 363-372.

The following lists provide some key resources. The lists are intended to be representative, not comprehensive.  There are many works in addition to these that document the claims in yesterday’s table. Many of these books and articles are highly technical.  Good introductions can be found in Bezruczko (2005), Bond and Fox (2007), Smith and Smith (2004), Wilson (2005), Wright and Stone (1979), Wright and Masters (1982), Wright and Linacre (1989), and elsewhere. The www.rasch.org web site has comprehensive and current information on seminars, consultants, software, full text articles, professional association meetings, etc.

Books and Journal Issues

Andrich, D. (1988). Rasch models for measurement. Sage University Paper Series on Quantitative Applications in the Social Sciences, vol. series no. 07-068. Beverly Hills, California: Sage Publications.

Andrich, D., & Douglas, G. A. (Eds.). (1982). Rasch models for measurement in educational and psychological research [Special issue]. Education Research and Perspectives, 9(1), 5-118. [Full text available at www.rasch.org.]

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Choppin, B. (1985). In Memoriam: Bruce Choppin (T. N. Postlethwaite ed.) [Special issue]. Evaluation in Education: An International Review Series, 9(1).

DeBoeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. Statistics for Social and Behavioral Sciences). New York: Springer-Verlag.

Embretson, S. E., & Hershberger, S. L. (Eds.). (1999). The new rules of measurement: What every psychologist and educator should know. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Engelhard, G., Jr., & Wilson, M. (1996). Objective measurement: Theory into practice, Vol. 3. Norwood, New Jersey: Ablex.

Fischer, G. H., & Molenaar, I. (1995). Rasch models: Foundations, recent developments, and applications. New York: Springer-Verlag.

Fisher, W. P., Jr., & Wright, B. D. (Eds.). (1994). Applications of Probabilistic Conjoint Measurement [Special Issue]. International Journal of Educational Research, 21(6), 557-664.

Garner, M., Draney, K., Wilson, M., Engelhard, G., Jr., & Fisher, W. P., Jr. (Eds.). (2009). Advances in Rasch measurement, Vol. One. Maple Grove, MN: JAM Press.

Granger, C. V., & Gresham, G. E. (Eds). (1993, August). New Developments in Functional Assessment [Special Issue]. Physical Medicine and Rehabilitation Clinics of North America, 4(3), 417-611.

Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago, Illinois: MESA Press.

Liu, X., & Boone, W. (2006). Applications of Rasch measurement in science education. Maple Grove, MN: JAM Press.

Masters, G. N. (2007). Special issue: Programme for International Student Assessment (PISA). Journal of Applied Measurement, 8(3), 235-335.

Masters, G. N., & Keeves, J. P. (Eds.). (1999). Advances in measurement in educational research and assessment. New York: Pergamon.

Osborne, J. W. (Ed.). (2007). Best practices in quantitative methods. Thousand Oaks, CA: Sage.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedagogiske Institut.

Smith, E. V., Jr., & Smith, R. M. (Eds.) (2004). Introduction to Rasch measurement. Maple Grove, MN: JAM Press.

Smith, E. V., Jr., & Smith, R. M. (2007). Rasch measurement: Advanced and specialized applications. Maple Grove, MN: JAM Press.

Smith, R. M. (Ed.). (1997, June). Outcome Measurement [Special Issue]. Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 261-428.

Smith, R. M. (1999). Rasch measurement models. Maple Grove, MN: JAM Press.

von Davier, M. (2006). Multivariate and mixture distribution Rasch models. New York: Springer.

Wilson, M. (1992). Objective measurement: Theory into practice, Vol. 1. Norwood, New Jersey: Ablex.

Wilson, M. (1994). Objective measurement: Theory into practice, Vol. 2. Norwood, New Jersey: Ablex.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilson, M., Draney, K., Brown, N., & Duckor, B. (Eds.). (2009). Advances in Rasch measurement, Vol. Two (in press). Maple Grove, MN: JAM Press.

Wilson, M., & Engelhard, G. (2000). Objective measurement: Theory into practice, Vol. 5. Westport, Connecticut: Ablex Publishing.

Wilson, M., Engelhard, G., & Draney, K. (Eds.). (1997). Objective measurement: Theory into practice, Vol. 4. Norwood, New Jersey: Ablex.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago, Illinois: MESA Press.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/memos.htm#measess].

Key Articles

Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42(1), 69-81.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-73.

Andrich, D. (2002). Understanding resistance to the data-model relationship in Rasch’s paradigm: A reflection for the next generation. Journal of Applied Measurement, 3(3), 325-59.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Beltyukova, S. A., Stone, G. E., & Fox, C. M. (2008). Magnitude estimation and categorical rating scaling in social sciences: A theoretical and psychometric controversy. Journal of Applied Measurement, 9(2), 151-159.

Choppin, B. (1968). An item bank using sample-free calibration. Nature, 219, 870-872.

Embretson, S. E. (1996, September). Item Response Theory models and spurious interaction effects in factorial ANOVA designs. Applied Psychological Measurement, 20(3), 201-212.

Engelhard, G. (2008, July). Historical perspectives on invariant measurement: Guttman, Rasch, and Mokken. Measurement: Interdisciplinary Research & Perspectives, 6(3), 155-189.

Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.

Fischer, G. H. (1981, March). On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika, 46(1), 59-77.

Fischer, G. H. (1989). Applying the principles of specific objectivity and of generalizability to the measurement of change. Psychometrika, 52(4), 565-587.

Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.

Fisher, W. P., Jr. (2009, July). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), in press.

Grosse, M. E., & Wright, B. D. (1986, Sep). Setting, evaluating, and maintaining certification standards with the Rasch model. Evaluation & the Health Professions, 9(3), 267-285.

Hall, W. J., Wijsman, R. A., & Ghosh, J. K. (1965). The relationship between sufficiency and invariance with applications in sequential analysis. Annals of Mathematical Statistics, 36, 575-614.

Kamata, A. (2001, March). Item analysis by the Hierarchical Generalized Linear Model. Journal of Educational Measurement, 38(1), 79-93.

Karabatsos, G., & Ullrich, J. R. (2002). Enumerating and testing conjoint measurement models. Mathematical Social Sciences, 43, 487-505.

Linacre, J. M. (1997). Instantaneous measurement and diagnosis. Physical Medicine and Rehabilitation State of the Art Reviews, 11(2), 315-324.

Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3(1), 85-106.

Lunz, M. E., & Bergstrom, B. A. (1991). Comparability of decision for computer adaptive and written examinations. Journal of Allied Health, 20(1), 15-23.

Lunz, M. E., Wright, B. D., & Linacre, J. M. (1990). Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3/4, 331-345.

Masters, G. N. (1985, March). Common-person equating with the Rasch model. Applied Psychological Measurement, 9(1), 73-82.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1(1), 3-62.

Pelton, T., & Bunderson, V. (2003). The recovery of the density scale using a stochastic quasi-realization of additive conjoint measurement. Journal of Applied Measurement, 4(3), 269-81.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (pp. 321-333 [http://www.rasch.org/memo1960.pdf]). Berkeley, California: University of California Press.

Rasch, G. (1966). An individualistic approach to item analysis. In P. F. Lazarsfeld & N. W. Henry (Eds.), Readings in mathematical social science (pp. 89-108). Chicago, Illinois: Science Research Associates.

Rasch, G. (1966, July). An informal report on the present state of a theory of objectivity in comparisons. Unpublished paper [http://www.rasch.org/memo1966.pdf].

Rasch, G. (1966). An item analysis which takes individual differences into account. British Journal of Mathematical and Statistical Psychology, 19, 49-57.

Rasch, G. (1968, September 6). A mathematical theory of objectivity and its consequences for model construction. [Unpublished paper [http://www.rasch.org/memo1968.pdf]], Amsterdam, the Netherlands: Institute of Mathematical Statistics, European Branch.

Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14, 58-94.

Romanoski, J. T., & Douglas, G. (2002). Rasch-transformed raw scores and two-way ANOVA: A simulation analysis. Journal of Applied Measurement, 3(4), 421-430.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Smith, R. M. (2000). Fit analysis in latent trait measurement models. Journal of Applied Measurement, 1(2), 199-218.

Stenner, A. J., & Smith III, M. (1982). Testing construct theories. Perceptual and Motor Skills, 55, 415-426.

Stenner, A. J. (1994). Specific objectivity – local and general. Rasch Measurement Transactions, 8(3), 374 [http://www.rasch.org/rmt/rmt83e.htm].

Stone, G. E., Beltyukova, S. A., & Fox, C. M. (2008). Objective standard setting for judge-mediated examinations. International Journal of Testing, 8(2), 180-196.

Stone, M. H. (2003). Substantive scale construction. Journal of Applied Measurement, 4(3), 282-97.

Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2), 181-208.

Wright, B. D. (1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 invitational conference on testing problems (pp. 85-101 [http://www.rasch.org/memo1.htm]). Princeton, New Jersey: Educational Testing Service.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1980). Foreword, Afterword. In Probabilistic models for some intelligence and attainment tests, by Georg Rasch (pp. ix-xix, 185-199. http://www.rasch.org/memo63.htm). Chicago, Illinois: University of Chicago Press.

Wright, B. D. (1984). Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288 [http://www.rasch.org/memo41.htm].

Wright, B. D. (1985). Additivity in psychological measurement. In E. Roskam (Ed.), Measurement and personality assessment. North Holland: Elsevier Science Ltd.

Wright, B. D. (1996). Comparing Rasch measurement and factor analysis. Structural Equation Modeling, 3(1), 3-24.

Wright, B. D. (1997, June). Fundamental measurement for outcome evaluation. Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 261-88.

Wright, B. D. (1997, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D., & Bell, S. R. (1984, Winter). Item banks: What, why, how. Journal of Educational Measurement, 21(4), 331-345 [http://www.rasch.org/memo43.htm].

Wright, B. D., & Linacre, J. M. (1989). Observations are always ordinal; measurements, however, must be interval. Archives of Physical Medicine and Rehabilitation, 70(12), 857-867 [http://www.rasch.org/memo44.htm].

Wright, B. D., & Mok, M. (2000). Understanding Rasch measurement: Rasch models overview. Journal of Applied Measurement, 1(1), 83-106.

Model Applications

Adams, R. J., Wu, M. L., & Macaskill, G. (1997). Scaling methodology and procedures for the mathematics and science scales. In M. O. Martin & D. L. Kelly (Eds.), Third International Mathematics and Science Study Technical Report: Vol. 2: Implementation and Analysis – Primary and Middle School Years. Boston: Center for the Study of Testing, Evaluation, and Educational Policy.

Andrich, D., & Van Schoubroeck, L. (1989, May). The General Health Questionnaire: A psychometric analysis using latent trait theory. Psychological Medicine, 19(2), 469-485.

Beltyukova, S. A., Stone, G. E., & Fox, C. M. (2004). Equating student satisfaction measures. Journal of Applied Measurement, 5(1), 62-9.

Bergstrom, B. A., & Lunz, M. E. (1999). CAT for certification and licensure. In F. Drasgow & J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 67-91). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc., Publishers.

Bond, T. G. (1994). Piaget and measurement II: Empirical validation of the Piagetian model. Archives de Psychologie, 63, 155-185.

Bunderson, C. V., & Newby, V. A. (2009). The relationships among design experiments, invariant measurement scales, and domain theories. Journal of Applied Measurement, 10(2), 117-137.

Cavanagh, R. F., & Romanoski, J. T. (2006, October). Rating scale instruments and measurement. Learning Environments Research, 9(3), 273-289.

Cipriani, D., Fox, C., Khuder, S., & Boudreau, N. (2005). Comparing Rasch analyses probability estimates to sensitivity, specificity and likelihood ratios when examining the utility of medical diagnostic tests. Journal of Applied Measurement, 6(2), 180-201.

Dawson, T. L. (2004, April). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11(2), 71-85.

DeSalvo, K., Fisher, W. P. Jr., Tran, K., Bloser, N., Merrill, W., & Peabody, J. W. (2006, March). Assessing measurement properties of two single-item general health measures. Quality of Life Research, 15(2), 191-201.

Engelhard, G., Jr. (1992). The measurement of writing ability with a many-faceted Rasch model. Applied Measurement in Education, 5(3), 171-191.

Engelhard, G., Jr. (1997). Constructing rater and task banks for performance assessment. Journal of Outcome Measurement, 1(1), 19-33.

Fisher, W. P., Jr. (1998). A research program for accountable and patient-centered health status measures. Journal of Outcome Measurement, 2(3), 222-239.

Fisher, W. P., Jr., Harvey, R. F., Taylor, P., Kilgore, K. M., & Kelly, C. K. (1995, February). Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76(2), 113-122.

Heinemann, A. W., Gershon, R., & Fisher, W. P., Jr. (2006). Development and application of the Orthotics and Prosthetics User Survey: Applications and opportunities for health care quality improvement. Journal of Prosthetics and Orthotics, 18(1), 80-85 [http://www.oandp.org/jpo/library/2006_01S_080.asp].

Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., & Granger, C. V. (1994). Prediction of rehabilitation outcomes with disability measures. Archives of Physical Medicine and Rehabilitation, 75(2), 133-143.

Hobart, J. C., Cano, S. J., O'Connor, R. J., Kinos, S., Heinzlef, O., Roullet, E., et al. (2003). Multiple Sclerosis Impact Scale-29 (MSIS-29): Measurement stability across eight European countries. Multiple Sclerosis, 9, S23.

Hobart, J. C., Cano, S. J., Zajicek, J. P., & Thompson, A. J. (2007, December). Rating scales as outcome measures for clinical trials in neurology: Problems, solutions, and recommendations. Lancet Neurology, 6, 1094-1105.

Lai, J., Fisher, A., Magalhaes, L., & Bundy, A. C. (1996). Construct validity of the sensory integration and praxis tests. Occupational Therapy Journal of Research, 16(2), 75-97.

Lee, N. P., & Fisher, W. P., Jr. (2005). Evaluation of the Diabetes Self Care Scale. Journal of Applied Measurement, 6(4), 366-81.

Ludlow, L. H., & Haley, S. M. (1995, December). Rasch model logits: Interpretation, use, and transformation. Educational and Psychological Measurement, 55(6), 967-975.

Markward, N. J., & Fisher, W. P., Jr. (2004). Calibrating the genome. Journal of Applied Measurement, 5(2), 129-41.

Massof, R. W. (2007, August). An interval-scaled scoring algorithm for visual function questionnaires. Optometry & Vision Science, 84(8), E690-E705.

Massof, R. W. (2008, July-August). Editorial: Moving toward scientific measurements of quality of life. Ophthalmic Epidemiology, 15, 209-211.

Masters, G. N., Adams, R. J., & Lokan, J. (1994). Mapping student achievement. International Journal of Educational Research, 21(6), 595-610.

Mead, R. J. (2009). The ISR: Intelligent Student Reports. Journal of Applied Measurement, 10(2), 208-224.

Pelton, T., & Bunderson, V. (2003). The recovery of the density scale using a stochastic quasi-realization of additive conjoint measurement. Journal of Applied Measurement, 4(3), 269-81.

Smith, E. V., Jr. (2000). Metric development and score reporting in Rasch measurement. Journal of Applied Measurement, 1(3), 303-26.

Smith, R. M., & Taylor, P. (2004). Equating rehabilitation outcome scales: Developing common metrics. Journal of Applied Measurement, 5(3), 229-42.

Solloway, S., & Fisher, W. P., Jr. (2007). Mindfulness in measurement: Reconsidering the measurable in mindfulness. International Journal of Transpersonal Studies, 26, 58-81 [http://www.transpersonalstudies.org/volume_26_2007.html].

Stenner, A. J. (2001). The Lexile Framework: A common metric for matching readers and texts. California School Library Journal, 25(1), 41-2.

Wolfe, E. W., Ray, L. M., & Harris, D. C. (2004, October). A Rasch analysis of three measures of teacher perception generated from the School and Staffing Survey. Educational and Psychological Measurement, 64(5), 842-860.

Wolfe, F., Hawley, D., Goldenberg, D., Russell, I., Buskila, D., & Neumann, L. (2000, Aug). The assessment of functional impairment in fibromyalgia (FM): Rasch analyses of 5 functional scales and the development of the FM Health Assessment Questionnaire. Journal of Rheumatology, 27(8), 1989-99.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.

W

endt, A., & Tatum, D. S. (2005). Credentialing health care professionals. In N. Bezruczko (Ed.), Rasch measurement in health sciences (pp. 161-75). Maple Grove, MN: JAM Press.