Archive for the ‘Adaptive instrument administration’ Category

Review of “Advancing Social Impact Investments Through Measurement”

August 24, 2012

Over the last few days, I have been reading several of the most recent issues of the Community Development Investment Review, especially volume 7, number 2, edited by David Erickson of the Federal Reserve Bank of San Francisco, which reports the proceedings of the March 21, 2011 conference in Washington, DC on advancing social impact investments through measurement. I am so excited to see this work that I am, truly, fairly trembling. I feel as though I’ve finally made my way home. There are so many points of contact that it’s hard to know where to start. After several days of concentrated deep breathing and close study of the CDIR, I can now formulate some coherent thoughts to share.

The CDIR papers start to sort out the complex issues involved in clarifying how measurement might contribute to the integration of impact investing and community development finance. I am heartened by the statement that “The goal of the Review is to bridge the gap between theory and practice and to enlist as many viewpoints as possible—government, nonprofits, financial institutions, and beneficiaries.” On the other hand, the omission of measurement scientists from that list of viewpoints adds yet another entry to my long list of questions about why measurement science is so routinely ignored by the very people who proclaim its importance. The situation is analogous to demanding more frequent conversation with colleagues while ignoring the invention of the telephone and never providing them with phones or network connections.

The aims shared by the CDIR contributors and me are evident in the fact that David Erickson opens his summary of the March 21, 2011 conference with the same quote from Robert Kennedy that I placed at the end of my 2009 article in Measurement (see references below; all papers referenced are available by request if they are not already online). In that 2009 paper, in others I’ve published over the last several years, in presentations to my measurement colleagues abroad and at home, and in various entries in this blog, I take up virtually all of the major themes that arose in the DC conference: how better measurement can attract capital to needed areas, how the cost of measurement repels many investors, how government can help by means of standard setting and regulation, how diverse and ambiguous investor and stakeholder interests can be reconciled or clarified, and so on.

The difference, of course, is that I present these issues from the technical perspective of measurement and cannot speak authoritatively or specifically from the perspectives represented by the community development finance and impact investing fields. The bottom line take-away message for these fields from my perspective is this: unexamined assumptions may unnecessarily restrict assessments of problems and their potential solutions. As Salamon put it in his remarks in the CDIR proceedings from the Washington meeting (p. 43), “uncoordinated innovation not guided by a clear strategic concept can do more than lose its way: it can do actual harm.”

A clear strategic concept capable of coordinating innovations in social impact measurement is readily available. Multiple, highly valuable, and eminently practical measurement technologies have proven themselves in real world applications over the last 50 years. These technologies are well documented in the educational, psychological, sociological, and health care research literatures, as well as in the practical experience of high stakes testing for professional licensure and certification, for graduation, and for admissions.

Numerous reports show how to approach problems of quantification and standards with new degrees of rigor, transparency, meaningfulness, and flexibility. When measurement problems are not defined in terms of these technologies, solutions that may offer highly advantageous features are not considered. When the area of application is as far reaching and fundamental as social impact measurement, not taking new technologies into account is nothing short of tragic. I describe some of the new opportunities for you in a Technical Postscript, below.

In his Foreword to the CDIR proceedings issue, John Moon mentions having been at the 2009 SoCap event bringing together stakeholders from across the various social capital markets arenas. I was at the 2008 SoCap, and I came away from it with much the same impression as Moon, feeling that the palpable excitement in the air was more than tempered by the evident fact that people were often speaking at cross purposes, and that there did not seem to be a common object to the conversation. Moon, Erickson, and their colleagues have been in one position to sort out the issues involved, and I have been in another, but we are plainly on converging courses.

Though the science is in place and has been for decades, it will not and cannot amount to anything until the people who can best make use of it do so. The community development finance and impact investing fields are those people. Anyone interested in getting together for an informal conversation on topics of mutual interest should feel free to contact me.

Technical Postscript

At least six areas in the effort to advance social impact investments via measurement stand to be strongly affected by contemporary methods. The first has to do with scale quality. I won’t go into the technical details here, but numbers do not automatically stand for something that adds up the way they do. Mapping a substantive construct onto a number line requires specific technical expertise, and there is no evidence of that expertise in any of the literature I’ve seen on social impact investing or on measuring intangible assets. This is not an arbitrary bit of philosophical esoterica or a mere technical nicety. This is one of those areas where the practical value of scientific rigor and precision comes into its own. It makes all the difference in being able to realize goals for measurement, investment, and redefining profit in terms of social impacts.

A second area in which thinking on social impact measurement will be profoundly altered by current scaling methods concerns the capacity to reduce data volume with no loss of information. In current systems, each indicator has its own separate metric. Data volume quickly multiplies when tracking separate organizations for each of several time periods in various locales. Given sufficient adherence to data quality and meaningfulness requirements, today’s scaling methods allow these indicators to be combined into a single composite measure—from which each individual observation can be inferred.
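To make the composite idea concrete, here is a minimal sketch in Python of how several dichotomous indicators with known calibrations can be condensed into one measure, from which each indicator’s expected value can then be recovered. The calibration values, response pattern, and the toy Newton-Raphson routine are purely illustrative; in practice one would use established Rasch measurement software.

import math

def rasch_probability(measure, calibration):
    """Probability of a positive indicator, given a person/organization
    measure and an indicator calibration (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(measure - calibration)))

def estimate_measure(responses, calibrations, tol=1e-6):
    """Maximum-likelihood estimate of one composite measure from several
    dichotomous indicators with known calibrations (illustrative only)."""
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("Extreme score: no finite estimate exists.")
    measure = 0.0
    for _ in range(100):                      # Newton-Raphson iterations
        expected = [rasch_probability(measure, b) for b in calibrations]
        residual = score - sum(expected)      # observed minus expected score
        information = sum(p * (1 - p) for p in expected)
        step = residual / information
        measure += step
        if abs(step) < tol:
            break
    return measure

# Hypothetical calibrations (logits) for five indicators and one
# organization's observations on them.
calibrations = [-1.5, -0.5, 0.0, 0.8, 1.7]
responses = [1, 1, 1, 0, 0]
m = estimate_measure(responses, calibrations)
print(round(m, 2))
# The single measure, together with the calibrations, reproduces the
# expected value of every individual indicator:
print([round(rasch_probability(m, b), 2) for b in calibrations])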

To elaborate on this second point: some speakers at the 2011 conference in Washington seemed to think that reducing data volume is a matter of limiting the number of indicators tracked. This strategy is self-defeating, however, since fewer independent observations mean more uncertainty and more risk. It would be far better to design systems in which the metrics incorporate the amount of uncertainty that can be tolerated in any given decision support application.
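A rough illustration of the tradeoff, again assuming a Rasch-type framework with illustrative numbers: the model standard error of a measure is the inverse square root of the test information, so it shrinks roughly with the square root of the number of well-targeted indicators. Cutting indicators to reduce data volume therefore directly inflates uncertainty.

import math

def standard_error(measure, calibrations):
    """Model standard error of a Rasch measure: 1 / sqrt(test information)."""
    info = 0.0
    for b in calibrations:
        p = 1.0 / (1.0 + math.exp(-(measure - b)))
        info += p * (1 - p)
    return 1.0 / math.sqrt(info)

# Well-targeted indicator sets of increasing length: the error falls
# roughly with the square root of the number of indicators.
for n in (5, 10, 20, 40):
    calibs = [(-1.0 + 2.0 * i / (n - 1)) for i in range(n)]  # spread -1..+1 logits
    print(n, round(standard_error(0.0, calibs), 2))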

The third area I have in mind deals with the diverse spectrum of varying interests and preferences brought to the table by investors, beneficiaries, and other stakeholders. Contemporary approaches in measurement make it possible to adapt the content of the particular indicators (counts or frequencies of events, or responses to survey questions or test items) to the needs of the user, without compromising the comparability of the resulting quantitative measure. This feature makes it possible to mass customize the content of the metrics employed depending on the substantive nature of the needs at that time and place.
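A bare-bones sketch of the selection logic involved, with hypothetical indicator names and calibrations: because every indicator in the bank is calibrated on the same scale, any subset chosen for a particular user still yields measures in the same unit.

def next_indicator(current_estimate, bank, administered):
    """Choose the unadministered indicator whose calibration is closest to
    the current provisional measure; since all calibrations share one scale,
    measures stay comparable no matter which subset is used."""
    candidates = [i for i in bank if i not in administered]
    return min(candidates, key=lambda i: abs(bank[i] - current_estimate))

# Hypothetical bank: indicator name -> calibration in logits.
bank = {"basic_reporting": -1.2, "staff_training": -0.4,
        "community_outreach": 0.3, "impact_audit": 1.1, "third_party_eval": 1.9}
print(next_indicator(0.0, bank, administered={"basic_reporting"}))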

Fourth, it is well known that different people judging performances or assigning numbers to observations bring different personal standards to bear as they make their ratings. Contemporary measurement methods enable the evaluation and scaling of raters and judges relative to one another, when data are gathered in a manner facilitating such comparisons. The end result is a basis for fair comparisons, instead of scores that vary depending more on which rater is observing than on the quality of the performance.
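The following sketch, assuming a dichotomous many-facet-style model with made-up numbers, shows how the same performance can draw different ratings purely because of rater severity, and therefore why calibrating the raters is what makes fair comparison possible.

import math

def rated_probability(person, item, rater_severity):
    """Many-facet style model (dichotomous sketch): the chance of a positive
    rating depends on the person measure, the item calibration, and the
    severity of whichever rater happens to be judging."""
    return 1.0 / (1.0 + math.exp(-(person - item - rater_severity)))

# The same performance on the same item looks very different to a lenient
# rater (-0.8 logits) and a harsh one (+0.8 logits); calibrating the raters
# lets those differences be removed from the reported measures.
for severity in (-0.8, 0.0, 0.8):
    print(severity, round(rated_probability(1.0, 0.0, severity), 2))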

Fifth, much of the discussion at the conference in Washington last year emphasized the need for shared data formatting and reporting standards. As might be guessed from the prior four areas I’ve described, significant advances have occurred in standard setting methods. The CDIR proceedings suggest that the Treasury Department should be home to a new institute for social impact measurement standards. In a series of publications over the last few years, I have suggested the need for an Intangible Assets Metric System to NIST and NSF (see below for references and links; all papers are available on request). That suggestion comes up again in my third-prize-winning entry in the 2011 World Standards Day paper competition, sponsored by NIST and SES (the Society for Standards Professionals), entitled “What the World Needs Now: A Bold Plan for New Standards.” (See below for the link.)

Sixth, as noted by Salamon (p. 43), “metrics are not neutral. They not only measure impact, they can also shape it.” Though this is probably not exactly what Salamon meant, one of the most exciting areas of measurement application in education in recent years, led in many ways by my colleague Mark Wilson and his group at UC Berkeley, concerns exactly this feedback loop between measurement and impact. In education, it has become apparent that test scaling reveals the order in which lessons are learned. Difficult problems that require mastery of easier problems are necessarily answered correctly less often than those easier problems. When the difficulty order of test questions in a given subject remains constant over time and across thousands of students, one may infer that the scale reveals the path of least resistance. Individualizing instruction by targeting lessons at the student’s measure has given rise to a concept of formative assessment, distinct from the summative assessment of accountability applications. I suspect this kind of distinction may also prove of value in social impact applications.
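A simple way to check the kind of invariant ordering described here, sketched below with made-up proportions, is to convert each lesson’s success rate into a rough logit difficulty in two independent cohorts and compare the resulting orderings.

import math

def empirical_difficulty(p_correct):
    """Convert a proportion-correct into a rough logit difficulty."""
    return math.log((1 - p_correct) / p_correct)

# Hypothetical proportions correct for the same five lessons in two cohorts.
cohort_a = [0.92, 0.81, 0.66, 0.45, 0.23]
cohort_b = [0.88, 0.79, 0.61, 0.48, 0.27]
order_a = sorted(range(5), key=lambda i: empirical_difficulty(cohort_a[i]))
order_b = sorted(range(5), key=lambda i: empirical_difficulty(cohort_b[i]))
print(order_a == order_b)   # a stable ordering suggests a shared learning path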

Relevant Publications and Presentations

Fisher, W. P., Jr. (2002, Spring). “The Mystery of Capital” and the human sciences. Rasch Measurement Transactions, 15(4), 854 [http://www.rasch.org/rmt/rmt154j.htm].

Fisher, W. P., Jr. (2004, January 22). Bringing capital to life via measurement: A contribution to the new economics. In R. Smith (Chair), Session 3.3B, Rasch Models in Economics and Marketing. Second International Conference on Measurement in Health, Education, Psychology, and Marketing: Developments with Rasch Models, The International Laboratory for Measurement in the Social Sciences, School of Education, Murdoch University, Perth, Western Australia.

Fisher, W. P., Jr. (2005, August 1-3). Data standards for living human, social, and natural capital. In Session G: Concluding Discussion, Future Plans, Policy, etc. Conference on Entrepreneurship and Human Rights [http://www.fordham.edu/economics/vinod/ehr05.htm], Pope Auditorium, Lowenstein Bldg, Fordham University.

Fisher, W. P., Jr. (2007, Summer). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2008, 3-5 September). New metrological horizons: Invariant reference standards for instruments measuring human, social, and natural capital. Presented at the 12th International Measurement Confederation (IMEKO) TC1-TC7 Joint Symposium on Man, Science, and Measurement, Annecy, France: University of Savoie.

Fisher, W. P., Jr. (2009, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009). NIST critical national need idea white paper: Metrological infrastructure for human, social, and natural capital (Tech. Rep., http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute of Standards and Technology.

Fisher, W. P., Jr. (2010). The standard model in the history of the natural sciences, econometrics, and the social sciences. Journal of Physics: Conference Series, 238(1), http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf.

Fisher, W. P., Jr. (2011). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In N. Brown, B. Duckor, K. Draney & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 2 (pp. 1-27). Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2011). Measuring genuine progress by scaling economic indicators to think global & act local: An example from the UN Millennium Development Goals project. LivingCapitalMetrics.com. Retrieved 18 January 2011, from Social Science Research Network: http://ssrn.com/abstract=1739386.

Fisher, W. P., Jr. (2012). Measure and manage: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (pp. 43-63). New York: Palgrave Macmillan.

Fisher, W. P., Jr. (2012, May/June). What the world needs now: A bold plan for new standards. Standards Engineering, 64(3), 1 & 3-5 [http://ssrn.com/abstract=2083975].

Fisher, W. P., Jr., & Stenner, A. J. (2011, January). Metrology for the social, behavioral, and economic sciences (Social, Behavioral, and Economic Sciences White Paper Series). Retrieved 25 October 2011, from National Science Foundation: http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36.

Fisher, W. P., Jr., & Stenner, A. J. (2011, August 31 to September 2). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO) TC1-TC7-TC13 Joint Symposium, http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf, Jena, Germany.

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.


Comments on the New ANSI Human Capital Investor Metrics Standard

April 16, 2012

The full text of the proposed standard is available here.

It’s good to see a document emerge in this area, especially one with such a broad base of support from a diverse range of stakeholders. As is stated in the standard, the metrics defined in it are a good place to start and in many instances will likely improve the quality and quantity of the information made available to investors.

There are several issues to keep in mind as the value of standards for human capital metrics becomes more widely appreciated. First, in the context of a comprehensively defined investment framework, human capital is just one of four major forms of capital, the other three being social, natural, and manufactured (Ekins, 1992; Ekins, Dresner, and Dahlstrom, 2008). To ensure as far as possible the long-term stability and sustainability of their profits, and of the economic system as a whole, investors will certainly want to expand the range of available standards to include social and natural capital along with human capital.

Second, though we manage what we measure, investment management is seriously compromised by having high quality scientific measurement standards only for manufactured capital (length, weight, volume, temperature, energy, time, kilowatts, etc.). Over 80 years of research on ability tests, surveys, rating scales, and assessments has reached a point from which it is ready to revolutionize the management of intangible forms of capital (Fisher, 2007, 2009a, 2009b, 2010, 2011a, 2011b; Fisher & Stenner, 2011a, 2011b; Wilson, 2011; Wright, 1999). The very large reductions in transaction costs effected by standardized metrics in the economy at large (Barzel, 1982; Benham and Benham, 2000) are likely to have a similarly profound effect on the economics of human, social, and natural capital (Fisher, 2011a, 2012a, 2012b).

The potential for dramatic change in the conceptualization of metrics is most evident in the proposed standard’s sections on leadership quality and employee engagement. For instance, in the section on leadership quality, it is stated that “Investors will be able to directly compare all organizations that are using the same vendor’s methodology.” This kind of dependency should not be allowed to stand as a significant factor in a measurement standard. Properly constructed and validated scientific measures, such as those in wide use in education, psychology, and health care for several decades (Andrich, 2010; Bezruczko, 2005; Bond and Fox, 2007; Fisher and Wright, 1994; Rasch, 1960; Salzberger, 2009; Wright, 1999), are equated to a common unit. Comparability should never depend on which vendor is used. Rather, any instrument that actually measures the construct of interest (leadership quality or employee engagement) should do so in a common unit and within an acceptable range of error. “Normalizing” measures for comparability, as the standard suggests, means employing psychometric methods that are 50 years out of date and far less rigorous and practical than they need be. Transparency in measurement means looking through the instrument to the thing itself. If particular instruments color or reshape what is measured, or merely change the meaning of the numbers reported, then the integrity of the standard as a standard should be re-examined.
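For readers who want a picture of what equating to a common unit involves, here is a deliberately simplified sketch using hypothetical calibrations: when two instruments share a handful of common items, the mean difference in those items’ calibrations gives a linking constant that expresses one vendor’s measures in the other’s unit. Real equating studies estimate such links with proper error analysis, but the principle is this simple.

def linking_constant(common_items_a, common_items_b):
    """Estimate the shift that places instrument B's calibrations on
    instrument A's scale, using items the two instruments share
    (a mean-difference link, the simplest form of Rasch equating)."""
    diffs = [a - b for a, b in zip(common_items_a, common_items_b)]
    return sum(diffs) / len(diffs)

# Hypothetical calibrations of four shared leadership-quality items as
# estimated separately from two vendors' instruments.
vendor_a = [-1.1, -0.2, 0.5, 1.3]
vendor_b = [-1.6, -0.8, 0.1, 0.7]
shift = linking_constant(vendor_a, vendor_b)
# Any measure reported from vendor B can now be expressed in vendor A's unit:
measure_b = 0.9
print(round(measure_b + shift, 2))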

Third, for investments in human capital to be effectively managed, each distinct aspect of it (motivations, skills and abilities, health) needs to be measured separately, just as height, weight, and temperature are. New technologies have already transformed measurement practices in ways that make the necessary processes precise and inexpensive. Of special interest are adaptively administered, precalibrated instruments supporting mass-customized but globally comparable measures (for instance, see the examples at http://blog.lexile.com/tag/oasis/ and those presented at the recent Pearson Global Research Conference in Fremantle, Australia, http://www.pearson.com.au/marketing/corporate/pearson_global/default.html; also see Wright and Bell, 1984; Lunz, Bergstrom, and Gershon, 1994; Bejar, et al., 2003).

Fourth, the ownership of human capital needs clarification and legal status. If we consider each individual to own their abilities, health, and motivations, and to be solely responsible for decisions made concerning the disposition of those properties, then, in accord with their proven measured amounts of each type of human capital, everyone ought to have legal title to a specific number of shares or credits of each type. This could transform employment, shifting it away from wage-based compensation tied to job classifications and toward an individualized, investment-based platform for continuous quality improvement. The same kind of legal titling system will, of course, need to be worked out for social and natural capital as well.

Fifth, given scientific standards for each major form of capital, practical measurement technologies, and legal title to our shares of capital, we will need expanded financial accounting standards and tools for managing our individual and collective investments. Ongoing research and debates concerning these standards and tools (Siegel and Borgia, 2006; Young and Williams, 2010) have yet to connect with the larger scientific, economic, and legal issues raised here, but developments in this direction should be emerging in due course.

Sixth, a number of lingering moral, ethical, and political questions are cast in a new light in this context. The significance of individual behaviors and decisions is informed and largely determined by the culture and institutions within which those behaviors and decisions are carried out. Many of the morally despicable but not illegal investment decisions leading to the recent economic downturn put individuals in the position of either setting themselves apart, and threatening their careers, or doing what was best for their portfolios within the limits of the law. Current efforts to devise new regulatory constraints are misguided in focusing on ever more microscopically defined particulars. What is needed instead is a system in which profits are contingent on the growth of human, social, and natural capital. In that framework, legal but ultimately unfair practices would drive down social capital stock values, counterbalancing ill-gotten gains and making them unprofitable.

Seventh, the International Vocabulary of Metrology, now in its third edition (VIM3), is a standard recognized by all eight of the international standards accrediting bodies (BIPM, etc.). The VIM3 (http://www.bipm.org/en/publications/guides/vim.html) and the forthcoming VIM4 are intended to provide a uniform set of concepts and terms for all fields that employ measures across the natural and social sciences. A new dialogue on these issues has commenced in the context of the International Measurement Confederation (IMEKO), whose member organizations are the weights and measures institutes from countries around the world (Conference note, 2011). The 2012 President of the Psychometric Society, Mark Wilson, gave an invited address at the September 2011 IMEKO meeting (Wilson, 2011), and a member of the VIM3 editorial board, Luca Mari, has been invited to speak at the July 2012 International Meeting of the Psychometric Society. I encourage all interested parties to become involved in efforts of these kinds in their own fields.

References

Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika, 75(2), 292-308.

Barzel, Y. (1982). Measurement costs and the organization of markets. Journal of Law and Economics, 25, 27-48.

Bejar, I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., & Revuelta, J. (2003, November). A feasibility study of on-the-fly item generation in adaptive testing. The Journal of Technology, Learning, and Assessment, 2(3), 1-29; http://ejournals.bc.edu/ojs/index.php/jtla/article/view/1663.

Benham, A., & Benham, L. (2000). Measuring the costs of exchange. In C. Ménard (Ed.), Institutions, contracts and organizations: Perspectives from new institutional economics (pp. 367-375). Cheltenham, UK: Edward Elgar.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Conference note. (2011). IMEKO Symposium: August 31- September 2, 2011, Jena, Germany. Rasch Measurement Transactions, 25(1), 1318.

Ekins, P. (1992). A four-capital model of wealth creation. In P. Ekins & M. Max-Neef (Eds.), Real-life economics: Understanding wealth creation (pp. 147-155). London: Routledge.

Ekins, P., Dresner, S., & Dahlstrom, K. (2008). The four-capital method of sustainable development evaluation. European Environment, 18(2), 63-80.

Fisher, W. P., Jr. (2007). Living capital metrics. Rasch Measurement Transactions, 21(1), 1092-3 [http://www.rasch.org/rmt/rmt211.pdf].

Fisher, W. P., Jr. (2009a). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.

Fisher, W. P., Jr. (2009b). NIST critical national need idea white paper: Metrological infrastructure for human, social, and natural capital (http://www.nist.gov/tip/wp/pswp/upload/202_metrological_infrastructure_for_human_social_natural.pdf). Washington, DC: National Institute of Standards and Technology.

Fisher, W. P., Jr. (2010). Rasch, Maxwell’s method of analogy, and the Chicago tradition. In G. Cooper (Chair), Probabilistic models for measurement in education, psychology, social science and health: Celebrating 50 years since the publication of Rasch’s Probabilistic Models. University of Copenhagen School of Business, FUHU Conference Centre, Copenhagen, Denmark [https://conference.cbs.dk/index.php/rasch/Rasch2010/paper/view/824].

Fisher, W. P., Jr. (2011a). Bringing human, social, and natural capital to life: Practical consequences and opportunities. In N. Brown, B. Duckor, K. Draney & M. Wilson (Eds.), Advances in Rasch Measurement, Vol. 2 (pp. 1-27). Maple Grove, MN: JAM Press.

Fisher, W. P., Jr. (2011b). Measurement, metrology and the coordination of sociotechnical networks. In  S. Bercea (Chair), New Education and Training Methods. International Measurement Confederation (IMEKO), http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24491/ilm1-2011imeko-017.pdf, Jena, Germany.

Fisher, W. P., Jr. (2012a). Measure local, manage global: Intangible assets metric standards for sustainability. In J. Marques, S. Dhiman & S. Holt (Eds.), Business administration education: Changes in management and leadership strategies (in press). New York: Palgrave Macmillan.

Fisher, W. P., Jr. (2012b). What the world needs now: A bold plan for new standards. Standards Engineering, 64, in press.

Fisher, W. P., Jr., & Stenner, A. J. (2011a). Metrology for the social, behavioral, and economic sciences (Social, Behavioral, and Economic Sciences White Paper Series). Retrieved 25 October 2011, from National Science Foundation: http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=36.

Fisher, W. P., Jr., & Stenner, A. J. (2011b). A technology roadmap for intangible assets metrology. In Fundamentals of measurement science. International Measurement Confederation (IMEKO) TC1-TC7-TC13 Joint Symposium, http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf, Jena, Germany.

Fisher, W. P., Jr., & Wright, B. D. (Eds.). (1994). Applications of probabilistic conjoint measurement. International Journal of Educational Research, 21(6), 557-664.

Lunz, M. E., Bergstrom, B. A., & Gershon, R. C. (1994). Computer adaptive testing. International Journal of Educational Research, 21(6), 623-634.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Northampton, MA: Edward Elgar.

Siegel, P., & Borgia, C. (2006). The measurement and recognition of intangible assets. Journal of Business and Public Affairs, 1(1).

Wilson, M. (2011). The role of mathematical models in measurement: A perspective from psychometrics. In L. Mari (Chair), Plenary lecture. International Measurement Confederation (IMEKO), http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24178/ilm1-2011imeko-005.pdf, Jena, Germany.

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D., & Bell, S. R. (1984, Winter). Item banks: What, why, how. Journal of Educational Measurement, 21(4), 331-345 [http://www.rasch.org/memo43.htm].

Young, J. J., & Williams, P. F. (2010, August). Sorting and comparing: Standard-setting and “ethical” categories. Critical Perspectives on Accounting, 21(6), 509-521.


A Framework for Competitive Advantage in Managing Intangible Assets

July 26, 2011

It has long been recognized that externalities like social costs could be brought into the market if ways of measuring them objectively were devised. Markets, however, do not emerge spontaneously from the mere desire to be able to buy and sell; they are, rather, the products of actors and agencies that define the rules, roles, and relationships within which transaction costs are reduced and from which value, profits, and authentic wealth may be extracted. Objective measurement is necessary for reducing transaction costs but is by itself insufficient to the making of markets. Thus, markets for intangible assets, such as human, social, and natural capital, remain inefficient and undeveloped even though scientific theories, models, methods, and results demonstrating their objective measurability have been available for over 80 years.

Why has the science of objectively measured intangible assets not yet led to efficient markets for those assets? The crux of the problem, the pivot point at which an economic Archimedes could move the world of business, has to do with verifiable trust. It may seem like stating the obvious, but there is much to be learned from recognizing that shared narratives of past performance and a shared vision of the future are essential to the atmosphere of trust and verifiability needed for the making of markets. The key factor is the level of detail reliably tapped by such narratives.

For instance, some markets seem to have the weight of an immovable mass when the dominant narrative describes a static past and future with no clearly defined trajectory of leverageable development. But when a path of increasing technical capacity or precision over time can be articulated, entrepreneurs have the time frames they need to coordinate, align, and manage budgeting decisions vis-à-vis investments, suppliers, manufacturers, marketing, sales, and customers. For example, the building out of the infrastructure of highways, electrical power, and water and sewer services assured manufacturers of automobiles, appliances, and homes that they could develop products for which there would be ready customers. Similarly, Moore’s Law, by mapping out a path of steady increases in technical precision at no additional cost, has been a key factor enabling the microprocessor industry’s ongoing history of success.

Of course, as has been the theme of this blog since day one, similar paths for the development of new infrastructural capacities could be vital factors for making new markets for human, social, and natural capital. I’ll be speaking on this topic at the forthcoming IMEKO meeting in Jena, Germany, August 31 to September 2. Watch this spot for more on this theme in the near future.


Stages in the Development of Meaningful, Efficient, and Useful Measures

February 21, 2011

In all learning, we use what we already know as a means of identifying what we do not yet know. When someone can read a written language, knows an alphabet and has a vocabulary, understands grammar and syntax, then that knowledge can be used to learn about the world. Then, knowing what birds are, for instance, one might learn about different kinds of birds or the typical behaviors of one bird species.

And so with measurement, we start from where we find ourselves, as with anything else. There is no need or possibility for everyone to master all the technical details of every different area of life that’s important. But it is essential that we know what is technically possible, so that we can seek out and find the tools that help us achieve our goals. We can’t get what we can’t or don’t ask for. In the domain of measurement, it seems that hardly anyone is looking for what’s actually readily available.

So it seems pertinent to offer a description of a continuum of increasingly meaningful, efficient and useful ways of measuring. Previous considerations of the problem have offered different categorizations for the transformations characterizing development on this continuum. Stenner and Horabin (1992) distinguish between 1) impressionistic and qualitative, nominal gradations found in the earliest conceptualizations of temperature, 2) local, data-based quantitative measures of temperature, and 3) generalized, universally uniform, theory-based quantitative measures of temperature.

Theory-based temperature measurement is prized for the way that thermodynamic theory enables the calibration of individual thermometers with no need for testing each one in empirical studies of its performance. As Lewin (1951, p. 169) put it, “There is nothing so practical as a good theory.” Thus we have electromagnetic theory making it possible to know the conduction and resistance characteristics of electrical cable from the properties of the metal alloys and insulators used, with no need to test more than a small fraction of that cable as a quality check.

Theory makes it possible to know in advance what the results of such tests would be with enough precision to greatly reduce the burden and expenses of instrument calibration. There likely would be no electrical industry at all if the properties of every centimeter of cable and every appliance had to be experimentally tested. This principle has been employed in measuring human, social, and natural capital for some time, but, for a variety of reasons, it has not yet been adopted on a wide scale.
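The idea of theory-based calibration can be sketched in a few lines: a construct specification equation predicts each item’s calibration from substantive item features, and empirical calibrations then serve only as a quality check. The feature names, weights, and values below are hypothetical; the Lexile reading framework (Stenner et al., 2006) is perhaps the most fully developed real-world example.

# A construct specification equation predicts each item's calibration from
# its substantive features, so instruments can be calibrated by theory and
# merely spot-checked against data. All names and numbers are hypothetical.
def predicted_calibration(features, weights, intercept):
    return intercept + sum(w * x for w, x in zip(weights, features))

weights = (0.9, -0.6)          # e.g., semantic rarity up, syntactic familiarity down
intercept = -0.2
items = {"item_1": (0.4, 1.1), "item_2": (1.3, 0.5), "item_3": (2.0, 0.2)}
theory = {k: predicted_calibration(v, weights, intercept) for k, v in items.items()}
empirical = {"item_1": -0.5, "item_2": 0.6, "item_3": 1.5}
for k in items:
    print(k, round(theory[k], 2), empirical[k])   # theory-based vs. data-based values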

Reflecting on the history of psychosocial measurement in this context, it becomes apparent that Stenner and Horabin’s (1992) three stages can be further broken down. Listed below are the distinguishing features of each of six stages in the evolution of measurement systems, building on the five stages described by Stenner, Burdick, Sanford, and Burdick (2006). This progression of increasing complexity, meaning, efficiency, and utility can be used as a basis for a technology roadmap that will enable the coordination and alignment of various services and products in the domain of intangible assets, as I will take up in a forthcoming post.

Stage 1. Least meaning, utility, efficiency, and value

Purely passive, receptive

Statistics describe data: What you see is what you get

Content defines measure

Additivity, invariance, etc. not tested, so numbers do not stand for something that adds up like they do

Measurement defined statistically in terms of group-level intervariable relations

Meaning of numbers changes with questions asked and persons answering

No theory

Data must be gathered and analyzed to have results

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 2

Slightly less passive, receptive but still descriptively oriented

Additivity, invariance, etc. tested, so numbers might stand for something that adds up like they do

Measurement still defined statistically in terms of group-level intervariable relations

Falsification of additive hypothesis effectively derails measurement effort

Descriptive models with interaction effects accepted as viable alternatives

Typically little or no attention to theory of item hierarchy and construct definition

Empirical (data-based) calibrations only

Data must be gathered and analyzed to have results

Initial awareness of measurement theory

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 3

Even less purely passive & receptive, more active

Instrument still designed relative to content specifications

Additivity, invariance, etc. tested, so numbers might stand for something that adds up like they do

Falsification of additive hypothesis provokes questions as to why

Descriptive models with interaction effects not accepted as viable alternatives

Measurement defined prescriptively in terms of individual-level intravariable invariance

Significant attention to theory of item hierarchy and construct definition

Empirical calibrations only

Data has to be gathered and analyzed to have results

More significant use of measurement theory in prescribing acceptable data quality

Limited construct theory (no predictive power)

Commercial applications are instrument-dependent

Standards based in ensuring fair methods and processes

Stage 4

First stage that is more active than passive

Initial efforts to (re-)design instrument relative to construct specifications and theory

Additivity, invariance, etc. tested in thoroughly prescriptive focus on calibrating instrument

Numbers not accepted unless they stand for something that adds up like they do

Falsification of additive hypothesis provokes questions as to why and corrective action

Models with interaction effects not accepted as viable alternatives

Measurement defined prescriptively in terms of individual-level intravariable invariance

Significant attention to theory of item hierarchy and construct definition relative to instrument design

Empirical calibrations only but model prescribes data quality

Data usually has to be gathered and analyzed to have results

Point of use self-scoring forms might provide immediate measurement results to end user

Some construct theory (limited predictive power)

Some commercial applications are not instrument-dependent (as in CAT item bank implementations)

Standards based in ensuring fair methods and processes

Stage 5

Significantly active approach to measurement

Item hierarchy translated into construct theory

Construct specification equation predicts item difficulties

Theory-predicted (not empirical) calibrations used in applications

Item banks superseded by single-use items created on the fly

Calibrations checked against empirical results but data gathering and analysis not necessary

Point of use self-scoring forms or computer apps provide immediate measurement results to end user

Used routinely in commercial applications

Awareness that standards might be based in metrological traceability to consensus standard uniform metric

Stage 6. Most meaning, utility, efficiency, and value

Most purely active approach to measurement

Item hierarchy translated into construct theory

Construct specification equation predicts item ensemble difficulties

Theory-predicted calibrations enable single-use items created from context

Checked against empirical results for quality assessment but data gathering and analysis not necessary

Point of use self-scoring forms or computer apps provide immediate measurement results to end user

Used routinely in commercial applications

Standards based in metrological traceability to consensus standard uniform metric

 

References

Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York: Harper & Row.

Stenner, A. J., Burdick, H., Sanford, E. E., & Burdick, D. S. (2006). How accurate are Lexile text measures? Journal of Applied Measurement, 7(3), 307-22.

Stenner, A. J., & Horabin, I. (1992). Three stages of construct definition. Rasch Measurement Transactions, 6(3), 229 [http://www.rasch.org/rmt/rmt63b.htm].


Open Letter to the Impact Investment Community

May 4, 2010

It is very encouraging to discover your web sites (GIIN, IRIS, and GIIRS) and to see the work you’re doing in advancing the concept of impact investing. The defining issue of our time is figuring out how to harness the profit motive for socially responsible and environmentally sustainable prosperity. The economic, social, and environmental disasters of today might all have been prevented or significantly mitigated had social and environmental impacts been taken into account in all investing.

My contribution is to point out that, though the profit motive must be harnessed as the engine driving responsible and sustainable business practices, the force of that power is dissipated and negated by the lack of efficient human, social, and natural capital markets. If we cannot make these markets function more like financial markets, so that money naturally flows to those places where it produces the greatest returns, we will never succeed in the fundamental reorientation of the economy toward responsible sustainability. The goal has to be one of tying financial profits to growth in realized human potential, community, and environmental quality, but to do that we need measures of these intangible forms of capital that are as scientifically rigorous as they are eminently practical and convenient.

Better measurement is key to reducing the market frictions that inflate the cost of human, social, and natural capital transactions. A truly revolutionary paradigm shift has occurred in measurement theory and practice over the last fifty years and more. New methods make it possible

* to reduce data volume dramatically with no loss of information,
* to custom tailor measures by selectively adapting indicators to the entity rated, without compromising comparability,
* to remove rater leniency or severity effects from the measures,
* to design optimally efficient measurement systems that provide the level of precision needed to support decision making,
* to establish reference standard metrics that remain universally uniform across variations in local impact assessment indicator configurations, and
* to calibrate instruments that measure in metrics intuitively meaningful to stakeholders and end users.

Unfortunately, almost all the admirable energy and resources being poured into business intelligence measures skip over these “new” developments, defaulting to mistaken assumptions about numbers and the nature of measurement. Typical ratings, checklists, and scores provide units of measurement that

* change size depending on which question is asked, which rating category is assigned, and who or what is rated,
* increase data volume with every new question asked,
* push measures up and down in uncontrolled ways depending on who is judging the performance,
* are of unknown precision, and
* cannot be compared across different composite aggregations of ratings.

I have over 25 years of experience in the use of advanced measurement and instrument calibration methods, backed by MA and PhD degrees from the University of Chicago. The methods in which I am trained have been standard practice in educational testing for decades, and over the last 20 years they have become the methods of choice in health care outcomes assessment.

I am passionately committed to putting these methods to work in the domain of impact investing, business intelligence, and ecological economics. As is shown in my attached CV, I have dozens of peer-reviewed publications presenting technical and philosophical research in measurement theory and practice.

In the last few years, I have taken my work in the direction of documenting the ways in which measurement can and should reduce information overload and transaction costs; enhance human, social, and natural capital market efficiencies; provide the instruments embodying common currencies for the exchange of value; and inform a new kind of Genuine Progress Indicator or Happiness Index.

For more information, please see the attached 2009 article I published in Measurement on these topics, and the attached White Paper I produced last July in response to a call from NIST for critical national need ideas. Various entries in my blog (https://livingcapitalmetrics.wordpress.com) elaborate on measurement technicalities, history, and philosophy, as do my web site at http://www.livingcapitalmetrics.com and my profile at http://www.linkedin.com/in/livingcapitalmetrics.

For instance, the blog post at https://livingcapitalmetrics.wordpress.com/2009/11/22/al-gore-will-is-not-the-problem/ explores the idea with which I introduced myself to you here, that the profit motive embodies our collective will for responsible and sustainable business practices, but we hobble ourselves with self-defeating inattention to the ways in which capital is brought to life in efficient markets. We have the solutions to our problems at hand, though there are no panaceas, and the challenges are huge.

Please feel free to contact me at your convenience. Whether we are ultimately able to work together or not, I enthusiastically wish you all possible success in your endeavors.

Sincerely,

William P. Fisher, Jr., Ph.D.
LivingCapitalMetrics.com
919-599-7245

We are what we measure.
It’s time we measured what we want to be.


Questions about measurement: If it is so important, why…?

January 28, 2010

If measurement is so important, why is measurement quality so uniformly low?

If we manage what we measure, why is measurement leadership virtually nonexistent?

If we can’t tell if things are getting better, staying the same, or getting worse without good metrics, why is measurement so rarely context-sensitive, focused, integrated, and interactive, as Dean Spitzer recommends it should be?

If quantification is valued for its rigor and convenience, why is no one demanding meaningful mappings of substantive, additive amounts of things measured on number lines?

If everyone is drowning in unmanageable floods of data why isn’t measurement used to reduce data volumes dramatically—and not only with no loss of information but with the addition of otherwise unavailable forms of information?

If learning and improvement are the order of the day, why isn’t anyone interested in the organizational and individual learning trajectories that are defined by hierarchies of calibrated items?

If resilient lean thinking is the way to go, why aren’t more measures constructed to retain their meaning and values across changes in item content?

If flexibility is a core value, why aren’t we adapting instruments to people and organizations, instead of vice versa?

If fair, just, and meaningful measurement is often lacking in judge-assigned performance assessments, why isn’t anyone estimating the consistency, and the leniency or harshness, of ratings—and removing those effects from the measures made?

If efficiency is valued, why does no one at all seem to care about adjusting measurement precision to the needs of the task at hand, so that time and resources are not wasted in gathering too much or too little data?

If it’s common knowledge that we can do more together than we can as individuals, why isn’t anyone providing the high quality and uniform information needed for the networked collective thinking that is able to keep pace with the demand for innovation?

Since the metric system and uniform product standards are widely recognized as essential to science and commerce, why are longstanding capacities for common metrics for human, social, and natural capital not being used?

If efficient markets are such great things, why isn’t anyone at all concerned about lubricating the flow of human, social, and natural capital by investing in the highest quality measurement obtainable?

If everyone loves a good profit, why aren’t we setting up human, social, and natural capital metric systems to inform competitive pricing of intangible assets, products, and services?

If companies are supposed to be organic entities that mature in a manner akin to human development over the lifespan, why is so little being done to conceive, gestate, midwife, and nurture living capital?

In short, if measurement is really as essential to management as it is so often said to be, why doesn’t anyone seek out the state of the art technology, methods, and experts before going to the trouble of developing and implementing metrics?

I suspect the answers to these questions are all the same. These disconnects between word and deed happen because so few people are aware of the technical advances made in measurement theory and practice over the last several decades.

For the deep background, see previous entries in this blog, various web sites (www.rasch.org, www.rummlab.com, www.winsteps.com, http://bearcenter.berkeley.edu/, etc.), and an extensive body of published work (Rasch, 1960; Wright, 1977, 1997a, 1997b, 1999a, 1999b; Andrich, 1988, 2004, 2005; Bond & Fox, 2007; Fisher, 2009, 2010; Smith & Smith, 2004; Wilson, 2005; Wright & Stone, 1999, 2004).

There is a wealth of published applied research in education, psychology, and health care (Bezruczko, 2005; Fisher & Wright, 1994; Masters, 2007; Masters & Keeves, 1999). To find more, search on Rasch and the substantive area of interest.

For applications in business contexts, there is a more limited number of published resources (ATP, 2001; Drehmer, Belohlav, & Coye, 2000; Drehmer & Deklava, 2001; Ludlow & Lunz, 1998; Lunz & Linacre, 1998; Mohamed, et al., 2008; Salzberger, 2000; Salzberger & Sinkovics, 2006; Zakaria, et al., 2008). I have, however, just become aware of the November, 2009, publication of what could be a landmark business measurement text (Salzberger, 2009). Hopefully, this book will be just one of many to come, and the questions I’ve raised will no longer need to be asked.

References

Andrich, D. (1988). Rasch models for measurement (Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-068). Beverly Hills, California: Sage Publications.

Andrich, D. (2004, January). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), I-7–I-16.

Andrich, D. (2005). Georg Rasch: Mathematician and statistician. In K. Kempf-Leonard (Ed.), Encyclopedia of Social Measurement (Vol. 3, pp. 299-306). Amsterdam: Academic Press, Inc.

Association of Test Publishers. (2001, Fall). Benjamin D. Wright, Ph.D. honored with the Career Achievement Award in Computer-Based Testing. Test Publisher, 8(2). Retrieved 20 May 2009, from http://www.testpublishers.org/newsletter7.htm#Wright.

Bezruczko, N. (Ed.). (2005). Rasch measurement in health sciences. Maple Grove, MN: JAM Press.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences, 2d edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Dawson, T. L., & Gabrielian, S. (2003, June). Developing conceptions of authority and contract across the life-span: Two perspectives. Developmental Review, 23(2), 162-218.

Drehmer, D. E., Belohlav, J. A., & Coye, R. W. (2000, Dec). An exploration of employee participation using a scaling approach. Group & Organization Management, 25(4), 397-418.

Drehmer, D. E., & Deklava, S. M. (2001, April). A note on the evolution of software engineering practices. Journal of Systems and Software, 57(1), 1-7.

Fisher, W. P., Jr. (2009, November). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement (Elsevier), 42(9), 1278-1287.

Fisher, W. P., Jr. (2010). Bringing human, social, and natural capital to life: Practical consequences and opportunities. Journal of Applied Measurement, 11, in press [Pre-press version available at http://www.livingcapitalmetrics.com/images/BringingHSN_FisherARMII.pdf].

Ludlow, L. H., & Lunz, M. E. (1998). The Job Responsibilities Scale: Invariance in a longitudinal prospective study. Journal of Outcome Measurement, 2(4), 326-37.

Lunz, M. E., & Linacre, J. M. (1998). Measurement designs using multifacet Rasch modeling. In G. A. Marcoulides (Ed.), Modern methods for business research. Methodology for business and management (pp. 47-77). Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Masters, G. N. (2007). Special issue: Programme for International Student Assessment (PISA). Journal of Applied Measurement, 8(3), 235-335.

Masters, G. N., & Keeves, J. P. (Eds.). (1999). Advances in measurement in educational research and assessment. New York: Pergamon.

Mohamed, A., Aziz, A., Zakaria, S., & Masodi, M. S. (2008). Appraisal of course learning outcomes using Rasch measurement: A case study in information technology education. In L. Kazovsky, P. Borne, N. Mastorakis, A. Kuri-Morales & I. Sakellaris (Eds.), Proceedings of the 7th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems (Electrical And Computer Engineering Series) (pp. 222-238). Cambridge, UK: WSEAS.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Salzberger, T. (2000). An extended Rasch analysis of the CETSCALE: Implications for scale development and data construction. Department of Marketing, University of Economics and Business Administration, Vienna (WU-Wien) [http://www2.wu-wien.ac.at/marketing/user/salzberger/research/wp_dataconstruction.pdf].

Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Northampton, MA: Edward Elgar.

Salzberger, T., & Sinkovics, R. R. (2006). Reconsidering the problem of data equivalence in international marketing research: Contrasting approaches based on CFA and the Rasch model for measurement. International Marketing Review, 23(4), 390-417.

Smith, E. V., Jr., & Smith, R. M. (2004). Introduction to Rasch measurement. Maple Grove, MN: JAM Press.

Spitzer, D. (2007). Transforming performance measurement: Rethinking the way we measure and drive organizational success. New York: AMACOM.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116 [http://www.rasch.org/memo42.htm].

Wright, B. D. (1997a, June). Fundamental measurement for outcome evaluation. Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 261-88.

Wright, B. D. (1997b, Winter). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-45, 52 [http://www.rasch.org/memo62.htm].

Wright, B. D. (1999a). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [http://www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Wright, B. D. (1999b). Rasch measurement models. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 85-97). New York: Pergamon.

Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide Range, Inc. [http://www.rasch.org/memos.htm#measess].

Wright, B. D., & Stone, M. H. (2004). Making measures. Chicago: Phaneron Press.

Zakaria, S., Aziz, A. A., Mohamed, A., Arshad, N. H., Ghulman, H. A., & Masodi, M. S. (2008, November 11-13). Assessment of information managers’ competency using Rasch measurement. iccit: Third International Conference on Convergence and Hybrid Information Technology, 1, 190-196 [http://www.computer.org/portal/web/csdl/doi/10.1109/ICCIT.2008.387].


Mass Customization: Tailoring Tests, Surveys, and Assessments to Individuals without Sacrificing Comparability

January 11, 2010

One of the recurring themes in this blog concerns the technical capacities for more precise and meaningful measurement that remain unrecognized and under-utilized in business, finance, and economics. One of the especially productive capacities I have in mind relates to the techniques of adaptive measurement. These techniques make it possible to tailor measuring tools to the needs of the people measured, the diametric opposite of standard practice, which typically assumes that people must adapt to the needs of the measuring instrument.

Think about what it means to try to measure every case using the same statements. When you define the limits of your instrument in terms of common content, you are looking for a one-size-fits-all solution. This design requires that you restrict the content of the statements to those that will be relevant in every case. The reason for proceeding in this way hinges on the assumption that you need to administer all of the items to every case in order to make the measures comparable, but this is not true. To conceive measurement in this way is to be shackled to an obsolete technology. Instead of operating within the constraints of an overly-limiting set of assumptions, you could be designing a system that takes missing data into account and that supports adaptive item administration, so that the instrument is tailored to the needs of the measurement situation. The benefits from taking this approach are extensive.

Think of the statements comprising the instrument as defining a hierarchy or continuum that extends from the most important, most agreeable, or easiest-to-achieve things at the bottom to the least important, least agreeable, or hardest-to-achieve things at the top. Imagine that your data are consistent, so that the probability of importance, agreeability, or success steadily decreases for any individual case as you read up the scale.

Obtaining data consistency like this is not always easy, but it is essential to measurement and to calibrating a scientific instrument. Even when data do not provide the needed consistency, much can be learned from them as to what needs to be done to get it.
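To make this concrete, here is a minimal sketch, in Python, of the probabilistic (Rasch) model that formalizes the hierarchy just described. It is not anyone's operational software, and the item calibrations and person measure are made-up values used only for illustration: the point is simply that, when data are consistent, the probability of importance, agreement, or success falls off steadily as the item calibrations climb above the person's measure.

import math

def rasch_probability(theta, b):
    """Probability of agreement/success for a person at measure theta
    on an item calibrated at difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical item calibrations, ordered from easiest/most agreeable
# (bottom of the scale) to hardest/least agreeable (top of the scale).
item_calibrations = [-2.0, -1.0, 0.0, 1.0, 2.0]

person_measure = 0.5  # illustrative value only
for b in item_calibrations:
    p = rasch_probability(person_measure, b)
    print(f"item at {b:+.1f} logits: P(agree/succeed) = {p:.2f}")
# With consistent data these probabilities decrease steadily as you read
# up the scale, which is what calibrating the instrument requires.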

Now hold that thought: you have a matrix of complete data, with responses to every item for every case. Following the typically assumed design, in which all items are administered to every case, no matter how low a measure is, the items calibrated at the top of the scale must still be administered, even when we know from long experience and repeated recalibrations across multiple samples that the response probabilities of importance, agreement, or success for those items are virtually 0.00.

Conversely, no matter how high a measure is, the usual design demands that all items be administered, even if we know from experience that the response probabilities for the items at the bottom of the scale are virtually 1.00.

In this scenario, we are wasting time and resources obtaining data on items for which we already know the answers. We are also failing to ask other questions that would be particularly relevant to individual cases, because including them in a one-size-fits-all complete-data design would make the instrument too long. So we are stuck with a situation in which perhaps only a tenth of the overall instrument is doing any real work for cases with measures toward the extremes.

One of the consequences of this is that we have much less information about the very low and very high measures, and so we have much less confidence about where the measures are than we do for more centrally located measures.
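A small calculation shows why. Under the same sketch model used above, the standard error of a measure estimated from a fixed set of items is one over the square root of the summed item information, and far off-target items contribute almost nothing. The fixed form and the measures below are again purely illustrative assumptions.

import math

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, item_calibrations):
    # Each item contributes p*(1-p) to the information about the measure;
    # items far from the measure contribute nearly zero.
    information = sum(p * (1.0 - p)
                      for p in (rasch_probability(theta, b)
                                for b in item_calibrations))
    return 1.0 / math.sqrt(information)

fixed_form = [-1.5, -0.75, 0.0, 0.75, 1.5]  # a hypothetical one-size-fits-all form
for measure in (0.0, 2.0, 4.0):
    print(f"measure {measure:+.1f}: SE = {standard_error(measure, fixed_form):.2f}")
# The standard error grows as the measure moves away from the item
# calibrations, which is the loss of confidence at the extremes described above.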

If measurement projects are oriented toward the development of an item bank, however, these problems can be overcome. You might develop and calibrate dozens, hundreds, or thousands of items. The bank might be administered in such a way that the same set of items is rarely applied to any two cases. To the extent that the basic research on the bank shows that the items all measure the same thing, so that different item subsets all give the same result in terms of resolving the location of the measure on the quantitative continuum, comparability is not compromised.
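The following sketch illustrates that point under the same made-up model: the person measure is estimated so that the observed score matches the score expected from whichever calibrated items happened to be administered, so different subsets of the bank resolve the measure to comparable locations. The calibrations and response patterns here are invented for the example, not drawn from any real bank.

import math

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_measure(responses, calibrations, iterations=25):
    """Maximum-likelihood person measure from 0/1 responses and known item calibrations."""
    theta = 0.0
    for _ in range(iterations):
        expected = [rasch_probability(theta, b) for b in calibrations]
        residual = sum(responses) - sum(expected)        # observed minus expected score
        information = sum(p * (1.0 - p) for p in expected)
        theta += residual / information                  # Newton-Raphson step
    return theta

# Two different subsets drawn from the same hypothetical calibrated bank,
# answered by the same person.
subset_a = ([-1.0, -0.5, 0.0, 0.5, 1.0], [1, 1, 1, 0, 0])
subset_b = ([-0.8, -0.2, 0.3, 0.9, 1.4], [1, 1, 0, 1, 0])
for calibrations, responses in (subset_a, subset_b):
    print(f"estimated measure: {estimate_measure(responses, calibrations):+.2f}")
# To the extent the items measure the same thing, the two subsets locate
# the person at comparable positions on the same continuum.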

The big plus is that all cases can now be measured with the same degree of meaningfulness, precision, and confidence. We can administer the same number of items to every case, even the same number as in the one-size-fits-all design, but now the items are targeted at each individual, providing maximum information. And the quantitative properties are only half the story: real measurement integrates qualitative meaningfulness with quantitative precision.

As illustrated in the description of the typically assumed one-size-fits-all scenario, we interpret the measures in terms of the item calibrations. In the one-size-fits-all design, very low and very high measures can be associated with consistent variation on only a few items, since there is no variation on most of the items, which are too easy or too hard for the case in question. And it might happen that even cases in the middle of the scale have response probabilities of 1.00 and 0.00 for the items at the very bottom and top of the scale, respectively, further impairing the efficiency of the measurement process.

In the adaptive scenario, though, items are selected from the item bank via an algorithm that uses the expected response probabilities to target the respondent. Success on an easy item causes the algorithm to pick a harder item, and vice versa. In this way, the instrument is tailored for the individual case. This kind of mass customization can also be qualitatively based. Items that are irrelevant to the particular characteristics of an individual case can be excluded from consideration.
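Here is a highly simplified sketch of such an algorithm, not any particular operational CAT engine: each item is drawn from a hypothetical calibrated bank to target the current provisional measure, the measure is re-estimated after every response, and administration stops when a precision target is reached or the item budget is spent. The bank, the simulated respondent, and the stopping rules are all assumptions made for illustration.

import math
import random

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def adaptive_test(bank, answer_item, start=0.0, max_items=20, se_target=0.6):
    remaining = list(bank)
    administered, responses = [], []
    theta = start
    while remaining and len(administered) < max_items:
        # Pick the unadministered item whose calibration best targets
        # the current provisional measure.
        b = min(remaining, key=lambda cal: abs(cal - theta))
        remaining.remove(b)
        administered.append(b)
        responses.append(answer_item(b))
        # Update the provisional measure: success pushes it up toward
        # harder items, failure pushes it down toward easier ones.
        expected = [rasch_probability(theta, c) for c in administered]
        information = sum(p * (1.0 - p) for p in expected)
        theta += (sum(responses) - sum(expected)) / information
        if 1.0 / math.sqrt(information) < se_target:
            break  # precision target reached
    return theta, administered

random.seed(1)
bank = [-3.0 + 0.25 * i for i in range(25)]  # hypothetical calibrated item bank
true_measure = 1.2  # the simulated respondent's "true" location

def respondent(b):
    # Simulated examinee responding according to the model.
    return int(random.random() < rasch_probability(true_measure, b))

estimate, items_used = adaptive_test(bank, respondent)
print(f"administered {len(items_used)} of {len(bank)} items; "
      f"estimated measure {estimate:+.2f}")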

And adaptive designs do not necessarily have to be computerized, since respondents, examinees, and judges can be instructed to complete a given number of contiguous items in a sequence ordered by calibration values. This produces a kind of self-targeting that reduces the total number of items administered without requiring expensive investments in programming or hardware.

The literature on adaptive instrument administration is over 40 years old, and it is quite technical and extensive. I’ve listed a sample of articles below, including some that offer programming guidelines.

The concepts of item banking and adaptive administration are, of course, the technical mechanisms on which metrological networks of instruments linked to reference standards will be built. See previously posted blog entries here for more on metrology and traceability.

References

Association of Test Publishers. (2001, Fall). Benjamin D. Wright, Ph.D. honored with the Career Achievement Award in Computer-Based Testing. Test Publisher, 8(2). Retrieved 20 May 2009, from http://www.testpublishers.org/newsletter7.htm#Wright.

Bergstrom, B. A., Lunz, M. E., & Gershon, R. C. (1992). Altering the level of difficulty in computer adaptive testing. Applied Measurement in Education, 5(2), 137-149.

Choppin, B. (1968). An item bank using sample-free calibration. Nature, 219, 870-872.

Choppin, B. (1976). Recent developments in item banking. In D. N. M. DeGruitjer & L. J. van der Kamp (Eds.), Advances in Psychological and Educational Measurement (pp. 233-245). New York: Wiley.

Cook, K., O’Malley, K. J., & Roddey, T. S. (2005, October). Dynamic Assessment of Health Outcomes: Time to Let the CAT Out of the Bag? Health Services Research, 40(Suppl 1), 1694-1711.

Dijkers, M. P. (2003). A computer adaptive testing simulation applied to the FIM instrument motor component. Archives of Physical Medicine & Rehabilitation, 84(3), 384-93.

Halkitis, P. N. (1993). Computer adaptive testing algorithm. Rasch Measurement Transactions, 6(4), 254-255.

Linacre, J. M. (1999). Individualized testing in the classroom. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 186-94). New York: Pergamon.

Linacre, J. M. (2000). Computer-adaptive testing: A methodology whose time has come. In S. Chae, U. Kang, E. Jeon & J. M. Linacre (Eds.), Development of Computerized Middle School Achievement Tests [in Korean] (MESA Research Memorandum No. 69). Seoul, South Korea: Komesa Press. Available in English at http://www.rasch.org/memo69.htm.

Linacre, J. M. (2006). Computer adaptive tests (CAT), standard errors, and stopping rules. Rasch Measurement Transactions, 20(2), 1062 [http://www.rasch.org/rmt/rmt202f.htm].

Lunz, M. E., & Bergstrom, B. A. (1991). Comparability of decisions for computer adaptive and written examinations. Journal of Allied Health, 20(1), 15-23.

Lunz, M. E., & Bergstrom, B. A. (1994). An empirical study of computerized adaptive test administration conditions. Journal of Educational Measurement, 31(3), 251-263.

Lunz, M. E., & Bergstrom, B. A. (1995). Computerized adaptive testing: Tracking candidate response patterns. Journal of Educational Computing Research, 13(2), 151-162.

Lunz, M. E., Bergstrom, B. A., & Gershon, R. C. (1994). Computer adaptive testing. In W. P. Fisher, Jr. & B. D. Wright (Eds.), Special Issue: International Journal of Educational Research, 21(6), 623-634.

Lunz, M. E., Bergstrom, B. A., & Wright, B. D. (1992, Mar). The effect of review on student ability and test efficiency for computerized adaptive tests. Applied Psychological Measurement, 16(1), 33-40.

McHorney, C. A. (1997, Oct 15). Generic health measurement: Past accomplishments and a measurement paradigm for the 21st century. [Review] [102 refs]. Annals of Internal Medicine, 127(8 Pt 2), 743-50.

Meijer, R. R., & Nering, M. L. (1999, Sep). Computerized adaptive testing: Overview and introduction. Applied Psychological Measurement, 23(3), 187-194.

Raîche, G., & Blais, J.-G. (2009). Considerations about expected a posteriori estimation in adaptive testing. Journal of Applied Measurement, 10(2), 138-156.

Raîche, G., Blais, J.-G., & Riopel, M. (2006, Autumn). A SAS solution to simulate a Rasch computerized adaptive test. Rasch Measurement Transactions, 20(2), 1061.

Reckase, M. D. (1989). Adaptive testing: The evolution of a good idea. Educational Measurement: Issues and Practice, 8, 3.

Revicki, D. A., & Cella, D. F. (1997, Aug). Health status assessment for the twenty-first century: Item response theory item banking and computer adaptive testing. Quality of Life Research, 6(6), 595-600.

Riley, B. B., Conrad, K., Bezruczko, N., & Dennis, M. L. (2007). Relative precision, efficiency, and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN substance problem scale. Journal of Applied Measurement, 8(1), 48-64.

van der Linden, W. J. (1999). Computerized educational testing. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 138-50). New York: Pergamon.

Velozo, C. A., Wang, Y., Lehman, L., & Wang, J.-H. (2008). Utilizing Rasch measurement models to develop a computer adaptive self-report of walking, climbing, and running. Disability & Rehabilitation, 30(6), 458-67.

Vispoel, W. P., Rocklin, T. R., & Wang, T. (1994). Individual differences and test administration procedures: A comparison of fixed-item, computerized-adaptive, self-adapted testing. Applied Measurement in Education, 7(1), 53-79.

Wang, T., Hanson, B. A., & Lau, C. M. A. (1999, Sep). Reducing bias in CAT trait estimation: A comparison of approaches. Applied Psychological Measurement, 23(3), 263-278.

Ware, J. E., Bjorner, J., & Kosinski, M. (2000). Practical implications of item response theory and computerized adaptive testing: A brief summary of ongoing studies of widely used headache impact scales. Medical Care, 38(9 Suppl), II73-82.

Weiss, D. J. (1983). New horizons in testing: Latent trait test theory and computerized adaptive testing. New York: Academic Press.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375.

Weiss, D. J., & Schleisman, J. L. (1999). Adaptive testing. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 129-37). New York: Pergamon.

Wouters, H., Zwinderman, A. H., van Gool, W. A., Schmand, B., & Lindeboom, R. (2009). Adaptive cognitive testing in dementia. International Journal of Methods in Psychiatric Research, 18(2), 118-127.

Wright, B. D., & Bell, S. R. (1984, Winter). Item banks: What, why, how. Journal of Educational Measurement, 21(4), 331-345 [http://www.rasch.org/memo43.htm].

Wright, B. D., & Douglas, G. A. (1975). Best test design and self-tailored testing (Tech. Rep. No. 19). Chicago, Illinois: MESA Laboratory, Department of Education, University of Chicago [http://www.rasch.org/memo19.pdf] (Research Memorandum No. 19).

Creative Commons License
LivingCapitalMetrics Blog by William P. Fisher, Jr., Ph.D. is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Based on a work at livingcapitalmetrics.wordpress.com.
Permissions beyond the scope of this license may be available at http://www.livingcapitalmetrics.com.
