An investigation of enhancement of ability evaluation by using a nested logit model for multiple-choice items

Tao Xin, Mengcheng Wang, Tour Liu


Multiple-choice items are widely used in psychological and educational tests. The present study investigated whether multiple-choice items have an advantage over dichotomous items in ability evaluation. An item response model, the nested logit model (NLM), was used to fit the multiple-choice data. Both a simulation study and an empirical study indicated that the accuracy and stability of ability estimation were enhanced by using a multiple-choice model rather than a dichotomous model, because the distractors of multiple-choice items carry additional information. However, the accuracy of ability parameter estimation differed little among 4-choice, 5-choice, and 6-choice items. Moreover, the NLM could extract more information from low-level respondents than from high-level ones, because low-level respondents chose distractors more often. Furthermore, in an empirical study of a Chinese Vocabulary Test for Grade 1, the traces of distractor probabilities calculated from the NLM showed that respondents at different trait levels were attracted by different distractors. This suggests that the responses of students at different levels might reflect the students’ vocabulary development process.


multiple-choice item; nested logit model; distractor information; ability evaluation
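
To make the idea of distractor probability traces concrete, here is a minimal sketch of option probabilities under a two-parameter nested logit model: the correct response follows a 2PL curve, and the remaining probability is shared among the distractors by a nominal (softmax) model. The function name and every parameter value below are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math

def nlm_probabilities(theta, alpha, beta, zetas, lambdas):
    """Option probabilities for one item under a 2PL nested logit model.

    theta   : respondent ability
    alpha   : slope of the correct-response curve
    beta    : intercept of the correct-response curve
    zetas   : distractor intercepts (one per distractor)
    lambdas : distractor slopes (one per distractor)
    Returns [P(correct), P(distractor 1), ..., P(distractor m)].
    """
    # Level 1: probability of the correct answer (2PL, slope-intercept form).
    p_correct = 1.0 / (1.0 + math.exp(-(beta + alpha * theta)))

    # Level 2: conditional on an incorrect answer, a nominal (softmax)
    # model decides which distractor is chosen.
    logits = [z + l * theta for z, l in zip(zetas, lambdas)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)

    return [p_correct] + [(1.0 - p_correct) * w / total for w in weights]

# Hypothetical 4-choice item: one correct option and three distractors.
low = nlm_probabilities(-2.0, 1.0, 0.0, [0.0, 0.0, 0.0], [-1.0, 0.0, 1.0])
high = nlm_probabilities(2.0, 1.0, 0.0, [0.0, 0.0, 0.0], [-1.0, 0.0, 1.0])
```

With these illustrative values, the distractor with the negative slope is the most likely wrong answer at low ability, while the one with the positive slope dominates among wrong answers at high ability. Low-ability respondents also place more total probability on distractors, which is the sense in which their responses carry more distractor information.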








Copyright (c) 2017 Servicio de Publicaciones, Universidad de Murcia (Spain)
