An investigation of enhancement of ability evaluation by using a nested logit model for multiple-choice items

  • Tour Liu Collaborative Innovation Center of Assessment toward Basic Education Quality (CICA-BEQ), Bejing Normal University
  • Mengcheng Wang Center for Psychometric and Latent Variable Modeling, Guangzhou University
  • Tao Xin School of Psychology, Beijing Normal University
Keywords: multiple-choice item, nested logitmodel, distractor information, ability evaluation


Multiple-choice item is wildly used in psychological and educational test. The present study investigated that if a multiple-choice item have an advantage than a dichotomous item on ability evaluation.An item response model,nested logitmodel (NLM),was used to fit the multiple-choice data. Both simulation study and empirical study indicated that the accuracy and the stability of ability estimation were enhanced by using multiple-choice model rather than dichotomous model, because more information was included in multiple-choice items’ distractors. But the accuracy of ability parameter estimation showed little differences in 4-choice items, 5-choice items and 6-choice items. Moreover, NLM could extract more information from low-level respondents than from high-level ones, because they hadmore distractor chosen behaviors. Furthermore, respondents at different trait levels would be attracted by different distractors in an empirical study of a Chinese Vocabulary Test for Grade 1 by using the changing traces of distractor probabilities calculated from NLM. It is suggested that the responses of students at different levelsmight reflect the students’ vocabulary development process.


Download data is not yet available.


Attali, Y., & Fraenkel, T. (2000). The Point-Biserial as a Discrimination Index for Distractors in Multiple-Choice Items:Deficiencies in Usage and an alternative. Journal of Educational Measurement, 37(1), 77-86. doi: 10.1111/j.1745-3984.2000.tb01077.x

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.doi: 10.1007/BF02291411

Bolt, D. M., Wollack, J. A., &Suh, Y. (2012). Application of a multidimensional nested logit model to multiple-choice test items. Psychometrika, 77, 339-357.doi: 10.1007/S11336-012-9257-5

Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33-63.doi: 10.1207/s15326977ea1101_2

Cao, Y. W. (1999). Construction of vocabulary tests for junior school level. Acta Psychologica Sinica, 31, 460-467.

Davis, F. B., & Fifer, G. (1959). The effect on test reliability and validity of scoring aptitude and achievement tests with weights for every choice. Educational and Psychological Measurement, 19, 159-170.doi: 10.1177/001316445901900202

Divgi, D. R. (1986). Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement, 23, 283-298.doi: 10.1111/j.1745-3984.1986.tb00251.x

Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness Measurement with Polychotomous Item Response Models and Standardized Indices. British Journal of Mathematical and Statistical Psychology, 38, 67-86. doi: 10.1111/j.2044-8317.1985.tb00817.x

Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Green, B. F., Crone, C. R., & Folk, V. G. (1989). A Method for Studying Differential Distractor Functioning. Journal of Educational Measurement, 26, 147-160.doi: 10.1111/j.1745-3984.1989.tb00325.x

Haladyna, T. M., & Downing, S. M. (1989). A Taxonomy of Multiple-Choice Item-Writing Rules. Applied Measurement in Education, 2, 37-50.doi: 10.1207/s15324818ame0201_3

Haladyna, T. M., & Downing, S. M. (1993). How Many Options is Enough for a Multiple-Choice Testing Item. Educational and Psychological Measurement, 53, 999-1010.doi: 10.1177/0013164493053004013

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment. Applied Measurement in Education, 15, 309-333.doi: 10.1207/S15324818AME1503_5

Henning, G. (1989). Does the Rasch model really work for multiple-choice items? Take another look: a response to Divgi. Journal of Educational Measurement, 26, 91-97.doi: 10.1111/j.1745-3984.1989.tb00321.x

Hofmann, K. P. (2007). Psychology of Decision Making in Economics, Business and Finance. Nova Publishers.

Jacobs, P. I., &Vandeventer, M. (1970). Information in wrong responses. Psychological Reports, 26, 311-315.doi: 10.2466/pr0.1970.26.1.311

Kim, J. (2006). Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking. Journal of Educational Measurement, 43, 193-213.doi: 10.1111/j.1745-3984.2006.00013.x

Levine, M. V., &Drasgow, F. (1983). The relation between incorrect option choice and estimated ability. Educational and Psychological Measurement, 43, 675-685.doi: 10.1177/001316448304300301

Liu, O. L., Lee, H., & Linn, M. C. (2011). An investigation of explanation multiple-choice items in science assessment. Educational Assessment, 16, 164-184.doi: 10.1080/10627197.2011.611702

Love, T. E. (1997). Distractor selection ratios. Psychometrika, 62, 51-62.doi: 10.1007/BF02294780

Luecht, R. M. (2007). Using information from multiple-choice distractors to enhance cognitive-diagnostic score reporting. In J. P. Leighton &M. J. Gierl (Eds.),Cognitive diagnostic assessment for education: Theory and practices(pp. 319–340). Cambridge University Press.doi: 10.1017/CBO9780511611186

Muraki, E. (1992). A Generalized Partial Credit Model:Application of an EM Algorithm. Applied Psychological Measurement, 16, 159-176.doi: 10.1002/j.2333-8504.1992.tb01436.x

Penfield, R. D. (2011). How are the Form and Magnitude of DIF Effects in Multiple-Choice Items Determined by Distractor-Level Invariance Effects?. Educational And Psychological Measurement, 71, 54-67.doi: 10.1177/0013164410387340

Roediger III, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155.doi: 10.1037/0278-7393.31.5.1155

Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in science Teaching, 35, 265-296.doi: 10.1002/(SICI)1098-2736(199803)35:3<265::AID-TEA3>3.0.CO;2-P

Sigel, I. E. (1963). How intelligence tests limit understanding of intelligence. Merrill-Paker Quarterly,9, 39-56.

Suh, Y., & Bolt, D. M. (2010). Nested logit models for multiple-choice item response data. Psychometrika, 75, 454-473.doi: 10.1007/s11336-010-9163-7

Suh, Y., & Bolt, D. M. (2011). A Nested Logit Approach for Investigating Distractors as Cause of Different Item Functioning. Journal of Educational Measurement, 48, 188-205.doi: 10.1111/j.1745-3984.2011.00139.x

Suh, Y., & Talley, A. E. (2015). An Empirical Comparison of DDF Detection Methods for Understanding the Causes of DIF in Multiple-Choice Items. Applied Measurement in Education, 28, 48-67.doi: 10.1080/08957347.2014.973560

Tamir, P. (1971). An alternative approach to the construction of multiple choice test items. Journal of Biological Education, 5, 305-307.doi: 10.1080/00219266.1971.9653728

Tamir, P. (1989). Some issues related to the use of justifications to multiple-choice answers. Journal of Biological Education, 23, 285-292.doi: 10.1080/00219266.1989.9655083

Thissen, D. M. (1976). Information in wrong responses to the Raven Progressive Matrices. Journal of Educational Measurement, 13, 201-214.doi: 10.1111/j.1745-3984.1976.tb00011.x

Thissen, D., & Steinberg, L. (1984). A Response Model for Multiple Choice Items. Psychometrika, 49, 501-519.doi: 10.1007/BF02302588

Thissen, D., Steinberg, L., & Fitzpatrick, A. R. (1989). Multiple-Choice Models: The Distractors Are Also Part of the Item. Journal of Educational Measurement, 26, 161-176.doi: 10.1111/j.1745-3984.1989.tb00326.x

Walther B. A., & Moore J. L. (2005). The concepts of bias, precision and accuracy, and their use in testing the performance of species richness estimators, with a literature review of estimator performance. Ecography, 28, 815-829. doi: 10.1111/j.2005.0906-7590.04112.x

Wollack, J. A. (1997). A Nominal Response Model Approach for Detecting Answer Copying. Applied Psychological Measurement, 21, 307-320.doi: 10.1177/01466216970214002

How to Cite
Liu, T., Wang, M., & Xin, T. (2017). An investigation of enhancement of ability evaluation by using a nested logit model for multiple-choice items. Anales De Psicología / Annals of Psychology, 33(3), 530-537.