Multiple-Attempt Procedures: Models, Computerized Adaptive Testing, and Differential Item Functioning

Name: Multiple-Attempt Procedures: Models, Computerized Adaptive Testing, and Differential Item Functioning
Creator: University of Notre Dame
Published: 2025-08-01 13:47:59
License: No description available

DataCite Commons. Updated: 2025-08-01. Indexed: 2026-05-07

Download link:

https://curate.nd.edu/articles/dataset/Multiple-Attempt_Procedures_Models_Computerized_Adaptive_Testing_and_Differential_Item_Functioning/29566115

Resource description:

Multiple-attempt items are an innovative item type that remains under-studied in psychometrics and educational measurement. This dissertation advances the field by (a) extending sequential item response theory for multiple-choice, multiple-attempt items (SIRT-MM), (b) designing computerized adaptive testing that incorporates multiple-attempt items, and (c) clarifying and detecting differential item functioning for such items.

Chapter 2 introduces two extensions of the SIRT-MM model. The first permits the slope of each item-category response function to vary, while the second freely estimates a pseudo-guessing parameter to capture different success rates due to guessing. These models allow a wider range of response-function shapes and are more likely to fit empirical data. Model-selection strategies and parameter estimation methods for the new formulations are also proposed and evaluated.

Chapter 3 explores the integration of multiple-choice, multiple-attempt items into the Computerized Adaptive Testing (CAT) framework, referred to as MM-CAT. Using the sequential item response theory model for multiple-choice, multiple-attempt items (Lu, Fowler, & Cheng, 2025), a simulation study investigated whether an MM-CAT design improves the accuracy of ability estimation compared to traditional CAT, which relies on single-attempt, dichotomously scored items. Results show that MM-CAT substantially reduces the standard error of measurement (SEM), bias, and root mean square error (RMSE), particularly for examinees with lower ability levels. Furthermore, we examine the impact of item exposure control procedures and find that while both the Sympson-Hetter method (SH; Shealy & Stout, 1993) and the Randomesque method (Kingsbury & Zara, 1989) are useful, the SH method is particularly effective when paired with MM-CAT, reducing the severity of item over-exposure without sacrificing measurement precision. Taken together, these findings suggest that MM-CAT is a promising approach for enhancing the precision and fairness of adaptive testing, especially in educational contexts where multiple attempts may support both assessment and learning.

While multiple-attempt procedures and items have been widely studied, limited research has addressed Differential Item Functioning (DIF) in the context of multiple-attempt items. Chapter 4 formalizes the concept of attempt-level DIF, which captures attempt-specific mechanisms underlying DIF. We present example scenarios to illustrate how attempt-level DIF can arise and propose several detection methods capable of identifying it. Simulation results demonstrate that these methods yield higher true positive rates (i.e., greater power) than traditional DIF detection approaches. Their advantage is particularly evident when the sample size and the variance of item responses are reduced in the specific attempt where DIF exists.
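
As a rough illustration of the Randomesque exposure-control idea mentioned in the Chapter 3 summary, the sketch below picks the next item at random from the few most informative unused items instead of always taking the single best one. It is not the dissertation's implementation: it assumes a plain two-parameter logistic (2PL) item model as a stand-in for SIRT-MM, and the names `fisher_info_2pl`, `randomesque_select`, the pool layout, and the default `k=5` are all hypothetical.

```python
import math
import random


def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item (slope a, difficulty b) at ability theta.

    Assumption: 2PL stands in here for the SIRT-MM model used in the source.
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def randomesque_select(theta, pool, administered, k=5, rng=random):
    """Pick one item at random from the k most informative unused items.

    pool: dict mapping item id -> (a, b)
    administered: set of item ids already given to this examinee
    """
    candidates = [i for i in pool if i not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    # Rank unused items by information at the current ability estimate.
    candidates.sort(key=lambda i: fisher_info_2pl(theta, *pool[i]), reverse=True)
    # Randomising within the top k spreads exposure across near-optimal items.
    return rng.choice(candidates[:k])
```

With `k=1` this reduces to ordinary maximum-information selection; larger `k` trades a little measurement precision for flatter item exposure.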

Provider:

University of Notre Dame

Created:

2025-07-15
