Tamaulipas Multiple-Choice-Question Exam Image Dataset for Optical Mark Recognition Research
Source: DataCite Commons. Updated: 2026-03-18. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/djmynjwjpy
Description:
This dataset was gathered for research on Optical Mark Recognition (OMR), where the objective is to automatically detect the selected answers on a multiple-choice exam. OMR is the first step in developing automatic grading for Multiple-Choice-Question (MCQ) exams. The dataset contains 5721 scanned images of four-choice exam answer sheets completed by high school students. Each image is accompanied by a text file containing the human-observed labels for each item. The exams were administered in 2024 to 10th-, 11th-, and 12th-grade students at 42 high schools in Tamaulipas, Mexico. To protect the students' privacy, we developed an anonymization process based on geometric image processing. Of the answer sheets, 3669 from the 10th and 12th grades contain 90 items each, while 2052 from the 11th grade contain 100 items each, totaling 535,020 items. The variety of marking styles, together with noise and artifacts caused by human and digitization errors, makes this dataset valuable for designing OMR algorithms, based on either machine learning or classical image processing, for real-life automatic MCQ exam grading.
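Because each scanned image comes with a text file of labels, a first step when working with the data is usually to pair the two. The following is a minimal sketch only: the description does not state the directory layout, the image format, or the file-naming scheme, so the function name `pair_sheets_with_labels`, the list of image extensions, and the assumption that an image and its label file share the same file stem are all hypothetical and may need to be adapted to the actual download.

```python
from pathlib import Path

# Assumed image extensions; the dataset description does not state the format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pair_sheets_with_labels(root):
    """Return (image_path, label_path) pairs found under `root`.

    Assumption (not confirmed by the source): an image and its label
    file share the same stem, e.g. `sheet_001.png` and `sheet_001.txt`.
    """
    root = Path(root)
    # Index every .txt file by its stem so images can be matched to it.
    labels = {p.stem: p for p in root.rglob("*.txt")}
    pairs = []
    for img in sorted(root.rglob("*")):
        if img.suffix.lower() in IMAGE_SUFFIXES and img.stem in labels:
            pairs.append((img, labels[img.stem]))
    return pairs
```

If the naming assumption holds, calling `pair_sheets_with_labels` on a complete download would be expected to return one pair per answer sheet, that is, 5721 pairs; a smaller count would suggest a different layout or naming scheme.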
Provider:
Mendeley Data
Created:
2026-03-18



