A Dataset of Digitized Student Examination Papers, Answer Keys, and Manual Evaluations for Automated Grading Research

Name: A Dataset of Digitized Student Examination Papers, Answer Keys, and Manual Evaluations for Automated Grading Research
Creator: Mendeley Data
Published: 2026-04-14 13:08:55
License: No description provided

Source: DataCite Commons. Updated: 2026-04-14. Indexed: 2026-05-04.

Download link:

https://data.mendeley.com/datasets/sf3kvjwknt

Description:

The automation of academic grading is a critical challenge in Educational Data Mining (EDM), Natural Language Processing (NLP), and Computer Vision. This dataset provides a comprehensive, end-to-end collection of university-level examination records for 50 students in a Data Science course. It serves as a ground-truth benchmark for researchers developing Optical Character Recognition (OCR) systems, Automated Essay Scoring (AES) models, and automated student evaluation pipelines.

Examination Structure: The exam evaluates students on Data Science concepts and comprises two sections for a maximum of 50 marks:
Part I: 20 Multiple Choice Questions (1 mark each).
Part II: 15 Short Answer Questions (2 marks each).

Dataset Contents: The dataset provides paired, transparent data at every stage of the examination and grading process:
Source Material: The original examination questionnaire (Question.txt) and the authoritative grading rubric/answer key (answerkey.txt).
Raw Data: 50 digitized, uncorrected student answer sheets (/Student_Pdf/), serving as raw inputs for OCR and handwriting-recognition models.
Corrected Data: 50 manually evaluated answer sheets (/Corrected_Pdf/) featuring teacher annotations, visual corrections, and tally marks.
Tabular Records: A comprehensive CSV file (Teacher_manual_marks_Anonymized.csv) detailing the precise item-level manual evaluation scores for every question (Q1 through Q35) for all 50 students, allowing for granular ML model evaluation.

Ethical Compliance & Anonymization: To comply with standard ethical guidelines for open educational datasets, all Personally Identifiable Information (PII) has been strictly anonymized. Real student names and institutional roll numbers were computationally replaced with sequential identifiers (e.g., Student_1) across all files. Furthermore, all physical instances of handwritten names and IDs within the scanned PDF pages were visually redacted and flattened to guarantee complete subject anonymity.
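As a rough illustration of how the item-level marks might be checked against the exam structure described above, the following Python sketch totals each student's marks and rejects values outside the per-question maximum. The listing does not document the CSV layout, so several things here are assumptions: the column names (`Student_ID`, `Q1` through `Q35`), the ordering of questions (Q1 to Q20 as Part I, Q21 to Q35 as Part II), and the helper names `max_mark`, `score_row`, and `load_scores` are all hypothetical and may need adjusting to the actual file.

```python
import csv
import io

N_MCQ = 20    # Part I: 20 multiple choice questions, 1 mark each
N_SHORT = 15  # Part II: 15 short answer questions, 2 marks each


def max_mark(q):
    """Maximum mark for question q, ASSUMING Q1-Q20 are Part I and Q21-Q35 Part II."""
    if 1 <= q <= N_MCQ:
        return 1
    if N_MCQ < q <= N_MCQ + N_SHORT:
        return 2
    raise ValueError(f"question number out of range: {q}")


def score_row(row, prefix="Q"):
    """Sum one student's marks per part; raise if a mark exceeds its maximum."""
    part1 = part2 = 0.0
    for q in range(1, N_MCQ + N_SHORT + 1):
        value = float(row[f"{prefix}{q}"])  # hypothetical column name, e.g. "Q7"
        if not 0 <= value <= max_mark(q):
            raise ValueError(f"Q{q}: mark {value} outside 0..{max_mark(q)}")
        if q <= N_MCQ:
            part1 += value
        else:
            part2 += value
    return {"part1": part1, "part2": part2, "total": part1 + part2}


def load_scores(fileobj, id_column="Student_ID"):
    """Read a CSV file object and return {student id: per-part totals}."""
    reader = csv.DictReader(fileobj)
    return {row[id_column]: score_row(row) for row in reader}


# Synthetic one-student example (not real data from the dataset).
header = ["Student_ID"] + [f"Q{i}" for i in range(1, 36)]
values = ["Student_1"] + ["1"] * N_MCQ + ["2"] * N_SHORT
demo_csv = io.StringIO(",".join(header) + "\n" + ",".join(values) + "\n")
demo_scores = load_scores(demo_csv)
print(demo_scores)
```

With the real file, one would presumably pass `open("Teacher_manual_marks_Anonymized.csv", newline="")` to `load_scores` after confirming the actual header row. The same per-question totals could then serve as reference values when comparing an automated grader's output against the manual evaluation.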

Provider:

Mendeley Data

Created:

2026-04-14
