
A Dataset of Digitized Student Examination Papers, Answer Keys, and Manual Evaluations for Automated Grading Research

Source: DataCite Commons. Updated: 2026-04-14. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/sf3kvjwknt
Description:
The automation of academic grading is a critical challenge in Educational Data Mining (EDM), Natural Language Processing (NLP), and Computer Vision. This dataset provides a comprehensive, end-to-end collection of university-level examination records for 50 students in a Data Science course. It serves as a ground-truth benchmark for researchers developing Optical Character Recognition (OCR) systems, Automated Essay Scoring (AES) models, and automated student evaluation pipelines.

Examination Structure: The exam evaluates students on Data Science concepts and comprises two sections for a maximum of 50 marks:
Part I: 20 Multiple Choice Questions (1 mark each).
Part II: 15 Short Answer Questions (2 marks each).

Dataset Contents: The dataset provides paired, transparent data at every stage of the examination and grading process:
Source Material: The original examination questionnaire (Question.txt) and the authoritative grading rubric/answer key (answerkey.txt).
Raw Data: 50 digitized, uncorrected student answer sheets (/Student_Pdf/), serving as raw inputs for OCR and handwriting-recognition models.
Corrected Data: 50 manually evaluated answer sheets (/Corrected_Pdf/) featuring teacher annotations, visual corrections, and tally marks.
Tabular Records: A comprehensive CSV file (Teacher_manual_marks_Anonymized.csv) detailing the precise item-level manual evaluation scores for every question (Q1 through Q35) for all 50 students, allowing for granular ML model evaluation.

Ethical Compliance & Anonymization: To comply with standard ethical guidelines for open educational datasets, all Personally Identifiable Information (PII) has been strictly anonymized. Real student names and institutional roll numbers were computationally replaced with sequential identifiers (e.g., Student_1) across all files. Furthermore, all physical instances of handwritten names and IDs within the scanned PDF pages were visually redacted and flattened to guarantee complete subject anonymity.
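The marking scheme described above (20 questions at 1 mark plus 15 questions at 2 marks, for 50 marks in total) can be turned into a small sanity check for item-level scores. The following is a minimal sketch, not the dataset authors' tooling. It assumes that Q1 to Q20 are the multiple choice items and Q21 to Q35 the short answer items, and that partial marks may occur; neither is confirmed by the description. The names `max_marks`, `summarize`, and `full_marks` are hypothetical, and the sample row is made up rather than taken from the CSV.

```python
# Marking scheme as stated in the description. The mapping of question
# numbers to sections is an ASSUMPTION: Q1-Q20 multiple choice (1 mark each),
# Q21-Q35 short answer (2 marks each).
MCQ_COUNT = 20
SHORT_COUNT = 15
MCQ_MARKS = 1
SHORT_MARKS = 2


def max_marks(question_number: int) -> int:
    """Return the maximum marks for a question numbered 1..35."""
    if not 1 <= question_number <= MCQ_COUNT + SHORT_COUNT:
        raise ValueError(f"question number out of range: {question_number}")
    return MCQ_MARKS if question_number <= MCQ_COUNT else SHORT_MARKS


def summarize(scores: list[float]) -> dict[str, float]:
    """Validate one student's 35 item scores and return section totals."""
    expected = MCQ_COUNT + SHORT_COUNT
    if len(scores) != expected:
        raise ValueError(f"expected {expected} scores, got {len(scores)}")
    for number, score in enumerate(scores, start=1):
        # Each score must lie between 0 and the maximum for that question.
        if not 0 <= score <= max_marks(number):
            raise ValueError(f"Q{number}: score {score} is out of range")
    part1 = sum(scores[:MCQ_COUNT])
    part2 = sum(scores[MCQ_COUNT:])
    return {"part1": part1, "part2": part2, "total": part1 + part2}


# Hypothetical row (not from the dataset): full marks on every question.
full_marks = [max_marks(q) for q in range(1, 36)]
print(summarize(full_marks))  # {'part1': 20, 'part2': 30, 'total': 50}
```

To apply the same check to Teacher_manual_marks_Anonymized.csv, each row would first have to be converted into a list of 35 numbers. The column names in that file are not given in the description, so they should be inspected before any parsing code is written.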
Provider:
Mendeley Data
Created:
2026-04-14