Dataset for Automated Marking System for Essay Questions

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://data.mendeley.com/datasets/9dxzsjv2jd

Resource description:

Data Description

This dataset supports semantic analysis of student answers to essay questions. Compared with traditional methods, semantic analysis offers a more nuanced understanding of those answers, with the aim of fairer evaluation, more refined scoring systems, and reduced human bias.

Dataset Overview

The University of Uyo CSC 221 (Introduction to File Processing) dataset (2019/2020 session) includes 100 student scripts and instructor-designed marking schemes. Originally Word documents, the data was manually transcribed to Excel for structured analysis, with each entry comprising a student's response, the expected answer, and the awarded marks.

Data Attributes and Structure

Key dataset attributes: Question Number (qNo), Sub-question (SubQ), Mark Awarded, Student Registration Number (StudentRegNo), Answers, Marking Scheme, Actual Mark.

Data Collection Process

Data from the 2019/2020 CSC 221 assessments was manually transcribed from Word documents (scripts and marking schemes) into Excel. This process aligned answers with marking schemes and structured all relevant fields (qNo, SubQ, Mark Awarded, Actual Mark) for machine learning applications such as semantic analysis and automated scoring.

Notable Findings and Observations

Analysis revealed the following observations:

Response/Marking Discrepancy: Student answers varied in wording from the marking schemes, highlighting the need for semantic analysis to score fairly beyond keyword matching.

Response Length Impact: Shorter answers sometimes received scores comparable to longer ones, raising the question of whether scoring systems adequately reflect quality.

Scoring Model Comparison: Automated scoring (semantic similarity) showed high consistency with human scoring. Minor discrepancies, particularly with partial credit, indicated that human judgment remains crucial.

Automated Scoring Efficiency: The automated system processed large data volumes efficiently, suggesting scalability for educational settings.

Data Interpretation and Usage

The dataset's value lies in developing and improving automated scoring and assessment systems. It can be used to train machine learning models to predict grades, analyze student performance, and detect scoring biases. Researchers can use this dataset to:

Evaluate Scoring Models: Test various scoring algorithms.

Improve Scoring Systems: Refine methodologies via semantic analysis.

Inform Curriculum Development: Identify common student struggles.

Enhance Teaching Strategies: Uncover insights into student learning behaviors.

The dataset provides a foundation for machine learning models that automate scoring, predict performance, and refine curricula. It enables exploration of new, semantically aware scoring approaches, contributing to efficient, fair, and scalable educational assessment.
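The similarity-based scoring and the comparison against human marks described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dataset authors' method: the `Entry` fields only mirror the listed attribute names, `max_mark` is a hypothetical parameter (the description does not say which column holds the marks available), the sample rows are invented for illustration and are not taken from the dataset, and a bag-of-words cosine similarity stands in for a real semantic model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    """One transcribed row; field names mirror the listed attributes (assumed layout)."""
    q_no: int
    sub_q: str
    student_reg_no: str
    answer: str
    marking_scheme: str
    mark_awarded: float  # mark given by the human marker


def word_counts(text):
    """Lowercased word counts: a crude lexical stand-in for semantic features."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def auto_mark(entry, max_mark):
    """Scale similarity to the marks available (max_mark is a hypothetical input)."""
    similarity = cosine_similarity(entry.answer, entry.marking_scheme)
    return round(similarity * max_mark, 1)


def mean_absolute_error(entries, max_mark):
    """Average absolute gap between automated and human marks."""
    diffs = [abs(auto_mark(e, max_mark) - e.mark_awarded) for e in entries]
    return sum(diffs) / len(diffs)


# Invented rows for illustration only; they are NOT taken from the dataset.
SCHEME = "A file is a collection of related records."
SAMPLE_ROWS = [
    Entry(1, "a", "REG-001", "A file is a collection of related records.", SCHEME, 2.0),
    Entry(1, "a", "REG-002", "Data stored on disk.", SCHEME, 0.5),
]

if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(row.student_reg_no, auto_mark(row, max_mark=2.0), row.mark_awarded)
    print("MAE:", mean_absolute_error(SAMPLE_ROWS, max_mark=2.0))
```

The second invented row shares no words with the scheme, so this lexical stand-in gives it no credit even though the human marker awarded partial credit. That is the kind of gap the description attributes to keyword matching; replacing `cosine_similarity` with an embedding-based measure would be the natural next step, and only that one function would need to change.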

Created:

2025-06-09
