Automating grading approach for open-ended stem answers using LLM

Name: Automating grading approach for open-ended stem answers using LLM
Creator: Thammasat University
Published: 2026-01-23 10:00:03
License: No description available

DataCite Commons. Updated 2026-01-23; indexed 2026-05-04.

Download link:

http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2025.62

Resource description:

AI has become increasingly important in many aspects of life, including education, where it can ease the burden on teachers, particularly in grading. Teachers often face heavy workloads, and traditional grading can introduce inconsistencies in human judgment that affect the fairness of assessments. This research explores the effectiveness of AI-based grading systems in educational contexts, focusing on key performance metrics: accuracy, recall, precision, F1 score, RMSE, and adjacent agreement rate. The study reports strong performance for the Rule-Based Model, which achieved perfect recall (100%), meaning it produced no false negatives, along with a 49.93% adjacent agreement rate. Avoiding false negatives is crucial in educational assessment, where they can misrepresent a student's true abilities and undermine trust in the grading system. The Rule-Based Model's accuracy of 74.72% closely aligns with typical human grader performance, showing that AI grading can achieve a level of consistency comparable to that of human evaluators. However, the research also identifies several areas for improvement, including raising grading accuracy, refining partial credit assignment, and developing better methods for evaluating reasoning. AI models also need to be adapted for non-English educational settings, particularly in Thailand, where local linguistic and cultural differences could affect model performance. The theoretical implications of this research highlight the importance of a more comprehensive framework for evaluating AI grading systems, one that emphasizes fairness, transparency, and cultural sensitivity. On a practical level, the findings suggest that AI models need further improvement in accuracy, scalability, and alignment with local educational systems.
Future research should focus on refining AI grading systems for open-ended STEM assessments, addressing biases, and ensuring that AI models are adapted to the linguistic and cultural contexts of different regions. Such efforts will help create more reliable, equitable, and scalable AI-driven grading systems for education worldwide.
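
The metrics named in the abstract can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function name `grading_metrics`, the `pass_mark` threshold used to turn scores into pass/fail labels, and the definition of adjacent agreement as "model score within `tolerance` points of the human score" are all assumptions made for illustration, and the scores in the example are invented.

```python
import math


def grading_metrics(human, model, pass_mark, tolerance=1):
    """Compare model scores against human scores for the same answers.

    Assumptions (not taken from the thesis): a score >= pass_mark counts
    as a positive label, and adjacent agreement means the two scores
    differ by at most `tolerance` points.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)

    # Binary view of each score: positive if it reaches the pass mark.
    pairs = [(h >= pass_mark, m >= pass_mark) for h, m in zip(human, model)]
    tp = sum(1 for h, m in pairs if h and m)
    fp = sum(1 for h, m in pairs if not h and m)
    fn = sum(1 for h, m in pairs if h and not m)
    tn = sum(1 for h, m in pairs if not h and not m)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Score-level view: error size and near-agreement on the raw scores.
    rmse = math.sqrt(sum((h - m) ** 2 for h, m in zip(human, model)) / n)
    adjacent = sum(1 for h, m in zip(human, model)
                   if abs(h - m) <= tolerance) / n

    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "rmse": rmse,
        "adjacent_agreement": adjacent,
    }


# Invented example scores on a 0-4 scale, with 2 as the assumed pass mark.
example = grading_metrics([0, 1, 2, 3, 4], [0, 2, 2, 1, 4], pass_mark=2)
```

Under these assumptions, a recall of 1.0 means the model never marks a passing answer as failing, which is the "no false negatives" property the abstract emphasizes. Recall alone says nothing about false positives, so it is read alongside precision and accuracy.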

Provider:

Thammasat University

Created:

2026-01-23
