Descriptive Statistics for Human Raters and AESs.

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://figshare.com/articles/dataset/Descriptive_Statistics_for_Human_Raters_and_AESs_/28685501


Description:

Automated evaluation systems (AESs) for spoken language assessment are increasingly adopted in global educational settings, yet their validity in non-Western contexts remains underexplored. This study addresses this gap by examining three widely used Chinese-developed AES tools in their assessment of spoken English proficiency among 30 Chinese undergraduates. The study employed an IELTS-adapted speaking test, assessed simultaneously by AESs and human raters, with scoring alignment analyzed through intra-class correlation coefficients, Pearson correlations, and linear regression. Results revealed that two systems demonstrated strong agreement with human ratings, while the third exhibited systematic score inflation, likely due to algorithmic discrepancies and limited consideration of nuanced language features. Our findings suggest the potential of AESs as valuable complements to traditional language assessment methods, while highlighting the necessity for calibration and validation procedures. This research has significant implications for integrating AESs in educational contexts, particularly in English as a Foreign Language (EFL) settings, where they can enhance efficiency and standardization.
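The three agreement measures named in the description (intra-class correlation, Pearson correlation, and linear regression) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the ICC variant shown is ICC(2,1) (two-way random effects, absolute agreement, single rater), which the description does not specify, and the scores are made-up values rather than data from the dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def icc2_1(ratings):
    """ICC(2,1) for a list of rows, one row per test taker, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores: the AES rates every speaker one band above the human rater.
human = [5.0, 6.0, 7.0]
aes = [6.0, 7.0, 8.0]

r = pearson_r(human, aes)
slope, intercept = linear_fit(human, aes)
icc = icc2_1([list(pair) for pair in zip(human, aes)])
print(r, slope, intercept, icc)
```

With a constant offset like this, the Pearson correlation stays perfect while the regression intercept and the absolute-agreement ICC both register the shift, which is one reason a study of score inflation would report more than a correlation alone.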

Created:

2025-03-28
