Chinese EFL Learners' Writing Evaluation by ChatGPT

Mendeley Data | Updated 2024-03-27 | Indexed 2024-06-27

Download link:

https://data.mendeley.com/datasets/8fbzsg82p9

Description:

The data provide ChatGPT's ratings (scores and comments) of 82 compositions written by Chinese EFL learners, together with scores from reliable manual rating. With these data, researchers can conduct quantitative or qualitative research on the reliability of EFL writing evaluation with ChatGPT, taking the manual ratings as a reference. The dataset has two parts: 1) ChatGPT's ratings with scores and comments, and 2) statistics on the overall, average, and specific scores of the manual and ChatGPT ratings.

1. EFL Writings with ChatGPT's Rating

The Spoken and Written English Corpus of Chinese Learners (Version 2.0) (Wen et al., 2008) contains 270 EFL expository compositions, written by 270 Chinese EFL learners within a time limit of 30 minutes. Their IDs run from "WEXP0001" to "WEXP0270". Eighty-two compositions were randomly sampled from the 270. The sample size was determined with the power analysis software G*Power (Faul et al., 2009; Faul et al., 2007). A set of 82 random numbers between 1 and 270 was generated with the Random Numbers Generator. ChatGPT's ratings were obtained by asking ChatGPT to rate the 82 compositions one by one. The next day, ChatGPT rated the same 82 compositions again with the same prompts to produce a second set of scores.

2. Scores of Manual and ChatGPT's Rating

The spreadsheet provides ChatGPT's ratings of the EFL compositions, with overall and specific scores, alongside the corresponding manual rating scores. For the manual rating, three experienced raters scored each composition on language (40%), content (30%), and organization (30%); the total score was the sum of the three parts. The total score and the score for each aspect were then averaged across the three raters. Inter-rater reliability was analyzed for every pair of raters. The results showed significant (p < 0.01) and high inter-rater reliability, with coefficients ranging from 0.710 to 0.785.
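The sampling and reliability steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' procedure: the description does not name the random number tool's settings or the reliability coefficient, so the seed, the choice of Pearson correlation, and all function names below are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def sample_ids(population=270, k=82, seed=0):
    """Draw k distinct composition IDs (e.g. 'WEXP0001') without replacement.

    The seed is an assumption; the dataset description does not report one.
    """
    rng = random.Random(seed)
    numbers = sorted(rng.sample(range(1, population + 1), k))
    return [f"WEXP{n:04d}" for n in numbers]


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def pairwise_reliability(ratings):
    """Correlate every pair of raters.

    ratings maps a rater name to that rater's scores, listed in the same
    composition order. Pearson's r is assumed here; the source only says
    "inter-rater reliability".
    """
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


def mean_scores(ratings):
    """Average each composition's score across all raters."""
    return [sum(scores) / len(scores) for scores in zip(*ratings.values())]
```

With three raters, `pairwise_reliability` returns three coefficients, one per rater pair, which is the shape of the result the description reports as a range.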
References

Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149-1160. https://doi.org/10.3758/BRM.41.4.1149

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. https://doi.org/10.3758/BF03193146

Wen, Q., Wang, L., & Liang, M. (2008). Spoken and Written English Corpus of Chinese Learners (Version 2.0). Foreign Language Teaching and Research Press.

Created: 2024-01-23
