How L2 Teachers Identify and Respond to GenAI-Generated Essays: A Sensemaking Theory Investigation

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://data.mendeley.com/datasets/x7m73syvc8

Resource description:

This table contains the quantitative questionnaire data from the survey-experiment component of our mixed-methods manuscript on CSL/L2 teachers’ identification of GenAI-generated essays. In the questionnaire phase, 238 in-service teachers with experience in Chinese writing instruction completed an online task in which they judged the source of 12 essays one by one (four student-written and eight GenAI-generated). The student essays were sampled from the HSK Dynamic Composition Corpus (Advanced level) and selected to cover score bands 2–5 on the same writing topic; the GenAI essays were generated by two mainstream models (ChatGPT-5.1 and DeepSeekV3.2) using a standardized prompt template. For each essay, teachers provided a binary label (GenAI-generated vs. student-written) and briefly explained the basis for their decision.

Hypothesis: because authorship identification is an equivocal, high-uncertainty task (especially in CSL writing), teachers’ classifications will be cue-driven and only modestly accurate overall, and accuracy will be shaped more by teachers’ GenAI use proficiency and by text/model characteristics than by teaching seniority.

The results align with this hypothesis: overall identification accuracy is close to chance (47.72%), and teaching experience does not significantly improve performance. However, teachers with high GenAI proficiency achieve significantly higher accuracy than other groups, and accuracy differs across score bands. At the model level, teachers identify DeepSeek-generated essays more accurately than GPT-generated essays, suggesting that models can differ in “indistinguishability.”

Notable findings and interpretation: teachers most frequently justify their decisions with learner-typical error cues and with “GenAI-like” quality/style cues (e.g., overly neat structure, overly clear logic, overly fluent style), and they also cite personal experience details as authenticity signals.
Analytically, the table can be used to compute accuracy and error rates (overall, by model, and by band), to test teacher background variables as predictors of accuracy, and to code the rationales so that cue usage can be quantified and cue profiles related to correct and incorrect judgments. These uses support transparent reuse and replication.
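
As a rough illustration of the accuracy computations mentioned above, the following Python sketch works on a long-format table with one row per teacher-essay judgment. The field names (`teacher_id`, `source`, `model`, `band`, `judged`), the model labels, and the toy rows are all hypothetical placeholders, not the dataset's actual schema or values; the sketch also assumes that score bands are recorded only for student essays. Map the names to the real columns before reuse.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (teacher, essay) judgment.
# All field names and values are made-up placeholders for illustration.
TOY_ROWS = [
    {"teacher_id": "T1", "source": "student", "model": None, "band": 3, "judged": "student"},
    {"teacher_id": "T1", "source": "genai", "model": "model_a", "band": None, "judged": "student"},
    {"teacher_id": "T1", "source": "genai", "model": "model_b", "band": None, "judged": "genai"},
    {"teacher_id": "T2", "source": "student", "model": None, "band": 5, "judged": "genai"},
    {"teacher_id": "T2", "source": "genai", "model": "model_a", "band": None, "judged": "genai"},
    {"teacher_id": "T2", "source": "genai", "model": "model_b", "band": None, "judged": "genai"},
]


def accuracy(rows):
    """Share of judgments whose label matches the true source; None if no rows."""
    rows = list(rows)
    if not rows:
        return None
    correct = sum(1 for r in rows if r["judged"] == r["source"])
    return correct / len(rows)


def accuracy_by(rows, key):
    """Accuracy per value of `key`, skipping rows where `key` is missing (None)."""
    groups = defaultdict(list)
    for r in rows:
        if r[key] is not None:
            groups[r[key]].append(r)
    return {value: accuracy(group) for value, group in groups.items()}


overall = accuracy(TOY_ROWS)
by_model = accuracy_by(TOY_ROWS, "model")  # GenAI essays only in this toy data
by_band = accuracy_by(TOY_ROWS, "band")    # student essays only in this toy data
print(overall, by_model, by_band)
```

The error rate for any group is one minus its accuracy. Grouping by `teacher_id` in the same way gives per-teacher accuracy, which could then be related to background variables such as GenAI proficiency or teaching experience.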

Created:

2026-02-02
