How L2 Teachers Identify and Respond to GenAI-Generated Essays: A Sensemaking Theory Investigation

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/x7m73syvc8
Resource description:
This table contains the quantitative questionnaire data from the survey-experiment component of our mixed-methods manuscript on CSL/L2 teachers’ identification of GenAI-generated essays. In the questionnaire phase, 238 in-service teachers with experience in Chinese writing instruction completed an online task in which they judged the source of 12 essays one by one (four student-written and eight GenAI-generated). The student essays were sampled from the HSK Dynamic Composition Corpus (Advanced level) and selected to cover score bands 2–5 on the same writing topic; the GenAI essays were generated by two mainstream models (ChatGPT-5.1 and DeepSeekV3.2) using a standardized prompt template. For each essay, teachers provided a binary label (GenAI-generated vs. student-written) and briefly explained the basis for their decision.

Hypothesis: because authorship identification is an equivocal, high-uncertainty task (especially in CSL writing), teachers’ classifications will be cue-driven and only modestly accurate overall, and accuracy will be shaped more by teachers’ proficiency in using GenAI and by text/model characteristics than by teaching seniority.

Results: the findings align with this hypothesis. Overall identification accuracy is close to chance (47.72%); teaching experience does not significantly improve performance; teachers with high GenAI proficiency achieve significantly higher accuracy than other groups; and accuracy differs across score bands. At the model level, teachers identify DeepSeek-generated essays more accurately than GPT-generated essays, suggesting that models can differ in “indistinguishability.”

Notable findings and interpretation: teachers most frequently justify their decisions with learner-typical error cues and with “GenAI-like” quality/style cues (e.g., overly neat structure, overly clear logic, overly fluent style), as well as with the presence of personal-experience details, which they treat as authenticity signals.
Analytically, the table can be used to compute accuracy and error rates (overall, by model, and by band), test predictors using teacher background variables, and code rationales to quantify cue usage and relate cue profiles to correct/incorrect judgments, which supports transparent reuse and replication.
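The accuracy computations mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`true_source`, `judged_source`, `model`) and the long one-row-per-judgment layout are hypothetical and may not match the actual columns of the table, and the rows below are made-up toy values, not data from the study.

```python
from collections import defaultdict

# Toy rows for illustration only (NOT data from the study).
# Field names are assumptions; adapt them to the real table's columns.
rows = [
    {"teacher": "T1", "true_source": "genai", "model": "ChatGPT", "judged_source": "student"},
    {"teacher": "T1", "true_source": "genai", "model": "DeepSeek", "judged_source": "genai"},
    {"teacher": "T1", "true_source": "student", "model": "none", "judged_source": "student"},
    {"teacher": "T2", "true_source": "genai", "model": "ChatGPT", "judged_source": "genai"},
    {"teacher": "T2", "true_source": "genai", "model": "DeepSeek", "judged_source": "genai"},
    {"teacher": "T2", "true_source": "student", "model": "none", "judged_source": "genai"},
]


def accuracy(judgments):
    """Return the share of judgments whose label matches the true source."""
    if not judgments:
        raise ValueError("no judgments to score")
    correct = sum(j["judged_source"] == j["true_source"] for j in judgments)
    return correct / len(judgments)


def accuracy_by(judgments, key):
    """Group judgments by a field (e.g. 'model') and score each group."""
    groups = defaultdict(list)
    for j in judgments:
        groups[j[key]].append(j)
    return {group: accuracy(members) for group, members in groups.items()}


overall = accuracy(rows)
per_model = accuracy_by(rows, "model")
print(f"overall accuracy: {overall:.2%}")
for model, acc in sorted(per_model.items()):
    print(f"{model}: {acc:.2%}")
```

The same `accuracy_by` helper could be reused with a score-band or teacher-background field as the grouping key, assuming such columns exist in the table; error rates are simply one minus the corresponding accuracy.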
Created:
2026-02-02