zhongshiting/Chinese-Student-English-Essay
Source: Hugging Face | Updated: 2026-04-18 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/zhongshiting/Chinese-Student-English-Essay
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
pretty_name: CSEE
size_categories:
- 10K<n<100K
---
# Dataset Card for Chinese Student English Essay (CSEE) Dataset
## Dataset Summary
The Chinese Student English Essay (CSEE) dataset is designed for Automated Essay Scoring (AES). It consists of 13,270 English essays written by high school students in Beijing who are learning English as a second language (ESL). The essays were collected from two final exams and respond to two writing prompts.
Experienced teachers scored each essay on three dimensions:
- **Content**
- **Language**
- **Structure**
We also provide **detailed scoring rubrics**, which follow the official Chinese National College Entrance Examination (Gaokao) English Writing Scoring Standards.
To protect privacy, all personally identifiable information, including student and school names, has been removed. The dataset is strictly for academic research and cannot be used for commercial purposes.
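Because the essays respond to only two prompts, AES experiments on CSEE can be run per prompt or across prompts. The sketch below shows one way to build a cross-prompt split by holding out a single `prompt_id`. This protocol is an illustrative assumption, not one prescribed by this card or the associated paper; the helper names are hypothetical, and each record is assumed to be a dict keyed by the card's field names.

```python
from collections import Counter
from typing import Iterable


def essays_per_prompt(records: Iterable[dict]) -> Counter:
    """Count essays for each prompt_id."""
    return Counter(rec["prompt_id"] for rec in records)


def cross_prompt_split(records: Iterable[dict], held_out_prompt_id: int):
    """Hold out every essay written for one prompt as the test set.

    All essays for the remaining prompt(s) go to the training set, so the
    model is evaluated on a prompt it never saw during training.
    """
    train, test = [], []
    for rec in records:
        if rec["prompt_id"] == held_out_prompt_id:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Toy records (hypothetical, not taken from the dataset).
records = [
    {"essay_id": 1, "prompt_id": 1},
    {"essay_id": 2, "prompt_id": 2},
    {"essay_id": 3, "prompt_id": 1},
]
print(essays_per_prompt(records))  # Counter({1: 2, 2: 1})
train, test = cross_prompt_split(records, held_out_prompt_id=2)
```

Splitting by prompt rather than sampling essays at random keeps all essays for a given prompt on the same side of the split.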
## Supported Tasks and Leaderboards
- **Task**: Automated Essay Scoring (**AES**)
- **Potential Use Cases**: Writing assessment models, Text mining, AI-assisted education
- **Similar Datasets**: ASAP, ASAP++
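This card does not name an evaluation metric. Quadratic weighted kappa (QWK), the official metric of the ASAP competition listed above, is a common choice for AES. Because CSEE scores move in 0.5-point steps, one approach is to double each score to get integer labels (0 to 40 for `overall_score`) before computing QWK. The following is a minimal sketch under that assumption; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def to_half_point_label(score: float) -> int:
    """Map a score on the 0.5-point grid to an integer label, e.g. 14.5 -> 29."""
    doubled = score * 2
    label = round(doubled)
    if abs(doubled - label) > 1e-9:
        raise ValueError(f"{score} is not a multiple of 0.5")
    return label


def quadratic_weighted_kappa(
    y_true: Sequence[float], y_pred: Sequence[float], max_score: float = 20.0
) -> float:
    """QWK between two equal-length lists of half-point scores in [0, max_score]."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n_labels = to_half_point_label(max_score) + 1  # 41 labels for 0.0..20.0
    a = [to_half_point_label(s) for s in y_true]
    b = [to_half_point_label(s) for s in y_pred]
    if any(not 0 <= x < n_labels for x in a + b):
        raise ValueError(f"scores must lie in [0, {max_score}]")

    n = len(a)
    observed = Counter(zip(a, b))  # count of each (true label, predicted label) pair
    hist_true, hist_pred = Counter(a), Counter(b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_labels):
        for j in range(n_labels):
            weight = (i - j) ** 2 / (n_labels - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[(i, j)]
            # Expected count if the two raters were independent; sums to n like observed.
            weighted_expected += weight * hist_true[i] * hist_pred[j] / n
    # Degenerate case: both raters always give the same single label.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

For a single dimension, pass `max_score=8.0` (content or language) or `max_score=4.0` (structure). Once scores are converted to integer labels, scikit-learn's `cohen_kappa_score` with `weights="quadratic"` and `labels=list(range(41))` should give the same value for overall scores, apart from the degenerate single-label case.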
## Languages
- **Source Language**: English
- **Target Population**: Chinese high school students learning English as a second language (ESL)
## Dataset Structure
### Data Instances
A typical data instance is structured as follows:
```json
{
  "essay_id": 12345,
  "prompt_id": 1,
  "prompt": "Suppose you are Li Hua,",
  "essay": "Dear Jim, I'm glad to write to you.",
  "overall_score": 14.5,
  "content_score": 6.0,
  "language_score": 5.5,
  "structure_score": 3.0
}
```
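The instance above maps directly onto a typed record. A minimal Python sketch follows, assuming each instance is available as a standalone JSON object (the card does not state the on-disk file format); the class name `EssayRecord` is hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EssayRecord:
    """Typed view of one CSEE instance; field names follow the card's Data Fields."""
    essay_id: int
    prompt_id: int
    prompt: str
    essay: str
    overall_score: float
    content_score: float
    language_score: float
    structure_score: float

    @classmethod
    def from_json(cls, text: str) -> "EssayRecord":
        """Parse one JSON object (e.g. one line of a JSON Lines file)."""
        raw = json.loads(text)
        return cls(
            essay_id=int(raw["essay_id"]),
            prompt_id=int(raw["prompt_id"]),
            prompt=raw["prompt"],
            essay=raw["essay"],
            # Cast scores so integer-looking values such as 6 still become floats.
            overall_score=float(raw["overall_score"]),
            content_score=float(raw["content_score"]),
            language_score=float(raw["language_score"]),
            structure_score=float(raw["structure_score"]),
        )


sample = (
    '{"essay_id": 12345, "prompt_id": 1, "prompt": "Suppose you are Li Hua,", '
    '"essay": "Dear Jim, I\'m glad to write to you.", "overall_score": 14.5, '
    '"content_score": 6.0, "language_score": 5.5, "structure_score": 3.0}'
)
record = EssayRecord.from_json(sample)
print(record.essay_id, record.overall_score)  # 12345 14.5
```

Freezing the dataclass keeps teacher-assigned scores from being modified accidentally after loading.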
### Data Fields
- essay_id (int): Unique identifier for each essay
- prompt_id (int): Unique identifier for each prompt
- prompt (string): The essay prompt given in the exam
- essay (string): The full essay written by the student
- overall_score (float): Sum of the three dimension scores, i.e. content_score + language_score + structure_score (ranging from 0.0 to 20.0, with 0.5-point increments)
- content_score (float): Teacher-assigned score for content quality (ranging from 0.0 to 8.0, with 0.5-point increments)
- language_score (float): Teacher-assigned score for language proficiency (ranging from 0.0 to 8.0, with 0.5-point increments)
- structure_score (float): Teacher-assigned score for organization and coherence (ranging from 0.0 to 4.0, with 0.5-point increments)
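The field list fixes each score's range, the 0.5-point grid, and the identity overall_score = content_score + language_score + structure_score, so these constraints can be checked mechanically to catch data-entry or conversion errors. A minimal Python sketch, assuming each record is a dict keyed by the field names above; `validate_scores` is a hypothetical helper.

```python
# Inclusive score ranges taken from the Data Fields list.
SCORE_RANGES = {
    "content_score": (0.0, 8.0),
    "language_score": (0.0, 8.0),
    "structure_score": (0.0, 4.0),
    "overall_score": (0.0, 20.0),
}
TOLERANCE = 1e-9  # absorbs floating-point error in sums of half-point values


def is_half_point(value: float) -> bool:
    """True if value is a multiple of 0.5."""
    return abs(value * 2 - round(value * 2)) < TOLERANCE


def validate_scores(record: dict) -> list[str]:
    """Return human-readable problems with one record's scores (empty if valid)."""
    problems = []
    for field, (low, high) in SCORE_RANGES.items():
        value = record[field]
        if not low <= value <= high:
            problems.append(f"{field}={value} outside [{low}, {high}]")
        if not is_half_point(value):
            problems.append(f"{field}={value} is not a multiple of 0.5")
    # overall_score should equal the sum of the three dimension scores.
    total = record["content_score"] + record["language_score"] + record["structure_score"]
    if abs(total - record["overall_score"]) > TOLERANCE:
        problems.append(f"overall_score={record['overall_score']} != dimension sum {total}")
    return problems


print(validate_scores({"overall_score": 14.5, "content_score": 6.0,
                       "language_score": 5.5, "structure_score": 3.0}))  # []
```

Returning a list of messages instead of raising on the first failure makes it easy to report every inconsistency in a record at once.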
## Data Creation
### Source Data
- **Collection Process**: The dataset was collected from two final exams administered in high schools in Beijing.
- **Annotation**: The essays were scored by experienced teachers using the official Gaokao rubrics.
- **Privacy Measures**: Personal information, such as student and school names, has been removed to protect privacy.
### License
- **License Type**: CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0)
- **Usage Restrictions**: The dataset is strictly for academic research and cannot be used for commercial purposes.
## Ethical Considerations and Limitations
- **Biases**: The dataset contains only essays written by Chinese high school students and may not generalize to other learner populations.
- **Transcription Errors**: Students handwrote their essays on paper during the exams. We converted them to digital text using OCR followed by manual verification. However, because of messy handwriting and OCR model limitations, some transcription errors may remain.
- **Privacy**: All personally identifiable information has been removed to ensure anonymity.
- **Fairness**: Scoring criteria and teacher assessments may contain inherent subjectivity.
## Citation
Changrong Xiao, Wenxing Ma, Qingping Song, Sean Xin Xu, Kunpeng Zhang, Yufang Wang, and Qi Fu. 2025. Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs. In Proceedings of the 15th International Learning Analytics and Knowledge Conference (LAK '25). Association for Computing Machinery, New York, NY, USA, 293–305. https://doi.org/10.1145/3706468.3706507
Related Paper: [Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs](https://dl.acm.org/doi/10.1145/3706468.3706507)
GitHub Repository: [LLM-AES](https://github.com/Xiaochr/LLM-AES)
BibTeX:
```bibtex
@inproceedings{10.1145/3706468.3706507,
author = {Xiao, Changrong and Ma, Wenxing and Song, Qingping and Xu, Sean Xin and Zhang, Kunpeng and Wang, Yufang and Fu, Qi},
title = {Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs},
year = {2025},
isbn = {9798400707018},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3706468.3706507},
doi = {10.1145/3706468.3706507},
abstract = {Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable consistency, generalizability, and explainability. We propose an open-source LLM-based AES system, inspired by the dual-process theory. Our system offers accurate grading and high-quality feedback, at least comparable to that of fine-tuned proprietary LLMs, in addition to its ability to alleviate misgrading. Furthermore, we conduct human-AI co-grading experiments with both novice and expert graders. We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders, particularly for essays where the model has lower confidence. These results highlight the potential of LLMs to facilitate effective human-AI collaboration in the educational context, potentially transforming learning experiences through AI-generated feedback.},
booktitle = {Proceedings of the 15th International Learning Analytics and Knowledge Conference},
pages = {293--305},
numpages = {13},
keywords = {LLM Application, Automatic Essay Scoring, AI-assisted Learning},
series = {LAK '25}
}
```
Provided by:
zhongshiting



