CSQ: A Chinese Elementary Science Question Dataset with Rich Discipline Properties in Adaptive Problem-Solving Process Generation
Source: DataCite Commons. Updated: 2025-07-07. Indexed: 2025-05-07.
Download link:
https://figshare.com/articles/dataset/CSQ_A_Chinese_Elementary_Science_Question_Dataset_with_Rich_Discipline_Properties_in_Adaptive_Problem-Solving_Process_Generation/28667489
Description:
Although large language models (LLMs) show significant potential for advancing personalized science education, they struggle to generate science problem-solving processes adapted to students' grade levels. In this paper, we developed the world's largest Chinese Science Question (CSQ) dataset, which comprises both a benchmark and a training set and aims to evaluate and enhance the science problem-solving capabilities of LLMs. CSQ consists of 12,000 high-quality samples featuring a variety of question types and diverse discipline properties, covering four subjects and multiple topics at the Chinese primary-school level. We further designed the language model to reflect these discipline properties in its generated responses, emulating students' thought processes when solving science questions. We demonstrated that CSQ and its extensive annotations can be used to fine-tune models; both automatic and human evaluations confirmed this, particularly for generating problem-solving processes aligned with students' grade levels.

@article{DongLli2025CSQ,
title={CSQ: A Chinese Elementary Science Question Dataset with Rich Discipline Properties in Adaptive Problem-Solving Process Generation},
author={Liu, Zhi and Li, Dong and Long, Tatao and Wen, Chaodong and Peng, Xian and Guo, Jiaxin},
journal={Scientific Data},
year={2025},
url={}
}
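The description says CSQ's annotations (grade level, subject, question type, discipline properties, problem-solving process) can be used to fine-tune models toward grade-adapted solutions. This page does not document the dataset's file format, so the sketch below is only an illustration under stated assumptions: the `CSQRecord` layout, every field name, and the `to_sft_pair` helper are hypothetical, not the authors' schema or method. It shows one plausible way to turn a single annotated record into a prompt/response pair for supervised fine-tuning.

```python
from dataclasses import dataclass, field


@dataclass
class CSQRecord:
    """Hypothetical record layout; the real CSQ schema may differ."""
    question: str
    subject: str        # one of the four primary-school subjects (label assumed)
    grade: int          # primary-school grade level (range assumed)
    question_type: str  # hypothetical label, e.g. "short_answer"
    discipline_properties: list[str] = field(default_factory=list)
    solution_steps: list[str] = field(default_factory=list)


def to_sft_pair(rec: CSQRecord) -> dict[str, str]:
    """Build a prompt/response pair from one annotated record (hypothetical helper).

    The prompt states the grade level and discipline properties, so a model
    fine-tuned on such pairs is conditioned on the student's grade when it
    writes the step-by-step solution.
    """
    props = ", ".join(rec.discipline_properties) or "none"
    prompt = (
        f"Grade {rec.grade} {rec.subject} question ({rec.question_type}).\n"
        f"Discipline properties: {props}\n"
        f"Question: {rec.question}\n"
        "Explain the solution step by step at this grade level."
    )
    # Number each annotated step so the target response mirrors a solving process.
    response = "\n".join(
        f"Step {i}: {step}" for i, step in enumerate(rec.solution_steps, start=1)
    )
    return {"prompt": prompt, "response": response}


if __name__ == "__main__":
    example = CSQRecord(
        question="Why does ice melt faster in the sun than in the shade?",
        subject="science",
        grade=3,
        question_type="short_answer",
        discipline_properties=["heat transfer"],
        solution_steps=["The sun warms the ice.", "Warmer ice melts faster."],
    )
    pair = to_sft_pair(example)
    print(pair["prompt"])
    print(pair["response"])
```

Putting the grade level in the prompt, rather than only in metadata, is one simple way to let the same model produce different solution depths for different grades; the paper's actual conditioning approach may differ.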
Provider:
figshare
Created:
2025-03-26



