A Multi-level Gradient Rewriting Dataset of Chinese Academic Paper Abstracts for AIGC Detection (MGRD)

Source: DataCite Commons. Updated: 2026-04-30. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=b6ceb6d54d0e4ff98f024c6433bb4424
Description:
MGRD (A Multi-level Gradient Rewriting Dataset of Chinese Academic Paper Abstracts for AIGC Detection) was constructed to support research on AIGC detection, text source identification and academic integrity in Chinese academic writing. Its human-written texts are abstracts of Chinese core journal articles indexed by CNKI and Wanfang Data between 2010 and 2022, covering three disciplinary areas: computer technology, architectural theory and Chinese drama. Starting from the original abstracts, five large language models (glm-4.5-air, glm-4.6v, qwen3-14b, deepseek-R1 and gpt-4o-mini) were used to generate AIGC samples at three levels: light polishing, moderate rewriting and heavy rewriting. The heavy rewriting samples, however, were generated independently from the paper titles and keywords alone. After rule-based filtering, hierarchical constraints, removal of anomalous samples, semantic consistency verification, perplexity analysis and blind expert sampling, four data files were produced: light_paired.csv, medium_paired.csv, heavy_paired.csv and all_paired.csv. Together they comprise 6,011 light pairs, 5,943 medium pairs and 6,343 heavy pairs, for a total of 18,297 pairs and 36,594 text entries. The dataset retains two core fields, `text` and `label`, and provides auxiliary fields such as paper title, keywords, rewrite_level and change ratio, supporting mixed-scenario model training, cross-dataset generalisation evaluation and text source analysis. Evaluation results indicate that MGRD can serve as a foundational data resource for research into AIGC detection in Chinese academic paper abstracts.
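As a rough illustration of how the files described above might be inspected, the following Python sketch counts text entries per rewrite level and label, and recomputes the reported totals. It rests on several assumptions not confirmed by the description: that each CSV row holds one text entry, that the columns are literally named `label` and `rewrite_level`, and that the files are UTF-8 encoded. The helper names (`read_rows`, `count_by_level_and_label`, `expected_totals`) are hypothetical.

```python
import csv
from collections import Counter

# Pair counts as reported in the dataset description.
REPORTED_PAIRS = {"light": 6011, "medium": 5943, "heavy": 6343}


def read_rows(path):
    """Read one CSV file into a list of dicts.

    Assumption: the file is UTF-8 (with or without a BOM) and has a header row.
    """
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def count_by_level_and_label(rows):
    """Count rows per (rewrite_level, label) combination.

    Assumption: each row is one text entry with `rewrite_level` and `label` columns.
    """
    return Counter((row["rewrite_level"], row["label"]) for row in rows)


def expected_totals(pairs=REPORTED_PAIRS):
    """Return (total pairs, total text entries), with two texts per pair."""
    total_pairs = sum(pairs.values())
    return total_pairs, 2 * total_pairs
```

If the assumed layout holds, `count_by_level_and_label(read_rows("all_paired.csv"))` would give a per-level breakdown to compare against the reported figures, while `expected_totals()` reproduces the 18,297 pairs and 36,594 text entries stated in the description. If the files instead store one pair per row, the counting step would need to be adapted.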
Provider:
Science Data Bank
Created:
2026-04-14