CPM-CNews
Source: arXiv. Indexed: 2025-09-30
Download link:
https://huggingface.co/TsinghuaAI/CPM-Generate
Description:
CPM-CNews is a dataset for detecting machine-generated text in the wild. Its machine-generated samples come from the CPM THUCNews model, a CPM model fine-tuned for text generation on 80,000 samples from the THUCNews corpus. Each generated sample is produced by prompting that model with the first 20 tokens of a THUCNews sample; the original, non-generated THUCNews samples serve as the human-written texts.
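The construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `build_detection_dataset`, `toy_generate`, and the 0/1 label scheme are hypothetical names and choices; the prompt is cut at 20 characters as a stand-in for the model's real 20-token prefix; and whether the released generated samples include the prompt is an assumption here.

```python
from typing import Callable, Dict, List


def build_detection_dataset(
    human_texts: List[str],
    generate_fn: Callable[[str], str],
    prompt_len: int = 20,
) -> List[Dict[str, object]]:
    """Pair each human-written text with a machine-generated counterpart.

    Assumption: label 0 = human-written, label 1 = machine-generated.
    Assumption: the prefix is taken by characters; the real dataset uses
    the first 20 tokens under the model's own tokenizer.
    """
    records: List[Dict[str, object]] = []
    for text in human_texts:
        # The original sample is kept unchanged as the human-written example.
        records.append({"text": text, "label": 0})
        # The generator is prompted with only the beginning of the sample.
        prompt = text[:prompt_len]
        records.append({"text": generate_fn(prompt), "label": 1})
    return records


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for the fine-tuned CPM THUCNews model."""
    return prompt + "[generated continuation]"


if __name__ == "__main__":
    samples = ["a" * 30, "short"]
    for record in build_detection_dataset(samples, toy_generate):
        print(record["label"], record["text"])
```

In practice `generate_fn` would wrap the fine-tuned model, and the resulting records would feed a binary classifier that separates the two labels.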
Provider:
CPM



