RoFT
arXiv, indexed 2025-09-30
Download link:
https://github.com/liamdugan/human-detection
Description:
This dataset comprises over 21,000 human annotations paired with error classifications, collected to study how well humans can detect the transition from human-written to machine-generated text. It covers several genres: news articles, presidential speeches, fictional stories from Reddit, and recipes from Recipe1M+. The task is to identify the boundary between the human-written and machine-generated portions of a text, and the dataset is intended to support future research on human detection and evaluation of generated text.
Provided by:
University of Pennsylvania



