RoFT

Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/liamdugan/human-detection
Description:
RoFT contains over 21,000 human annotations, each paired with an error classification, collected to study how well people can detect the boundary at which a text transitions from human-written to machine-generated. It spans several genres: news articles, presidential speeches, fictional stories from Reddit, and recipes from Recipe1M+. The dataset is intended to support future research on human detection and evaluation of generated text.
Provided by:
University of Pennsylvania