HANSEN

arXiv | Updated 2023-10-26 | Indexed 2024-06-21
Download link:
https://huggingface.co/datasets/HANSEN-REPO/HANSEN
Description:
HANSEN is a large-scale benchmark for authorship analysis of spoken text, created at Pennsylvania State University. It comprises 17 human datasets together with AI-generated spoken text datasets, combining carefully curated existing speech datasets with new AI-generated spoken text datasets created using large language models such as ChatGPT, PaLM2 and Vicuna13B. HANSEN is used to evaluate existing authorship attribution and verification methods on spoken text, with particular attention to distinguishing human from AI-generated text. The datasets cover multiple scenarios, including speeches, dialogues and interviews, and target applications such as copyright disputes and author profiling.
Provided by:
Pennsylvania State University, USA
Created:
2023-10-26