
AH&AITD – Arslan’s Human and AI Text Database

Source: DataCite Commons | Updated: 2025-05-24 | Indexed: 2025-09-08
Download link:
https://figshare.com/articles/dataset/AH_AITD_Arslan_s_Human_and_AI_Text_Database/29144348
Description:
AH&AITD is a benchmark dataset for evaluating AI-generated text detection tools. It contains 11,580 samples of human-written and AI-generated content across multiple domains. It was developed to address limitations of previous datasets, particularly in diversity, scale, and real-world applicability. Its purpose is to facilitate research on detecting AI-generated text by providing a diverse, multi-domain dataset that enables fair benchmarking of detection tools across writing styles and content categories.

Composition

1. Human-Written Samples (Total: 5,790)
Collected from:
- Open Web Text (2,343 samples)
- Blogs (196 samples)
- Web Text (397 samples)
- Q&A Platforms (670 samples)
- News Articles (430 samples)
- Opinion Statements (1,549 samples)
- Scientific Research Abstracts (205 samples)

2. AI-Generated Samples (Total: 5,790)
Generated using:
- ChatGPT (1,130 samples)
- GPT-4 (744 samples)
- Paraphrase Models (1,694 samples)
- GPT-2 (328 samples)
- GPT-3 (296 samples)
- DaVinci (GPT-3.5 variant) (433 samples)
- GPT-3.5 (364 samples)
- OPT-IML (406 samples)
- Flan-T5 (395 samples)

Citation:
Akram, A. (2023). AH&AITD: Arslan’s Human and AI Text Database. [Dataset]. Associated with the article: An Empirical Study of AI-Generated Text Detection Tools. Advances in Machine Learning & Artificial Intelligence, 4(2), 44–55.
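The per-source counts given in the description can be checked against the stated totals with a few lines of Python. This is a minimal sketch that only tallies the numbers quoted in the description; it does not load the dataset files, whose format is not described here. The names `HUMAN_SOURCES`, `AI_SOURCES`, and `summarize` are hypothetical and chosen for illustration.

```python
# Per-source sample counts, copied from the dataset description.
# The dictionary names are illustrative, not part of the dataset itself.
HUMAN_SOURCES = {
    "Open Web Text": 2343,
    "Blogs": 196,
    "Web Text": 397,
    "Q&A Platforms": 670,
    "News Articles": 430,
    "Opinion Statements": 1549,
    "Scientific Research Abstracts": 205,
}

AI_SOURCES = {
    "ChatGPT": 1130,
    "GPT-4": 744,
    "Paraphrase Models": 1694,
    "GPT-2": 328,
    "GPT-3": 296,
    "DaVinci (GPT-3.5 variant)": 433,
    "GPT-3.5": 364,
    "OPT-IML": 406,
    "Flan-T5": 395,
}


def summarize(counts):
    """Return the total count and each source's share of that total."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


human_total, human_shares = summarize(HUMAN_SOURCES)
ai_total, ai_shares = summarize(AI_SOURCES)

if __name__ == "__main__":
    print(f"Human-written: {human_total}")
    print(f"AI-generated:  {ai_total}")
    print(f"Overall:       {human_total + ai_total}")
    # List sources from largest to smallest share within each class.
    for label, shares in (("human", human_shares), ("AI", ai_shares)):
        for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
            print(f"  [{label}] {name}: {share:.1%}")
```

The two classes are balanced at the totals the description states, but the sources within each class are not evenly represented, which may matter when reporting per-source detection accuracy.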
Provider:
figshare
Created:
2025-05-24