IntelLabs/AI-Peer-Review-Detection-Benchmark
Source: Hugging Face. Updated: 2025-05-27. Indexed: 2025-05-31.
Download link:
https://hf-mirror.com/datasets/IntelLabs/AI-Peer-Review-Detection-Benchmark
Resource description:
The AI Peer Review Detection Benchmark is the largest dataset to date of paired human-written and AI-written peer reviews of the same research papers. It contains 788,984 reviews covering 8 years of submissions to two leading AI research conferences, ICLR and NeurIPS. Each AI-generated review was produced by one of five widely used large language models (LLMs): GPT-4o, Claude Sonnet 3.5, Gemini 1.5 Pro, Qwen 2.5 72B, or Llama 3.1 70B. Each is paired with the corresponding human-written reviews of the same paper. The dataset is divided into multiple subsets (calibration, test, extended) to support systematic evaluation of AI-generated text detection methods.
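One way the calibration and test subsets might be used together is sketched below. This is an assumption, not a protocol stated on this page: a detector's decision threshold is chosen on human-written calibration reviews so that the false positive rate stays under a target, and the detector is then scored on the test reviews. The function names and all score values are hypothetical, and the detector itself is left out; only its per-review scores (higher meaning "more likely AI-written") are needed.

```python
from typing import Sequence


def threshold_at_fpr(human_scores: Sequence[float], target_fpr: float) -> float:
    """Pick a threshold so that at most `target_fpr` of the given
    human-written reviews score strictly above it (i.e. get flagged)."""
    if not human_scores:
        raise ValueError("need at least one human score to calibrate")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be between 0 and 1")
    ranked = sorted(human_scores, reverse=True)
    # Number of human reviews we are allowed to flag by mistake.
    allowed = int(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # everything may be flagged
    # Only scores strictly above ranked[allowed] are flagged,
    # which is at most `allowed` of the calibration reviews.
    return ranked[allowed]


def rate_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(s > threshold for s in scores) / len(scores)


# Toy detector scores, invented for illustration only.
calibration_human = [0.1, 0.2, 0.3, 0.4, 0.9]
test_human = [0.15, 0.35, 0.5, 0.05]
test_ai = [0.8, 0.45, 0.3, 0.95]

threshold = threshold_at_fpr(calibration_human, target_fpr=0.2)
print("threshold:", threshold)
print("test false positive rate:", rate_above(test_human, threshold))
print("test true positive rate:", rate_above(test_ai, threshold))
```

Fixing the threshold on a separate split keeps the test reviews untouched until the final measurement, so the reported rates are not tuned on the data they describe.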
Provider:
IntelLabs



