PAN'25 Generative AI Detection (Task 2): Human-AI Collaborative Text Classification
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14966120
Resource description:
Dataset for the Generative AI Detection Task (Subtask 2) @ PAN 2025.
As large language models (LLMs) like GPT-4o, Claude 3.5, and Gemini 1.5-pro become increasingly accessible, machine-generated content is proliferating across diverse domains, including news, social media, education, and academia. These models produce highly fluent and coherent text, making them valuable for automating various writing tasks. However, their widespread use also raises concerns about misinformation, academic integrity, and content authenticity. Identifying the degree of human and machine involvement in text creation is crucial for addressing these challenges.
In this shared task, we focus on Human-AI Collaborative Text Classification, where the goal is to categorize documents that have been co-authored by humans and LLMs. Specifically, we aim to classify texts into six distinct categories based on the nature of human and machine contributions:
Fully human-written: The document is entirely authored by a human without any AI assistance.
Human-initiated, then machine-continued: A human starts writing, and an AI model completes the text.
Human-written, then machine-polished: The text is initially written by a human but later refined or edited by an AI model.
Machine-written, then machine-humanized (obfuscated): An AI generates the text, which is later modified to obscure its machine origin.
Machine-written, then human-edited: The content is generated by an AI but subsequently edited or refined by a human.
Deeply-mixed text: The document contains interwoven sections written by both humans and AI, without a clear separation.
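For anyone building a classifier on this data, the six categories above can be encoded as a fixed label set. The sketch below is illustrative only: the class name `CollabLabel`, its integer IDs, `DISPLAY_NAMES`, and the helper `involves_machine` are hypothetical and do not come from the released data files, which may use a different label encoding.

```python
from enum import IntEnum


class CollabLabel(IntEnum):
    """The six human-AI collaboration categories of PAN'25 Subtask 2.

    Hypothetical encoding: the integer IDs are illustrative and may not
    match the IDs used in the released dataset files.
    """

    FULLY_HUMAN_WRITTEN = 0
    HUMAN_INITIATED_MACHINE_CONTINUED = 1
    HUMAN_WRITTEN_MACHINE_POLISHED = 2
    MACHINE_WRITTEN_MACHINE_HUMANIZED = 3
    MACHINE_WRITTEN_HUMAN_EDITED = 4
    DEEPLY_MIXED = 5


# Display names taken from the category list in the task description.
DISPLAY_NAMES = {
    CollabLabel.FULLY_HUMAN_WRITTEN: "Fully human-written",
    CollabLabel.HUMAN_INITIATED_MACHINE_CONTINUED: "Human-initiated, then machine-continued",
    CollabLabel.HUMAN_WRITTEN_MACHINE_POLISHED: "Human-written, then machine-polished",
    CollabLabel.MACHINE_WRITTEN_MACHINE_HUMANIZED: "Machine-written, then machine-humanized (obfuscated)",
    CollabLabel.MACHINE_WRITTEN_HUMAN_EDITED: "Machine-written, then human-edited",
    CollabLabel.DEEPLY_MIXED: "Deeply-mixed text",
}


def involves_machine(label: CollabLabel) -> bool:
    """Binary view: every category except fully human-written has some
    AI contribution, according to the category definitions."""
    return label is not CollabLabel.FULLY_HUMAN_WRITTEN


if __name__ == "__main__":
    for label in CollabLabel:
        print(f"{label.value}: {DISPLAY_NAMES[label]} (AI involved: {involves_machine(label)})")
```

The binary helper only shows how the fine-grained labels nest inside a plain human-vs-AI view; the task itself is the six-way classification.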
Label Distribution:
Label Category                                Train      Dev
Machine-written, then machine-humanized      91,232   10,137
Human-written, then machine-polished         95,398   12,289
Fully human-written                          75,270   12,330
Human-initiated, then machine-continued      10,740   37,170
Deeply-mixed text (human + machine parts)    14,910      225
Machine-written, then human-edited            1,368      510
Total                                       288,918   72,661
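The distribution is strongly imbalanced. The smallest training class, machine-written then human-edited, has 1,368 documents, under 0.5% of the training split, and the class mix also differs between Train and Dev. The sketch below recomputes the split totals from the table and derives inverse-frequency class weights with the formula n_total / (n_classes × n_class), the same heuristic scikit-learn uses for `class_weight="balanced"`. Reweighting is only one common option and is not prescribed by the task description; `COUNTS`, `split_totals`, `class_shares`, and `balanced_weights` are hypothetical names.

```python
# Per-class (train, dev) counts, copied from the label distribution table.
Counts = dict[str, tuple[int, int]]

COUNTS: Counts = {
    "Machine-written, then machine-humanized": (91_232, 10_137),
    "Human-written, then machine-polished": (95_398, 12_289),
    "Fully human-written": (75_270, 12_330),
    "Human-initiated, then machine-continued": (10_740, 37_170),
    "Deeply-mixed text": (14_910, 225),
    "Machine-written, then human-edited": (1_368, 510),
}

TRAIN, DEV = 0, 1  # index into each (train, dev) tuple


def split_totals(counts: Counts) -> tuple[int, int]:
    """Sum the per-class counts for each split."""
    train = sum(pair[TRAIN] for pair in counts.values())
    dev = sum(pair[DEV] for pair in counts.values())
    return train, dev


def class_shares(counts: Counts, split: int = TRAIN) -> dict[str, float]:
    """Fraction of the chosen split that each class makes up."""
    total = sum(pair[split] for pair in counts.values())
    return {label: pair[split] / total for label, pair in counts.items()}


def balanced_weights(counts: Counts, split: int = TRAIN) -> dict[str, float]:
    """Inverse-frequency weights: n_total / (n_classes * n_class)."""
    total = sum(pair[split] for pair in counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * pair[split]) for label, pair in counts.items()}


if __name__ == "__main__":
    print("Totals (train, dev):", split_totals(COUNTS))
    shares = class_shares(COUNTS)
    for label, weight in balanced_weights(COUNTS).items():
        print(f"{label:<42} share={shares[label]:.2%} weight={weight:.2f}")
```

Passing `split=DEV` gives the corresponding Dev figures; because the two splits have different class mixes, weights computed on Train do not describe the Dev distribution.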
Created:
2025-03-04



