JonathanZha/PADBen-Task1
Source: Hugging Face. Updated 2025-10-12. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/JonathanZha/PADBen-Task1
Description:
PADBen Task 1 is a binary classification dataset for distinguishing human-authored paraphrases from LLM-generated ones. It contains 16,233 sentences, split 80% for training and 20% for testing, and also includes an unlabeled test subset. Each sample consists of a single sentence and a binary label: 0 for human-authored and 1 for machine-generated. The classes are balanced 50-50 through sampling. The README provides usage instructions and evaluation metrics.
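The record format and label convention described above can be illustrated with a minimal sketch. The field names `sentence` and `label`, the example sentences, and the helper `binary_metrics` are all assumptions made for illustration; this listing does not state the actual column names or which metrics the README specifies. Accuracy, precision, recall, and F1 are used here only as common choices for balanced binary classification, with label 1 (machine-generated) treated as the positive class.

```python
from typing import Dict, List

# Hypothetical records in the described format: one sentence plus a binary label.
# The field names and the sentences are invented for illustration only.
SAMPLES = [
    {"sentence": "The committee postponed the vote until next week.", "label": 0},
    {"sentence": "The vote was delayed by the committee to the following week.", "label": 1},
    {"sentence": "She walked home in the rain.", "label": 0},
    {"sentence": "In the rain, she made her way home on foot.", "label": 1},
]


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 with label 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [sample["label"] for sample in SAMPLES]
    # Hypothetical classifier output: one human-authored sentence is misclassified.
    predicted = [0, 1, 1, 1]
    print(binary_metrics(gold, predicted))
```

Because the dataset is described as balanced 50-50, accuracy is a meaningful headline number, while precision and recall show whether errors lean toward false alarms on human text or missed machine text.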
Provider:
JonathanZha



