EndLessTime/SAGE
Hugging Face · updated 2025-11-08 · indexed 2025-11-15
Download link:
https://hf-mirror.com/datasets/EndLessTime/SAGE
Summary:
The SAGE (Summary-Aligned Generation Evaluation) dataset is a benchmark for evaluating AI-generated text detectors, especially their out-of-domain generalization to text produced by frontier large language models (LLMs). It contains human-written and AI-generated texts from multiple open-source domains. Each AI-generated text is meaning-aligned to its human counterpart through a summary-conditioning pipeline, which minimizes content and stylistic bias. The dataset covers Amazon Reviews, IvyPanda Essays, and Medium Articles, each contributing approximately 5,000 documents, for a total of about 45,000 passages (~13.5M tokens).
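A minimal loading sketch, assuming the repository can be read with the standard Hugging Face `datasets` loader (`pip install datasets`). The helper `count_by` is a hypothetical illustration, and this card does not document the split or column names, so inspect the loaded object before relying on any field.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by(rows: Iterable[Mapping[str, object]], key: str) -> Counter:
    """Count rows per value of `key`, e.g. a hypothetical "domain" or "label" column."""
    return Counter(row[key] for row in rows)


def load_sage():
    """Load SAGE from the Hugging Face Hub.

    Assumes `datasets` is installed and the Hub is reachable. To go through the
    mirror listed above, set HF_ENDPOINT=https://hf-mirror.com before starting Python.
    """
    from datasets import load_dataset  # imported lazily so count_by works without it

    # With no `split` argument, load_dataset returns a DatasetDict keyed by split name.
    return load_dataset("EndLessTime/SAGE")
```

Usage: run `ds = load_sage()` and `print(ds)` to see the actual splits and columns, then, for example, `count_by(ds["train"], "domain")`, where both `"train"` and `"domain"` are assumed names to replace with the real ones. Iterating a `datasets.Dataset` yields one dict per row, which is why `count_by` accepts it directly.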
Provided by:
EndLessTime



