Yashodhar29/synthetic-mirrors-human-ai-cnn-dailymail-v1

Name: Yashodhar29/synthetic-mirrors-human-ai-cnn-dailymail-v1
Creator: Yashodhar29
Published: 2025-12-14 09:14:48
License: No description available

Source: Hugging Face. Updated: 2025-12-14. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/Yashodhar29/synthetic-mirrors-human-ai-cnn-dailymail-v1

Description:

Synthetic Mirrors is a multi-model, research-grade dataset for detecting humanized AI-generated text. It aggregates generations from multiple open-source and closed-source LLMs and pairs them with human-written content from the CNN DailyMail domain. The core idea is to treat each AI model as a synthetic mirror that reflects distinct stylistic and probabilistic artifacts, which helps detectors generalize to unseen models. Each row holds one text sample, with columns including id, text, label, source_type, ai_model, model_family, dataset_origin, topic, sampling_params, and language. Intended uses include AI-generated text detection, analysis of humanized AI content, and cross-model generalization studies.
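The cross-model generalization use case can be illustrated with a leave-one-model-out split. The following is a minimal sketch, not the dataset author's method: it uses only the column names listed in the description, while the function name `hold_out_model`, the toy rows, and every value in them (label encoding, `source_type` strings, model names) are hypothetical placeholders, since the listing does not document the actual values.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def hold_out_model(rows: List[Row], held_out: Optional[str]) -> Tuple[List[Row], List[Row]]:
    """Split rows by the `ai_model` column.

    Rows whose `ai_model` equals `held_out` go to the test list; all other
    rows (including human rows, assumed here to have ai_model=None) go to
    the train list.
    """
    train = [r for r in rows if r.get("ai_model") != held_out]
    test = [r for r in rows if r.get("ai_model") == held_out]
    return train, test


# Toy rows: every value below is a hypothetical placeholder, not real data.
toy_rows: List[Row] = [
    {"id": "h1", "text": "placeholder human text", "label": 0,
     "source_type": "human", "ai_model": None},
    {"id": "a1", "text": "placeholder model_a text", "label": 1,
     "source_type": "ai", "ai_model": "model_a"},
    {"id": "b1", "text": "placeholder model_b text", "label": 1,
     "source_type": "ai", "ai_model": "model_b"},
]

# Train on everything except the hypothetical "model_b"; test on "model_b" only.
train_rows, test_rows = hold_out_model(toy_rows, "model_b")
```

A detector evaluated this way never sees the held-out model during training, which is one way to probe generalization to unseen models. A realistic evaluation would also reserve some human rows for the test side; this sketch leaves them all in train for brevity.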

Provider:

Yashodhar29
