
Yashodhar29/synthetic-mirrors-human-ai-cnn-dailymail-v1

Source: Hugging Face. Updated 2025-12-14. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Yashodhar29/synthetic-mirrors-human-ai-cnn-dailymail-v1
Description:
Synthetic Mirrors is a multi-model, research-grade dataset designed for detecting humanized AI-generated text. It aggregates generations from multiple open-source and closed-source LLMs, paired with corresponding human-written content from the CNN DailyMail domain. The core idea is to treat each AI model as a synthetic mirror that reflects distinct stylistic and probabilistic artifacts, which helps detectors generalize to unseen models. Each row corresponds to one text sample, with columns including id, text, label, source_type, ai_model, model_family, dataset_origin, topic, sampling_params, and language. Intended uses include AI-generated text detection, humanized AI content analysis, and cross-model generalization studies, among others.
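The cross-model generalization use case can be illustrated with a leave-one-model-out split: every sample from one AI model is held out for testing, so a detector is scored on a model it never saw in training. The following is a minimal sketch, not the dataset author's method. It reuses the column names id, text, label, and ai_model from the description, but the row values, the label strings, the use of None in ai_model to mark human-written rows, and the helper name leave_one_model_out are all assumptions made for illustration.

```python
def leave_one_model_out(rows, held_out_model, human_test_every=2):
    """Split rows so that one AI model appears only in the test set.

    Assumption: human-written rows have ai_model set to None. The real
    dataset may encode this differently (for example via source_type).
    """
    train, test = [], []
    human_seen = 0
    for row in rows:
        if row["ai_model"] is None:
            # Send every Nth human row to the test set so both splits
            # contain human-written text.
            human_seen += 1
            if human_seen % human_test_every == 0:
                test.append(row)
            else:
                train.append(row)
        elif row["ai_model"] == held_out_model:
            test.append(row)
        else:
            train.append(row)
    return train, test


# Hypothetical rows; the values are invented and do not come from the dataset.
rows = [
    {"id": "h1", "text": "Human article 1", "label": "human", "ai_model": None},
    {"id": "h2", "text": "Human article 2", "label": "human", "ai_model": None},
    {"id": "h3", "text": "Human article 3", "label": "human", "ai_model": None},
    {"id": "h4", "text": "Human article 4", "label": "human", "ai_model": None},
    {"id": "a1", "text": "Generated text 1", "label": "ai", "ai_model": "model-a"},
    {"id": "a2", "text": "Generated text 2", "label": "ai", "ai_model": "model-a"},
    {"id": "b1", "text": "Generated text 3", "label": "ai", "ai_model": "model-b"},
    {"id": "b2", "text": "Generated text 4", "label": "ai", "ai_model": "model-b"},
]

train, test = leave_one_model_out(rows, held_out_model="model-b")
print("train ids:", sorted(r["id"] for r in train))
print("test ids:", sorted(r["id"] for r in test))
```

Repeating the split once per distinct ai_model value (or per model_family) gives one held-out score per model, which is a common way to estimate how a detector behaves on unseen generators.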
Provider:
Yashodhar29