NewsCLIPpings

Source: arXiv. Indexed 2025-09-30.

Download link:

https://github.com/g-luo/news_clippings

Overview:

NewsCLIPpings is described as the largest existing dataset for detecting out-of-context misinformation, that is, genuine images paired with text they did not originally accompany. It is derived from the VisualNews dataset and covers news articles from four major news outlets. Labels are evenly balanced, and mismatched pairs are produced with several retrieval strategies: text-image, text-text, person matching, and scene matching. The dataset contains 71,072 training samples, 7,024 validation samples, and 7,264 test samples.
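As a quick sanity check on the split sizes quoted above, the following minimal sketch computes each split's share of the whole dataset and the expected number of samples per label. It assumes the balanced labels are binary (matched vs. mismatched caption), which this page does not state explicitly; the function name `split_summary` is purely illustrative and not part of any NewsCLIPpings tooling.

```python
# Split sizes as quoted in the dataset description.
SPLIT_SIZES = {"train": 71_072, "validation": 7_024, "test": 7_264}


def split_summary(sizes):
    """Return the total sample count and per-split statistics.

    Assumption: labels are binary and evenly balanced, so each label
    accounts for exactly half of every split.
    """
    total = sum(sizes.values())
    summary = {}
    for name, count in sizes.items():
        summary[name] = {
            "samples": count,
            "share": count / total,    # fraction of the full dataset
            "per_label": count // 2,   # valid only under the binary-label assumption
        }
    return total, summary


total, summary = split_summary(SPLIT_SIZES)
print(f"total samples: {total}")
for name, row in summary.items():
    print(f"{name:<10} {row['samples']:>6}  {row['share']:.1%}  {row['per_label']} per label")
```

Under these assumptions the training split makes up roughly five sixths of the data, with the validation and test splits sharing the remainder about equally.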

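Summary

Background and Challenges

Background Overview

NewsCLIPpings is a large-scale dataset for detecting out-of-context misinformation. It is derived from the VisualNews dataset, draws on four major news outlets, and provides 71,072 training samples. Its labels are balanced, and it offers several retrieval strategies, including text-image and text-text, which makes it suitable for a range of matching tasks.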
