MultiClaim dataset

Source: arXiv (2025-05-28); updated 2025-11-28
Download link:
https://zenodo.org/records/15413169
Description:
The MultiClaim dataset is a multilingual collection of social media posts paired with previously fact-checked claims, designed to support the work of professional fact-checkers. It covers 283 language combinations (post language, claim language): posts are written in 30 languages and fact-checked claims in 46 languages, for 47 unique languages in total. The claims were extracted from fact-checking articles obtained through the Google FactCheck Explorer API and custom crawlers; the posts come from Meta platforms (Facebook and Instagram), where fact-check labels link them to those claims. The dataset is split into training, development, and test sets to keep the data diverse and to prevent data leakage between splits.
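The sketch below illustrates two points from the description: how the language counts relate to each other, and one simple way to keep a split free of leakage. It assumes a hypothetical record layout (`Pair` with `post_lang`, `claim_id`, `claim_lang`) that may not match the released files. Language combinations are counted as distinct (post language, claim language) pairs, and unique languages as the union of both sides. `grouped_split` is an illustrative claim-level grouped split, not necessarily the procedure the authors used. It keeps every pair for a given claim in the same split, so the same claim never appears in both training and test data.

```python
import random
from dataclasses import dataclass


# Hypothetical record layout: the real MultiClaim files may use other field names.
@dataclass(frozen=True)
class Pair:
    post_id: str
    post_lang: str   # language of the social media post
    claim_id: str
    claim_lang: str  # language of the fact-checked claim


def language_stats(pairs):
    """Return (#post languages, #claim languages, #unique languages, #combinations)."""
    post_langs = {p.post_lang for p in pairs}
    claim_langs = {p.claim_lang for p in pairs}
    combos = {(p.post_lang, p.claim_lang) for p in pairs}
    return len(post_langs), len(claim_langs), len(post_langs | claim_langs), len(combos)


def grouped_split(pairs, dev_frac=0.1, test_frac=0.1, seed=0):
    """Illustrative split: all pairs sharing a claim_id land in the same split."""
    claim_ids = sorted({p.claim_id for p in pairs})
    random.Random(seed).shuffle(claim_ids)  # deterministic shuffle for a given seed
    n_test = int(len(claim_ids) * test_frac)
    n_dev = int(len(claim_ids) * dev_frac)
    test_ids = set(claim_ids[:n_test])
    dev_ids = set(claim_ids[n_test:n_test + n_dev])
    splits = {"train": [], "dev": [], "test": []}
    for p in pairs:
        if p.claim_id in test_ids:
            splits["test"].append(p)
        elif p.claim_id in dev_ids:
            splits["dev"].append(p)
        else:
            splits["train"].append(p)
    return splits


# Toy data (invented for illustration only).
toy = [
    Pair("p1", "en", "c1", "en"),
    Pair("p2", "sk", "c1", "en"),
    Pair("p3", "de", "c2", "de"),
    Pair("p4", "en", "c3", "es"),
]
print(language_stats(toy))  # → (3, 3, 4, 4)
```

In the toy data, three post languages and three claim languages overlap into four unique languages. This is why the dataset's 30 post languages and 46 claim languages can add up to only 47 languages overall.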
Provided by:
Fondazione Bruno Kessler, Trento, Italy; Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia
Created:
2025-05-28