MM_Claims Dataset
Source: DataCite Commons. Updated: 2023-07-10. Indexed: 2024-07-13.
Download link:
https://data.uni-hannover.de/dataset/99d876e0-3ab3-4a93-8f8d-101abea40034
Description:
This dataset is introduced by the paper "MM-Claims: A Dataset for Multimodal Claim Detection in Social Media". If you use this dataset in your work, please cite:

@inproceedings{cheema-etal-2022-mm,
    title = "{MM}-Claims: A Dataset for Multimodal Claim Detection in Social Media",
    author = {Cheema, Gullal Singh and Hakimov, Sherzod and Sittar, Abdul and M{\"u}ller-Budack, Eric and Otto, Christian and Ewerth, Ralph},
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.72",
    pages = "962--979"
}

Information about columns in the files:
1. claim_binary: {0: 'Not a claim', 1: 'claim'}
2. claim_three: {0: 'Not a claim', 1: 'claim but not check-worthy', 2: 'check-worthy claim'}
3. claim_vis: {0: 'Not a claim', 1: 'visually-irrelevant claim', 2: 'visually-relevant claim'}

Official code repository: https://github.com/TIBHannover/MM_Claims

**All files were updated on 5th May 2023. Some obscene images that were not automatically detected in the first phase have been removed.**

**If you are interested in the binary task on check-worthiness estimation in multimodal claims, you can find the refined dataset with new test data released as part of the CLEF CheckThat! 2023 challenge: https://gitlab.com/checkthat_lab/clef2023-checkthat-lab/-/tree/main**
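The three label columns described above can be turned into readable names with a small lookup. The following is a minimal sketch, not code from the official repository: the helper `decode_labels` and the sample record are hypothetical, and it assumes each annotated example is available as a dictionary keyed by column name. How the files actually store the values is not stated here, so the helper accepts both integers and numeric strings.

```python
# Label maps transcribed from the column description above.
LABEL_MAPS = {
    "claim_binary": {0: "Not a claim", 1: "claim"},
    "claim_three": {
        0: "Not a claim",
        1: "claim but not check-worthy",
        2: "check-worthy claim",
    },
    "claim_vis": {
        0: "Not a claim",
        1: "visually-irrelevant claim",
        2: "visually-relevant claim",
    },
}


def decode_labels(record):
    """Return a copy of `record` with label ids replaced by label names.

    Hypothetical helper: columns that are not label columns are copied
    unchanged. Ids may be ints or numeric strings; an id outside the
    documented range raises KeyError.
    """
    decoded = dict(record)
    for column, mapping in LABEL_MAPS.items():
        if column in record:
            decoded[column] = mapping[int(record[column])]
    return decoded


# Made-up record for illustration only; the field "id" is an assumption.
sample = {"id": "example", "claim_binary": 1, "claim_three": "2", "claim_vis": 1}
print(decode_labels(sample))
```

Keeping the maps in one dictionary keyed by column name means the same loop handles all three annotation schemes, and an unexpected label id fails loudly instead of being silently passed through.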
Provider:
LUIS
Created:
2022-07-13
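Summary
Background and Challenges
Background Overview
The MM_Claims Dataset targets multimodal claim detection in social media. It provides three kinds of classification labels: binary (claim or not), three-class (check-worthiness), and visual relevance. The dataset has been updated to remove inappropriate content, and a refined version of it was released as part of the CLEF CheckThat! 2023 challenge.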



