
Arabic Tweets on Corona

arXiv, indexed 2025-09-30
Download link:
https://github.com/DocNow/twarc
Description:
This dataset comprises Arabic tweets containing the word "Corona", collected in February and March 2020. It covers both deleted and non-deleted tweets, annotated with fine-grained misinformation categories. It also includes a random sample of 100,000 tweets, 40,000 of which were manually annotated as deleted or non-deleted to shed light on why tweets are deleted. In total, 18.8 million tweets were collected, and 40,000 of them were annotated. The task is to classify tweets into misinformation categories: hate speech, offensive content, rumors, and spam.
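
As a rough illustration of the annotation scheme described above, the following Python sketch models one labeled tweet and tallies labels by deletion status. The description does not specify a file format or field names, so `LabeledTweet`, its fields, `category_counts`, and the placeholder records are all hypothetical; only the four category names and the deleted/non-deleted distinction come from the description.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Category(Enum):
    """The four misinformation categories named in the description."""
    HATE_SPEECH = "hate_speech"
    OFFENSIVE = "offensive"
    RUMOR = "rumor"
    SPAM = "spam"


@dataclass(frozen=True)
class LabeledTweet:
    # Hypothetical record layout; the real dataset may use different fields.
    tweet_id: str
    text: str
    deleted: bool
    # Assumption: None marks a tweet that falls into none of the categories.
    category: Optional[Category] = None


def category_counts(tweets: Iterable[LabeledTweet]) -> Counter:
    """Count tweets per (deleted, category) pair."""
    return Counter((t.deleted, t.category) for t in tweets)


# Placeholder records for illustration only; not taken from the dataset.
SAMPLE = [
    LabeledTweet("1", "placeholder text 1", deleted=True, category=Category.SPAM),
    LabeledTweet("2", "placeholder text 2", deleted=True, category=Category.SPAM),
    LabeledTweet("3", "placeholder text 3", deleted=False, category=Category.RUMOR),
    LabeledTweet("4", "placeholder text 4", deleted=False),
]

counts = category_counts(SAMPLE)
```

Keying the counter on the `(deleted, category)` pair keeps the two annotation layers separate, so the category mix of deleted tweets can be compared directly with that of non-deleted ones.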