ArCOV-19

Source: arXiv · Updated: 2021-03-14 · Indexed: 2024-06-21
Download link:
https://gitlab.com/bigirqu/ArCOV-19/
Description:
ArCOV-19 is the first publicly available Arabic COVID-19 Twitter dataset. It covers one year, from January 27, 2020 to January 31, 2021, and contains approximately 2.7 million tweets, along with the propagation networks (retweets and conversation threads) of the most popular subset of those tweets. The dataset was created to support research in natural language processing, information retrieval, and social computing, with a particular focus on content analysis of Arabic social media. To collect the tweets, the research team used manually constructed search queries and a language-independent crawler, and updated the queries regularly to track trending keywords and topics. Applications include emergency management, misinformation detection, and social analysis, with the aim of helping researchers understand and predict how COVID-19-related topics evolve on social media and what impact they have.
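To make the idea of a propagation network concrete, here is a minimal sketch of how retweets and conversation threads might be grouped around a source tweet. The record layout `(tweet_id, parent_id, kind)`, the sample IDs, and the helper names `build_propagation` and `thread` are all hypothetical; the actual ArCOV-19 files may be organized differently, so check the repository before adapting this.

```python
from collections import defaultdict

# Hypothetical records: (tweet_id, parent_id, kind).
# These field names and IDs are assumptions, not the ArCOV-19 file format.
records = [
    ("t2", "t1", "retweet"),
    ("t3", "t1", "reply"),
    ("t4", "t3", "reply"),
    ("t5", "t1", "retweet"),
]

def build_propagation(records):
    """Group retweets per source tweet and map each tweet to its direct replies."""
    retweets = defaultdict(list)
    replies = defaultdict(list)
    for tweet_id, parent_id, kind in records:
        if kind == "retweet":
            retweets[parent_id].append(tweet_id)
        elif kind == "reply":
            replies[parent_id].append(tweet_id)
    return retweets, replies

def thread(root, replies):
    """Return every tweet in the conversation thread under root, depth-first."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        out.append(node)
        # Reverse so that earlier replies are visited first.
        stack.extend(reversed(replies.get(node, [])))
    return out

retweets, replies = build_propagation(records)
print(retweets["t1"])         # ['t2', 't5']
print(thread("t1", replies))  # ['t1', 't3', 't4']
```

Keeping retweets and replies in separate maps mirrors the two kinds of propagation the description mentions; a graph library could replace the dictionaries if richer network analysis is needed.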
Provided by:
Department of Computer Science and Engineering, Qatar University
Created:
2020-04-13