IRLCov19
Source: arXiv, 2021-07-27 | Updated: 2024-06-21
Download link:
https://github.com/deepakuniyaliit/Covid19IRLTDataset
Resource summary:
IRLCov19 is a large multilingual Twitter dataset of 13 million COVID-19-related tweets in Indian regional languages, created at the Indian Institute of Technology Roorkee. The tweets were posted between February and July 2020 and cover 12 Indian regional languages. The researchers collected them through Twitter's public API using popular keywords, then removed duplicates to ensure data quality. The dataset is intended for research on public responses to the pandemic, analysis of policy impact, and early detection and monitoring of COVID-19. It gives governments, research institutions, and policymakers a social media data resource for these purposes.
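The summary describes the deduplication step only at a high level. The following is a minimal sketch, not the authors' procedure. It assumes collected tweets are held as Python dicts with hypothetical `id` and `text` fields, which is not necessarily the layout of the released files. Repeats are dropped by tweet ID, which is the case where overlapping keyword queries return the same tweet. The optional text-based pass is an added assumption that the dataset description does not state.

```python
from typing import Iterable, Iterator


def normalize_text(text: str) -> str:
    # Collapse runs of whitespace and lowercase. str.lower() leaves caseless
    # scripts (e.g. Devanagari) unchanged, so Indic text passes through as-is.
    return " ".join(text.split()).lower()


def dedupe_tweets(tweets: Iterable[dict], by_text: bool = False) -> Iterator[dict]:
    """Yield each tweet once, preserving input order.

    Assumes each record has an "id" field and, when by_text is True, a
    "text" field. These field names are hypothetical, chosen for illustration.
    """
    seen_ids: set[str] = set()
    seen_texts: set[str] = set()
    for tweet in tweets:
        tweet_id = str(tweet["id"])
        if tweet_id in seen_ids:
            continue  # same tweet returned by more than one keyword query
        if by_text:
            key = normalize_text(tweet["text"])
            if key in seen_texts:
                continue  # different ID but identical normalized text
            seen_texts.add(key)
        seen_ids.add(tweet_id)
        yield tweet


records = [
    {"id": 1, "text": "Stay home"},
    {"id": 1, "text": "Stay home"},
    {"id": 2, "text": "stay   HOME"},
    {"id": 3, "text": "घर पर रहें"},
]
print([t["id"] for t in dedupe_tweets(records)])                # [1, 2, 3]
print([t["id"] for t in dedupe_tweets(records, by_text=True)])  # [1, 3]
```

Deduplicating by ID is the conservative choice because it never merges two distinct tweets. The text-based pass is stricter and will also drop copy-pasted posts from different accounts. Whether that is desirable depends on the analysis.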
Provided by:
Indian Institute of Technology Roorkee
Created:
2021-07-27



