A Twitter Dataset for Pakistani Political Discourse
Source: arXiv. Updated 2023-01-16. Indexed 2024-06-21.
Download link:
https://doi.org/10.5281/zenodo.7538667
Resource description:
This dataset, "A Twitter Dataset for Pakistani Political Discourse", was created by The Hong Kong University of Science and Technology. It contains more than 49 million tweets collected in April 2022, a period of intense political activity in Pakistan. The tweets are multilingual, mainly Urdu and English, and include Roman Urdu. They were collected in real time through the Twitter Streaming API, using keywords covering political leaders and related trending hashtags. The dataset is suited to research on political bias, censorship, political engineering, and natural language processing, particularly analyses of Pakistani Twitter users.
Provider:
The Hong Kong University of Science and Technology
Created:
2023-01-16
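
The description mentions keyword-based collection and a mix of Urdu, English, and Roman Urdu. As a rough illustration only, the sketch below shows how a downloaded tweet text might be checked against a keyword list and bucketed by writing script. The keyword list, the function names, and the substring-matching rule are all assumptions made for this example; the listing does not give the dataset's actual keywords, and this is not how the Twitter Streaming API itself matches terms.

```python
# Hypothetical keywords for illustration; the dataset's real list is not given here.
KEYWORDS = ["imran khan", "#pakistan"]


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


def script_bucket(text):
    """Bucket a text by dominant script.

    Characters in the Unicode Arabic block (U+0600 to U+06FF) cover Urdu script;
    ASCII letters cover both English and Roman Urdu, which this check cannot
    tell apart.
    """
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06FF")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if arabic == 0 and latin == 0:
        return "other"
    return "arabic_script" if arabic >= latin else "latin_script"
```

Separating English from Roman Urdu would need a real language-identification step, since both use Latin letters; the script check only gives a first coarse split.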



