A Large Dataset of Tweets on the 2023 Presidential Elections in Nigeria for Natural Language Processing Tasks

Indexed by NIAID Data Ecosystem on 2026-05-01

Download link:

https://zenodo.org/record/8347220

Resource description:

The dataset contains tweets related to the 2023 presidential elections in Nigeria. The data was retrieved from the social media network Twitter (now X) between February 4th, 2023 and April 4th, 2023. Hashtags from the official handles, together with other popular hashtags endorsed by and/or representing the candidates of each party, were used to retrieve election-related tweets through a Twitter API. Three major political parties in Nigeria were considered; they are labelled Party A, Party L and Party P in this dataset. The group called "General" contains tweets from Independent National Electoral Commission (INEC) handles and hashtags such as @inecnigeria and #2023election, which are not directly tied to any political party. The dataset has been lightly pre-processed to make it useful to researchers for a wide range of natural language processing tasks such as sentiment analysis, topic modelling, fake news detection, emotion detection and election stance. Details of the dataset collection, including the hashtags, retrieved tweets, duplicates removed and remaining unique tweets, are presented in Table 1.
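The duplicate removal summarized in Table 1 can be illustrated with a minimal Python sketch. The description does not state how duplicates were identified, so the rule used here (exact match after lowercasing and collapsing whitespace) is an assumption, and the names `normalize` and `remove_duplicates` are hypothetical.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def remove_duplicates(tweets):
    """Return (unique_tweets, duplicate_count), keeping the first occurrence of each tweet."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = normalize(tweet)
        if key in seen:
            continue  # an equivalent tweet was already kept
        seen.add(key)
        unique.append(tweet)
    return unique, len(tweets) - len(unique)


# Made-up sample tweets, not taken from the dataset.
sample = [
    "Get your PVC ready #2023election",
    "get your PVC  ready #2023election",
    "Polling units open at 8:30am #2023election",
]
unique, duplicates = remove_duplicates(sample)
print(len(sample), duplicates, len(unique))
```

A stricter or looser notion of "duplicate" (for example, matching on tweet ID or ignoring URLs) would change the counts, so this should be read as one possible criterion rather than the one used to build the dataset.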
Table 1: Tweets collection and duplicates removal

| S/N | Party | Hashtags | Retrieved tweets | Duplicate tweets | Unique tweets |
|---|---|---|---|---|---|
| 1 | X | @inecnigeria #2023election | 64,496 | 47,275 | 17,195 |
| 2 | A | #TinubuIsComing #emilokan #jagabanarmy #RenewedHope #BATKSM2023 | 263,870 | 231,036 | 32,832 |
| 3 | L | #VoteLP #NigeriaMustBeBright #PeterObiForPresident2023 #ObiDatti2023 #PeterObi | 664,083 | 310,857 | 353,226 |
| 4 | P | #NigeriaDecides #VotePDP #AtikuOkowa2023 #FinalPushToVictory #RecoverNigeria | 387,450 | 318,425 | 66,227 |
| Total | | | 1,379,899 | 907,593 | 468,480 |

To encourage NLP tasks, we uploaded the following files in this Version One:

- The combined dataset, with pre-processed tweets and their metadata and with duplicates removed, is in the file labelled “Combined Dataset Pre-processed without duplicates.csv”
- General statistics on each corpus are in the file labelled “Dataset Statistics.xlsx”
- The preprocessed corpus from the general group, with the tweet contents only, is in the file labelled “Preprocessed_Tweet only_GENERAL.xlsx”
- The preprocessed corpus from Party A, with the tweet contents only, is in the file labelled “Preprocessed_Tweet only_Party A.xlsx”
- The preprocessed corpus from Party L, with the tweet contents only, is in the file labelled “Preprocessed_Tweet only_Party L.xlsx”
- The preprocessed corpus from Party P, with the tweet contents only, is in the file labelled “Preprocessed_Tweet only_Party P.xlsx”
- The top 100 frequent tokens are in the file labelled “Top 100 Tokens and weights.xlsx”
- The top frequent bigrams and their weights are in the file labelled “Top 100 Bigrams and weights.xlsx”
- The top frequent trigrams and their weights are in the file labelled “Top 100 Trigrams and weights.xlsx”
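Frequency lists like those in the token, bigram and trigram files can be approximated from any of the tweet-only corpora with a short Python sketch. The description does not define how the "weights" were computed, so raw counts are used here as an assumption; the lowercase whitespace tokenization and the name `top_ngrams` are likewise hypothetical.

```python
from collections import Counter


def top_ngrams(tweets, n, k=100):
    """Return the k most frequent n-grams across tweets as (ngram, count) pairs."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()  # assumed tokenization: lowercase + whitespace
        # Slide a window of length n over this tweet's tokens only,
        # so an n-gram never spans two different tweets.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(k)


# Made-up sample tweets, not taken from the dataset.
sample = [
    "vote wisely on election day",
    "election day is here",
    "vote early on election day",
]
print(top_ngrams(sample, n=1, k=3))  # most frequent tokens
print(top_ngrams(sample, n=2, k=2))  # most frequent bigrams
print(top_ngrams(sample, n=3, k=1))  # most frequent trigram
```

Calling `top_ngrams(tweets, n=1)`, `n=2` and `n=3` with the default `k=100` would give lists of the same shape as the three "Top 100" files, although the values need not match if the published weights are normalized or computed differently.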

Created:

2023-09-17