Sentiment analysis (SA) (supervised and unsupervised classification) of original Twitter data posted in English about the 10th anniversary of the 2010 Haiti Earthquake
Source: DataCite Commons. Updated: 2023-03-27. Indexed: 2025-04-16.
Download link:
https://data.ncl.ac.uk/articles/dataset/Sentiment_analysis_SA_supervised_and_unsupervised_classification_of_original_Twitter_data_posted_in_English_about_the_10th_anniversary_of_the_2010_Haiti_Earthquake/19688040/2
Description:
This dataset contains the sentiment analysis (SA) of original English-language tweets posted by users about the 10th anniversary of the 2010 Haitian earthquake. Each tweet is classified by its polarity or as not related, using both supervised and unsupervised classification. The dataset compares the accuracy (ACC) of three tools for unsupervised text classification: a no-code machine learning (ML) classification platform, ‘MonkeyLearn’, and two trained models fine-tuned for SA, ‘troberta’ and ‘btweet’. The latter two are language models based on the RoBERTa (https://aclanthology.org/2020.findings-emnlp.148/) and BERTweet (https://aclanthology.org/2020.emnlp-demos.2/) architectures, respectively, and both are available on the Hugging Face platform. The first author performed the supervised classification and trained MonkeyLearn at the tweet level using samples of 1, 5 and 10 per cent of the tweets in the dataset (the sampled tweets were excluded when testing the ACC of the predictions). This supervised classification is compared with the unsupervised classification performed by ‘MonkeyLearn’, ‘troberta’ and ‘btweet’. For ‘MonkeyLearn’, the average confidence in the classification increases with the number of training tweets (0.39, 0.56 and 0.64), whereas the average confidence of ‘troberta’ (0.89) and ‘btweet’ (0.92) in their own classifications is very high and exceeds MonkeyLearn’s.
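The two quantities reported in the description, accuracy against the manual labels and average confidence per tool, can be illustrated with a minimal Python sketch. The label names, the toy values and the function names below are hypothetical and are not taken from the dataset; the dataset's actual column layout is not described here, so this only shows the arithmetic under those assumptions.

```python
from statistics import mean


def accuracy(reference, predicted):
    """Share of tweets whose predicted label equals the reference label."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def average_confidence(scores):
    """Mean of the confidence scores a tool reports for its own labels."""
    return mean(scores)


# Hypothetical toy data: manual (supervised) labels, one tool's predicted
# labels, and that tool's confidence in each prediction.
manual = ["negative", "neutral", "positive", "not related"]
tool = ["negative", "positive", "positive", "not related"]
confidence = [0.5, 0.75, 0.25, 1.0]

print(accuracy(manual, tool))          # 3 of 4 labels agree
print(average_confidence(confidence))
```

Running the same two functions once per tool and per training-sample size would give a comparison table of the kind the description summarises.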
Provider:
Newcastle University
Created:
2023-01-11



