Four Text Datasets Used For Comparison Between Hedonometer and Azure Sentiment Analysis Tools
Source: Figshare | Updated: 2023-11-09 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Four_Text_Datasets_Used_For_Comparison_Between_Hedonometer_and_Azure_Sentiment_Analysis_Tools/24539410
Resource description:
Lexicon-based approaches to text sentiment analysis assign each word (lexical entry) a predefined weight indicating its sentiment polarity. We compute sentiment for more than 150,000 English-language texts drawn from 4 domains using two tools: the Hedonometer, a lexicon-based technique, and Azure, a contemporary machine-learning-based approach. For each domain, we model the differences between the two tools' sentiment scores with a regression and analyse the independent variables (Hedonometer lexical entries) as indicators of each word's importance and contribution to the score differences. The resource contains:

1. Finance Data: 5,000 financial news texts drawn from company press reviews and news headlines.
2. News Headlines Data: 50,000 news headlines covering a period of 8 months (November 2015 to July 2016) on four topics: Economy, Microsoft, Obama, and Palestine.
3. IMDb Dataset: 50,000 reviews posted by users on IMDb, the Internet Movie Database.
4. Twitter Dataset: almost 40,000 tweets from users around the globe on a wide range of topics.
5. Hedonometer Bag of Words: the word list used for the traditional lexicon-based sentiment analysis, consisting of 10,223 words, each with its happiness score. The original file can be downloaded from https://hedonometer.org/words/labMT-en-v2/
6. Combined p-values results: the results file generated after sentiment analysis was performed on all of the above domains, restricted to words present in the Hedonometer word list. The sheet lists each word, its happiness score, and its p-value in each domain.
7. Data visualisations: the Tableau code base used to generate the visualisations.
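The lexicon-based scoring described above can be illustrated with a short Python sketch. It computes the frequency-weighted average happiness of those words in a text that appear in the lexicon, which is how the Hedonometer's average happiness is commonly defined. Everything here is an assumption for illustration: `TOY_LEXICON` and its scores are hypothetical (not the real labMT values), as are the regex tokenizer and the function name `hedonometer_score`. The optional `lens` parameter only mimics the exclusion of near-neutral words that is often applied with the Hedonometer.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: illustrative scores on the 1-9 happiness scale,
# NOT the real labMT values (those come from the Hedonometer word list).
TOY_LEXICON = {"good": 7.0, "bad": 3.0, "profit": 7.5, "loss": 2.5, "the": 5.0}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def hedonometer_score(text, lexicon, lens=None):
    """Frequency-weighted average happiness of the lexicon words in `text`.

    lens: optional (low, high) pair; words whose score lies strictly inside
    this interval are skipped, approximating a near-neutral "lens".
    Returns None when no scored word remains.
    """
    counts = Counter(tokenize(text))
    weighted_sum = 0.0
    total_freq = 0
    for word, freq in counts.items():
        score = lexicon.get(word)
        if score is None:
            continue  # word not in the lexicon: it contributes nothing
        if lens is not None and lens[0] < score < lens[1]:
            continue  # near-neutral word excluded by the lens
        weighted_sum += score * freq
        total_freq += freq
    return weighted_sum / total_freq if total_freq else None
```

For example, `hedonometer_score("The profit was good", TOY_LEXICON)` averages "the", "profit" and "good" (the unscored "was" is ignored) and returns 6.5; with `lens=(4, 6)` the neutral "the" is dropped and the result is 7.25.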
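The regression step (score differences modelled on Hedonometer lexical entries) might look roughly like the ordinary least-squares sketch below. This record does not give the authors' actual specification, so every detail is an assumption. `doc_term` is a hypothetical document-by-word count matrix, and `score_diff` is a hypothetical per-document difference between the two tools' scores; the scaling that puts Azure and Hedonometer scores on a common range is not specified here. The function name `word_regression` is illustrative, and the p-values use a normal approximation to the t distribution rather than an exact t test.

```python
import math

import numpy as np


def word_regression(doc_term, score_diff):
    """OLS of per-document score differences on lexicon-word counts.

    doc_term:   (n_docs, n_words) array of word counts (hypothetical input).
    score_diff: (n_docs,) array of score differences between the two tools.
    Returns (coefficients, approximate two-sided p-values); index 0 is the intercept.
    """
    y = np.asarray(score_diff, dtype=float)
    # Prepend a column of ones so the model has an intercept term.
    X = np.column_stack([np.ones(len(y)), np.asarray(doc_term, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    n, k = X.shape
    dof = n - k                      # residual degrees of freedom
    resid = y - X @ beta
    sigma2 = resid @ resid / dof     # unbiased residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t_stats = beta / se

    # Two-sided p-value under a normal approximation: erfc(|t| / sqrt(2)).
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta, p_values
```

The dense inversion requires more documents than words and no word column that is all zeros. At the scale of a 10,223-word lexicon, a sparse solver or a statistics package such as statsmodels would be the more practical choice.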
Created:
2023-11-09



