Four Text Datasets Used For Comparison Between Hedonometer and Azure Sentiment Analysis Tools

Name: Four Text Datasets Used For Comparison Between Hedonometer and Azure Sentiment Analysis Tools
Creator: figshare
Published: 2025-06-01 04:22:29
License: No description available

Source: DataCite Commons. Updated 2025-06-01; indexed 2024-08-18.

Download link:

https://figshare.com/articles/dataset/Four_Text_Datasets_Used_For_Comparison_Between_Hedonometer_and_Azure_Sentiment_Analysis_Tools/24539410/1

Description:

Lexicon-based approaches to sentiment analysis assign each word or lexical entry a pre-defined weight indicating its sentiment polarity. We compute sentiment for more than 150,000 English-language texts drawn from four domains using the Hedonometer, a lexicon-based technique, and Azure, a contemporary machine-learning-based approach. We model the differences in sentiment scores between the two approaches for the documents in each domain using a regression, and analyse the independent variables (Hedonometer lexical entries) as indicators of each word's importance and contribution to the score differences.

1. Finance Data: 5,000 records of financial news texts from company press reviews and news headlines.
2. News Headlines Data: 50,000 news headlines covering a period of 8 months (November 2015 to July 2016) on four topics: Economy, Microsoft, Obama, and Palestine.
3. IMDb Dataset: 50,000 reviews posted by customers on IMDb, the Internet Movie Database platform.
4. Twitter Dataset: almost 40,000 tweets from users around the globe on a wide range of topics.
5. Hedonometer Bag of Words: the bag of words used to perform sentiment analysis with the traditional lexicon approach. It consists of 10,223 words with their respective happiness scores. The file can be downloaded from https://hedonometer.org/words/labMT-en-v2/
6. Combined p-values results: the result file generated after performing sentiment analysis on all of the above domains, restricted to words present in the Hedonometer sheet. It lists each word with its happiness score and its p-value in each domain.
7. Data visualisations: the Tableau code base used to generate the visualisations.
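
To make the lexicon-based idea concrete, here is a minimal Python sketch of scoring a text as the frequency-weighted mean of per-word weights. It illustrates the general technique only; the Hedonometer's actual procedure may differ (for example, in tokenisation or in how near-neutral words are handled). The function name `lexicon_score`, the tokeniser, and the toy lexicon values are assumptions made for illustration and are not taken from the labMT word list.

```python
import re
from collections import Counter

# Toy lexicon: word -> happiness weight. These values are illustrative
# placeholders, NOT real labMT entries.
TOY_LEXICON = {"happy": 8.0, "good": 7.0, "loss": 2.5, "bad": 2.0}


def lexicon_score(text, lexicon):
    """Frequency-weighted mean of lexicon weights over matched words.

    Returns None when no word of the text appears in the lexicon.
    """
    # Simple lowercase tokeniser (an assumption, not the Hedonometer's own).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count only the words that have a lexicon entry.
    counts = Counter(t for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in counts.items()) / total


if __name__ == "__main__":
    print(lexicon_score("A good day, a happy day, but one bad loss.", TOY_LEXICON))
```

With the real word list, the toy dictionary would be replaced by the 10,223 word/score pairs from the Hedonometer file; words absent from the lexicon simply do not contribute to the score.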

Provider:

figshare

Created:

2023-11-09
