bdstar/Tweets-Sentiment-Analysis

Name: bdstar/Tweets-Sentiment-Analysis
Creator: bdstar
Published: 2025-11-19 07:17:13
License: No description provided

Source: Hugging Face. Updated: 2025-11-19. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/bdstar/Tweets-Sentiment-Analysis


Resource description:

---
license: mit
task_categories:
- text-classification
- token-classification
language:
- en
tags:
- twitter
- tweets
- sentiment
- social
- multi-class
pretty_name: Tweets-Sentiment-Analysis
size_categories:
- 10M<n<100M
---

# 🐦 Tweets-Sentiment-Analysis (bdstar/Tweets-Sentiment-Analysis)

## 🧠 Overview

A **refined and merged version of Tweets text sentiment datasets**, providing a clean and well-balanced dataset for **sentiment classification** across three sentiment categories: **`positive`**, **`negative`**, and **`neutral`**.

The dataset is split into three parts (**train**, **test**, and **validation**), each sourced from reputable open datasets. It is designed for training, evaluating, and benchmarking **NLP models** for **Tweets Sentiment Analysis** and other **social media text classification** tasks.

---

## 🗂️ Dataset Splits

| # | Split | Source dataset | Negative | Neutral | Positive | % Negative | % Neutral | % Positive | Total |
|---|-------|----------------|----------|---------|----------|------------|-----------|------------|-------|
| 1 | Train | Sentiment140 (positive-sentence) | 71,462 | 233,345 | 483,261 | 9.067999 | 29.609754 | 61.322246 | 788,068 |
| 2 | Train | Sentiment140 (negative-sentence) | 451,341 | 191,650 | 136,801 | 57.879665 | 24.577067 | 17.543268 | 779,792 |
| 3 | Train | DailyDialog | 12,623 | 45,674 | 20,226 | 16.075545 | 58.166397 | 25.758058 | 78,523 |
| 4 | Test | ChatGPT Tweets Sentiment Analysis | 194,425 | 360,060 | 295,108 | 22.884487 | 42.380293 | 34.735220 | 849,593 |
| 5 | Validation | mteb-tweet_sentiment_extraction | 10,083 | 7,969 | 12,070 | 33.473873 | 26.455747 | 40.070380 | 30,122 |
| | **Total** | - | **739,934** | **838,698** | **947,466** | **29.291579** | **33.201325** | **37.507096** | **2,526,098** |

The negative, neutral, and positive probabilities for each text were calculated with the model [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest).

---

## 🧩 Column Descriptions

| Column | Type | Description |
|---------|------|-------------|
| **ID** | Integer | Auto-incremental unique ID for each row |
| **text** | String | Tweet text content |
| **negative** | Float | Probability that the text is negative |
| **neutral** | Float | Probability that the text is neutral |
| **positive** | Float | Probability that the text is positive |
| **label** | String | Sentiment category: one of `positive`, `negative`, or `neutral` |

---

## 📊 Dataset Summary

| Property | Value |
|-----------|-------|
| **Total Rows** | 2,526,098 |
| **Columns** | 6 |
| **File Formats** | JSON / Parquet / Pandas / Polars / Croissant |
| **License** | MIT |
| **Author** | Md Abdullah Al Mamun |
| **Year** | 2025 |
| **Source** | Refined version of Tweets Sentiment Dataset |

---

## 💡 Usage Example (Python)

```python
from datasets import load_dataset

# Load the dataset from Hugging Face
ds = load_dataset("bdstar/Tweets-Sentiment-Analysis")

# Access the splits
train = ds["train"]
test = ds["test"]
validation = ds["validation"]

# Display the first training row
print(train[0])
```

---

## 🏷️ Citation

If you use this dataset in your research or application, please cite it as:

```bibtex
@dataset{bdstar2025Tweets,
  title = {Tweets-Sentiment-Analysis},
  author = {Md Abdullah Al Mamun},
  year = {2025},
  howpublished = {Hugging Face},
  url = {https://huggingface.co/datasets/bdstar/Tweets-Sentiment-Analysis}
}
```

---

## 📬 Contact

For questions, improvements, or collaboration:

**Author:** Md Abdullah Al Mamun
📧 **Email:** mamunbd.ruet@gmail.com
🌐 **Website:** [TechNTuts](https://techntuts.com/)
💼 **LinkedIn:** [WebRock](https://www.linkedin.com/in/webrock/)
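As a sanity check on the Dataset Splits table, the percentage columns and the Total row can be reproduced from the raw label counts alone. The sketch below is illustrative and not part of the original dataset card: the counts are copied from the table, while the names `SPLIT_COUNTS`, `label_percentages`, and `column_totals` are hypothetical helpers introduced here.

```python
# Label counts copied from the Dataset Splits table, as
# (negative, neutral, positive). The dictionary and helper names
# are hypothetical; they do not come from the dataset card.
SPLIT_COUNTS = {
    "Sentiment140 (positive-sentence)": (71_462, 233_345, 483_261),
    "Sentiment140 (negative-sentence)": (451_341, 191_650, 136_801),
    "DailyDialog": (12_623, 45_674, 20_226),
    "ChatGPT Tweets Sentiment Analysis": (194_425, 360_060, 295_108),
    "mteb-tweet_sentiment_extraction": (10_083, 7_969, 12_070),
}


def label_percentages(counts):
    """Return each label count as a percentage of the row total."""
    total = sum(counts)
    return tuple(100 * c / total for c in counts)


def column_totals(rows):
    """Sum the negative, neutral, and positive counts across rows."""
    return tuple(sum(column) for column in zip(*rows))


# Per-source label distribution.
for name, counts in SPLIT_COUNTS.items():
    neg, neu, pos = label_percentages(counts)
    print(f"{name}: {neg:.2f}% negative, {neu:.2f}% neutral, {pos:.2f}% positive")

# Overall distribution, corresponding to the Total row.
totals = column_totals(SPLIT_COUNTS.values())
neg, neu, pos = label_percentages(totals)
print(f"Total rows: {sum(totals):,}")
print(f"Overall: {neg:.2f}% negative, {neu:.2f}% neutral, {pos:.2f}% positive")
```

The same two helpers could be applied to counts computed from the `label` column of a loaded split, which would let you compare a local copy of the data against the published table.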

Provided by:

bdstar
