
akhiljoe143/twitter-sentiment-analysis

Source: Hugging Face | Updated: 2026-02-06 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/akhiljoe143/twitter-sentiment-analysis
Resource description:
---
license: apache-2.0
task_categories:
- text-classification
- token-classification
language:
- en
tags:
- twitter
- sentiment
- social
- multi-class
pretty_name: twitter-sentiment-analysis
size_categories:
- 10M<n<100M
---

# 🐦 Twitter Sentiment Analysis (bdstar/twitter-sentiment-analysis)

## 🧠 Overview

A **refined and merged version of Twitter text sentiment datasets**, providing a clean and well-balanced dataset for **sentiment classification** across three sentiment categories: **`positive`**, **`negative`**, and **`neutral`**.

This dataset is split into three parts (**train**, **test**, and **validation**), each sourced from a widely used open dataset. The train and test splits contain only `positive` and `negative` labels; `neutral` appears only in the validation split (see Detailed Statistics).

It is designed for training, evaluating, and benchmarking **NLP models** for **Twitter Sentiment Analysis** and other **social media text classification** tasks.

---

## 🗂️ Dataset Splits

| Split | Source Dataset | Rows | File Size | Link |
|-------|----------------|------|------------|------|
| **Train** | Twitter Sentiment Dataset (3M labeled rows) | 3,142,209 | 361 MB | [Kaggle Dataset](https://www.kaggle.com/datasets/prkhrawsthi/twitter-sentiment-dataset-3-million-labelled-rows) |
| **Test** | Sentiment140 Dataset | 1,600,001 | 198 MB | [Kaggle Dataset](https://www.kaggle.com/datasets/kazanova/sentiment140) |
| **Validation** | MTEB Tweet Sentiment Extraction | 31,015 | 3.45 MB | [Hugging Face Dataset](https://huggingface.co/datasets/mteb/tweet_sentiment_extraction) |

---

## 🧩 Column Descriptions

| Column | Type | Description |
|---------|------|-------------|
| **ID** | Integer | Auto-incremental unique ID for each row |
| **text** | String | Tweet text content |
| **label** | String | Sentiment category: one of `positive`, `negative`, or `neutral` |

---

## 📊 Dataset Summary

| Property | Value |
|-----------|-------|
| **Total Rows** | 4,773,225 |
| **Columns** | 3 |
| **File Formats** | JSON / Parquet / Pandas / Polars / Croissant |
| **License** | MIT |
| **Author** | Md Abdullah Al Mamun |
| **Year** | 2025 |
| **Source** | Refined version of Twitter Sentiment Dataset |

---

## 📈 Detailed Statistics

### 🏋️‍♂️ Train Set

**Source:** [Twitter Sentiment Dataset (3M labeled rows)](https://www.kaggle.com/datasets/prkhrawsthi/twitter-sentiment-dataset-3-million-labelled-rows)
**File Size:** 361 MB
**Rows:** 3,142,209

| Label | Count | Percentage |
|--------|--------|-------------|
| Positive | 1,571,104 | 50.0% |
| Negative | 1,571,105 | 50.0% |

---

### 🧪 Test Set

**Source:** [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140)
**File Size:** 198 MB
**Rows:** 1,600,001

| Label | Count | Percentage |
|--------|--------|-------------|
| Positive | 800,000 | 50.0% |
| Negative | 800,001 | 50.0% |

---

### 🧭 Validation Set

**Source:** [MTEB – Tweet Sentiment Extraction](https://huggingface.co/datasets/mteb/tweet_sentiment_extraction)
**File Size:** 3.45 MB
**Rows:** 31,015

| Label | Count | Percentage |
|--------|--------|-------------|
| Neutral | 12,561 | 40.5% |
| Positive | 9,676 | 31.2% |
| Negative | 8,778 | 28.3% |

---

## 💡 Usage Example (Python)

```python
from datasets import load_dataset

# Load dataset from Hugging Face
dataset = load_dataset("bdstar/twitter-sentiment-analysis")

# Access splits
train = dataset["train"]
test = dataset["test"]
validation = dataset["validation"]

# Display sample
print(train[0])
```

---

## 🏷️ Citation

If you use this dataset in your research or application, please cite as:

```bibtex
@dataset{bdstar2025twitter,
  title        = {Twitter Sentiment Analysis (Refined Dataset)},
  author       = {Md Abdullah Al Mamun},
  year         = {2025},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/datasets/bdstar/twitter-sentiment-analysis}
}
```

---

## 📬 Contact

For questions, improvements, or collaboration:

**Author:** Md Abdullah Al Mamun
📧 **Email:** mamunbd.ruet@gmail.com
🌐 **Website:** [TechNTuts](https://techntuts.com/)
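
The per-split label tables under Detailed Statistics can be recomputed with a few lines of Python. The sketch below is a minimal illustration and not part of the original dataset card: `label_distribution` and `sample_rows` are hypothetical names, and the only assumption about the data is that each row exposes a `label` field as listed under Column Descriptions.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def label_distribution(
    rows: Iterable[Mapping[str, object]],
) -> Dict[str, Tuple[int, float]]:
    """Return {label: (count, percentage)} for rows that carry a "label" field."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Percentages are rounded to one decimal place, as in the tables above.
    return {
        label: (count, round(100.0 * count / total, 1))
        for label, count in counts.most_common()
    }


# Tiny in-memory sample standing in for a real split (hypothetical rows).
sample_rows = [
    {"ID": 1, "text": "love this", "label": "positive"},
    {"ID": 2, "text": "meh", "label": "neutral"},
    {"ID": 3, "text": "awful", "label": "negative"},
    {"ID": 4, "text": "it is fine", "label": "neutral"},
]

print(label_distribution(sample_rows))
```

Assuming the splits load as in the usage example, passing `dataset["validation"]` to the same function should work, since iterating a split yields one dictionary per row. For the multi-million-row train and test splits, counting over the `label` column alone is likely to be faster than iterating full rows.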
Provided by:
akhiljoe143