Mustafaege/imdb-sentiment

Name: Mustafaege/imdb-sentiment
Creator: Mustafaege
Published: 2026-03-31 21:20:14
License: No description available

Hugging Face | Updated 2026-03-31 | Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/Mustafaege/imdb-sentiment

Description:

---
language:
- en
license: other
pretty_name: IMDb Sentiment (35k/5k/10k)
size_categories: 10K<n<100K
task_categories:
- text-classification
task_ids:
- sentiment-classification
annotations_creators:
- expert-generated
language_creators:
- expert-generated
multilinguality:
- monolingual
source_datasets:
- original
---

# IMDb Sentiment Classification

A curated version of the [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) with custom train/validation/test splits optimized for model training and evaluation.

## Dataset Summary

This dataset contains **50,000 labeled movie reviews** from IMDb, each labeled as **positive (1)** or **negative (0)**. The data originates from the Stanford AI Lab's Large Movie Review Dataset, re-split into 35k/5k/10k for better validation during training.

## Splits

| Split | Samples | Positive | Negative |
|-------|---------|----------|----------|
| **train** | 35,000 | 17,500 | 17,500 |
| **validation** | 5,000 | 2,500 | 2,500 |
| **test** | 10,000 | 5,000 | 5,000 |
| **Total** | **50,000** | **25,000** | **25,000** |

The dataset is balanced: each split has roughly equal positive and negative reviews.

## Data Fields

- **`text`** (`string`): The movie review text (English).
- **`label`** (`int`): Sentiment label, `0` for negative and `1` for positive.
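The card does not say how the 35k/5k/10k re-split was produced, so the following is only a minimal sketch of one common approach: a seeded, label-stratified split over rows that carry the two fields listed above. The helper names `stratified_split` and `label_counts`, the seed, and the toy 100-row pool (in the same 70/10/20 proportion as the real splits) are all hypothetical and are not taken from the dataset author's code.

```python
import random
from collections import Counter


def label_counts(rows):
    """Count how many rows carry each label value."""
    return Counter(row["label"] for row in rows)


def stratified_split(rows, sizes, seed=0):
    """Split rows into named parts with the same number of rows per label.

    rows  : list of {"text": str, "label": int} dicts
    sizes : dict of split name -> row count, e.g. {"train": 70, ...}
    Assumes a balanced pool (hypothetical simplification for this sketch).
    """
    if sum(sizes.values()) != len(rows):
        raise ValueError("split sizes must add up to the number of rows")

    # Group rows by label, then shuffle each group with a fixed seed.
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)
    n_labels = len(by_label)
    if any(len(group) * n_labels != len(rows) for group in by_label.values()):
        raise ValueError("this sketch expects a balanced pool")
    rng = random.Random(seed)
    for group in by_label.values():
        rng.shuffle(group)

    # Hand each split an equal, non-overlapping slice of every label group.
    splits = {}
    offset = 0
    for name, size in sizes.items():
        per_label, remainder = divmod(size, n_labels)
        if remainder:
            raise ValueError(f"size of {name!r} must be divisible by {n_labels}")
        part = []
        for group in by_label.values():
            part.extend(group[offset:offset + per_label])
        rng.shuffle(part)  # interleave labels inside the split
        splits[name] = part
        offset += per_label
    return splits


# Toy pool standing in for the 50,000 reviews (assumption: 100 synthetic rows).
pool = [{"text": f"review {i}", "label": i % 2} for i in range(100)]
parts = stratified_split(pool, {"train": 70, "validation": 10, "test": 20})
for name, part in parts.items():
    print(name, len(part), dict(label_counts(part)))
```

Because `label_counts` only iterates over rows and reads the `label` key, it should also work on a split loaded with the `datasets` library, which would be one way to compare the published splits against the table above.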
## Usage

```python
from datasets import load_dataset

ds = load_dataset("Mustafaege/imdb-sentiment")

# Access splits
train_ds = ds["train"]       # 35,000 samples
val_ds = ds["validation"]    # 5,000 samples
test_ds = ds["test"]         # 10,000 samples

# Example
print(train_ds[0])
# {'text': 'This movie was absolutely fantastic...', 'label': 1}
```

## Source

- **Original dataset**: [Stanford Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/)
- **Original HF mirror**: [stanfordnlp/imdb](https://huggingface.co/datasets/stanfordnlp/imdb)
- **Paper**: Maas et al., "Learning Word Vectors for Sentiment Analysis", ACL 2011

## Citation

```bibtex
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}
```

## License

The IMDb dataset is provided for academic research use. See the [original dataset page](https://ai.stanford.edu/~amaas/data/sentiment/) for licensing details.

Provided by:

Mustafaege
