Mustafaege/imdb-sentiment
Hugging Face. Updated 2026-03-31; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Mustafaege/imdb-sentiment
Resource description:
---
language:
- en
license: other
pretty_name: IMDb Sentiment (35k/5k/10k)
size_categories: 10K<n<100K
task_categories:
- text-classification
task_ids:
- sentiment-classification
annotations_creators:
- expert-generated
language_creators:
- expert-generated
multilinguality:
- monolingual
source_datasets:
- original
---
# IMDb Sentiment Classification
A re-split version of the [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) with custom train/validation/test splits for model training and evaluation.
## Dataset Summary
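This dataset contains **50,000 labeled movie reviews** from IMDb, each labeled as **positive (1)** or **negative (0)**. The data originates from the Stanford AI Lab's Large Movie Review Dataset, whose labeled portion ships as 25,000 training and 25,000 test reviews. Here it is re-split into 35,000/5,000/10,000 train/validation/test examples so that a dedicated validation set is available during training.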
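The card does not say how the re-split was produced. As a rough illustration only, a label-stratified 70/10/20 split could be sketched as below; the function name `stratified_split`, the seed, and the toy rows are assumptions for this example and are not taken from the dataset's actual preparation code.

```python
import random
from collections import defaultdict


def stratified_split(rows, sizes, seed=0):
    """Split rows into named subsets while keeping label proportions equal.

    rows  : list of dicts with a "label" key
    sizes : dict mapping split name -> fraction of the data (should sum to 1)
    """
    rng = random.Random(seed)

    # Group rows by label so each class is split independently.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    names = list(sizes)
    splits = {name: [] for name in names}
    for label_rows in by_label.values():
        rng.shuffle(label_rows)
        start = 0
        for i, name in enumerate(names):
            if i == len(names) - 1:
                # The last split takes the remainder so no row is dropped.
                end = len(label_rows)
            else:
                end = start + round(sizes[name] * len(label_rows))
            splits[name].extend(label_rows[start:end])
            start = end
    return splits


# Toy stand-in for the 50,000 reviews: 50 positive and 50 negative rows.
toy_rows = [{"text": f"review {i}", "label": i % 2} for i in range(100)]
toy_splits = stratified_split(
    toy_rows, {"train": 0.7, "validation": 0.1, "test": 0.2}
)
print({name: len(part) for name, part in toy_splits.items()})
```

Splitting each class separately is what keeps every subset balanced; a plain random split would only be balanced on average.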
## Splits
| Split | Samples | Positive | Negative |
|-------|---------|----------|----------|
| **train** | 35,000 | 17,500 | 17,500 |
| **validation** | 5,000 | 2,500 | 2,500 |
| **test** | 10,000 | 5,000 | 5,000 |
| **Total** | **50,000** | **25,000** | **25,000** |
The dataset is balanced: each split contains equal numbers of positive and negative reviews.
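The balance claim is easy to check once a split's labels are in hand. The sketch below uses only the standard library and a toy label list; the helper names `class_balance` and `is_balanced` are hypothetical and not part of the dataset or of any library.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: count} for an iterable of integer labels."""
    return dict(Counter(labels))


def is_balanced(labels):
    """Return True when every label occurs the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) <= 1


# Toy labels standing in for the "label" column of one split.
toy_labels = [0, 1, 1, 0, 1, 0]
print(class_balance(toy_labels))
print(is_balanced(toy_labels))
```

Applied to the real data, the `label` column of a loaded split (for example `ds["train"]["label"]`, assuming the dataset loads as shown in the Usage section) would take the place of `toy_labels`, and the counts should match the table above.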
## Data Fields
- **`text`** (`string`): The movie review text (English).
- **`label`** (`int`): Sentiment label, `0` for negative and `1` for positive.
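When feeding records to a classifier or printing predictions, it can help to keep the integer-to-name mapping in one place. The following is a minimal sketch based on the two fields listed above; `ID2LABEL`, `LABEL2ID`, and `describe` are illustrative names chosen for this example.

```python
# Mapping taken from the field description: 0 = negative, 1 = positive.
ID2LABEL = {0: "negative", 1: "positive"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def describe(record):
    """Validate one record and return its sentiment name."""
    text, label = record["text"], record["label"]
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if label not in ID2LABEL:
        raise ValueError(f"unexpected label: {label!r}")
    return ID2LABEL[label]


example = {"text": "This movie was absolutely fantastic...", "label": 1}
print(describe(example))  # positive
```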
## Usage
```python
from datasets import load_dataset

# Download (or load from the local cache) all three splits
ds = load_dataset("Mustafaege/imdb-sentiment")

# Access splits
train_ds = ds["train"]       # 35,000 samples
val_ds = ds["validation"]    # 5,000 samples
test_ds = ds["test"]         # 10,000 samples

# Inspect one record; it is a dict with 'text' and 'label' keys, for example:
# {'text': 'This movie was absolutely fantastic...', 'label': 1}
print(train_ds[0])
```
## Source
- **Original dataset**: [Stanford Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/)
- **Original HF mirror**: [stanfordnlp/imdb](https://huggingface.co/datasets/stanfordnlp/imdb)
- **Paper**: Maas et al., "Learning Word Vectors for Sentiment Analysis", ACL 2011
## Citation
```bibtex
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
```
## License
The IMDb dataset is provided for academic research use. See the [original dataset page](https://ai.stanford.edu/~amaas/data/sentiment/) for licensing details.
Provider:
Mustafaege



