adrianrm/breastmnist
Hugging Face, updated 2026-04-18, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/adrianrm/breastmnist
Resource summary:
---
license: cc-by-4.0
task_categories:
- image-classification
tags:
- medical
- medmnist
- breastmnist
configs:
- config_name: train-all-res224
  data_files:
  - split: train
    path: train-all-res224/*.parquet
- config_name: train-malignant-res224
  data_files:
  - split: train
    path: train-malignant-res224/*.parquet
- config_name: train-normal_benign-res224
  data_files:
  - split: train
    path: train-normal_benign-res224/*.parquet
- config_name: val-all-res224
  data_files:
  - split: val
    path: val-all-res224/*.parquet
- config_name: val-malignant-res224
  data_files:
  - split: val
    path: val-malignant-res224/*.parquet
- config_name: val-normal_benign-res224
  data_files:
  - split: val
    path: val-normal_benign-res224/*.parquet
- config_name: test-all-res224
  data_files:
  - split: test
    path: test-all-res224/*.parquet
- config_name: test-malignant-res224
  data_files:
  - split: test
    path: test-malignant-res224/*.parquet
- config_name: test-normal_benign-res224
  data_files:
  - split: test
    path: test-normal_benign-res224/*.parquet
---
# breastmnist (MedMNIST)
**Source:** [breastmnist](https://medmnist.com/)

**Task:** binary-class

**Resolutions:** 224x224

**License:** CC BY 4.0
## Description
BreastMNIST is based on a dataset of 780 breast ultrasound images, categorized into three
classes: normal, benign, and malignant. Because we use low-resolution images, we simplify the task
to binary classification: normal and benign are combined into the positive class and classified
against malignant as the negative class. We split the source dataset with a ratio of 7:1:2 into
training, validation, and test sets. In the original MedMNIST release, the 1×500×500 source images
are resized to 1×28×28; the images in this card are provided at 224×224.
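As a quick arithmetic check, the 7:1:2 ratio applied to 780 images gives exactly the split sizes reported in the class-distribution tables (546 / 78 / 156). The sketch below assumes sizes are computed by integer division with any remainder assigned to the training split; the card does not say how rounding was handled, and for 780 images no rounding is needed. The helper `split_sizes` is hypothetical.

```python
# Sketch: reproduce the 7:1:2 train/val/test sizes for the 780 source images.
# Assumption (hypothetical helper): sizes are floor(total * ratio / sum(ratios)),
# and any rounding remainder goes to "train". For 780 images it divides evenly.
from __future__ import annotations

TOTAL_IMAGES = 780
RATIOS = {"train": 7, "val": 1, "test": 2}


def split_sizes(total: int, ratios: dict[str, int]) -> dict[str, int]:
    """Return the number of images per split for integer ratios (needs a "train" key)."""
    denom = sum(ratios.values())
    sizes = {name: total * r // denom for name, r in ratios.items()}
    # Give any images lost to integer division to the training split.
    sizes["train"] += total - sum(sizes.values())
    return sizes


sizes = split_sizes(TOTAL_IMAGES, RATIOS)
print(sizes)  # {'train': 546, 'val': 78, 'test': 156}
```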
## Config naming convention
```
{split}-{class}-{res}
split : train | val | test
class : all | <sanitized class name>
res : res28 | res64 | res128 | res224
```
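The naming convention can be made concrete with two small helpers that build and parse config names. These helpers are hypothetical (they are not part of the dataset or of the `datasets` library), and they assume sanitized class names never contain a hyphen, so splitting on `-` yields exactly three parts. Generating all split/class combinations at `res224` reproduces the nine configs declared in the YAML header.

```python
# Hypothetical helpers for config names of the form {split}-{class}-{res}.
# Assumption: sanitized class names contain no "-", so a name splits into 3 parts.
from __future__ import annotations

from itertools import product

SPLITS = ("train", "val", "test")
CLASSES = ("all", "malignant", "normal_benign")  # BreastMNIST config keys
RESOLUTIONS = ("res28", "res64", "res128", "res224")


def make_config_name(split: str, cls: str, res: str) -> str:
    """Join the three parts into a config name, validating each one."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if cls not in CLASSES:
        raise ValueError(f"unknown class key: {cls!r}")
    if res not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {res!r}")
    return f"{split}-{cls}-{res}"


def parse_config_name(name: str) -> tuple[str, str, str]:
    """Split a config name back into (split, class, res)."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected 3 hyphen-separated parts, got {name!r}")
    split, cls, res = parts
    return split, cls, res


# Every split/class combination at 224px, in the same order as the YAML header.
res224_configs = [make_config_name(s, c, "res224") for s, c in product(SPLITS, CLASSES)]
print(res224_configs[:3])
# ['train-all-res224', 'train-malignant-res224', 'train-normal_benign-res224']
```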
## Loading examples
```python
from datasets import load_dataset
# All training images at 224px
ds = load_dataset('.../breastmnist', 'train-all-res224', split='train')
# Only 'malignant' class, training split
ds = load_dataset('.../breastmnist', 'train-malignant-res224', split='train')
```
## Class labels
- `0` — malignant (config key: `malignant`)
- `1` — normal, benign (config key: `normal_benign`)
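The label scheme above, together with the three-to-two class merge from the description, can be written as plain lookup tables. This is a minimal sketch; the dictionary and function names are illustrative and are not fields or APIs provided by the dataset.

```python
# Sketch of the BreastMNIST label scheme: three source classes collapse to two labels.
# All names here are illustrative, not part of the dataset files or any library.
SOURCE_TO_BINARY = {"normal": 1, "benign": 1, "malignant": 0}

ID2LABEL = {0: "malignant", 1: "normal, benign"}
ID2CONFIG_KEY = {0: "malignant", 1: "normal_benign"}
CONFIG_KEY2ID = {key: idx for idx, key in ID2CONFIG_KEY.items()}


def binary_label(source_class: str) -> int:
    """Map an original (three-class) BreastMNIST class name to its binary label."""
    return SOURCE_TO_BINARY[source_class.strip().lower()]


print([binary_label(c) for c in ("normal", "benign", "malignant")])  # [1, 1, 0]
print(ID2LABEL[CONFIG_KEY2ID["normal_benign"]])  # normal, benign
```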
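## Class distribution
N is the number of images in a split; IR is the imbalance ratio (majority-class count divided by minority-class count).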
### 224x224
**train** (N=546, IR=2.71x)
| Class | Config key | Count | Share |
|-------|-----------|------:|------:|
| malignant | `malignant` | 147 | 26.9% |
| normal, benign | `normal_benign` | 399 | 73.1% |
**val** (N=78, IR=2.71x)
| Class | Config key | Count | Share |
|-------|-----------|------:|------:|
| malignant | `malignant` | 21 | 26.9% |
| normal, benign | `normal_benign` | 57 | 73.1% |
**test** (N=156, IR=2.71x)
| Class | Config key | Count | Share |
|-------|-----------|------:|------:|
| malignant | `malignant` | 42 | 26.9% |
| normal, benign | `normal_benign` | 114 | 73.1% |
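The Share and IR columns in the tables above follow directly from the counts. The sketch below recomputes them, assuming IR means majority-class count divided by minority-class count (which matches 399 / 147 ≈ 2.71). Every split has the same 26.9% / 73.1% class balance and the three split sizes sum to the 780 source images.

```python
# Recompute N, per-class Share (%) and IR for each split from the raw counts above.
# Assumption: IR = majority-class count / minority-class count.
from __future__ import annotations

COUNTS = {
    "train": {"malignant": 147, "normal_benign": 399},
    "val": {"malignant": 21, "normal_benign": 57},
    "test": {"malignant": 42, "normal_benign": 114},
}


def summarize(counts: dict[str, int]) -> tuple[int, dict[str, float], float]:
    """Return (N, share per class in percent rounded to 0.1, IR rounded to 0.01)."""
    n = sum(counts.values())
    shares = {key: round(100 * value / n, 1) for key, value in counts.items()}
    ir = round(max(counts.values()) / min(counts.values()), 2)
    return n, shares, ir


for split, counts in COUNTS.items():
    n, shares, ir = summarize(counts)
    print(f"{split}: N={n}, IR={ir}x, shares={shares}")
# train: N=546, IR=2.71x, shares={'malignant': 26.9, 'normal_benign': 73.1}
# val: N=78, IR=2.71x, shares={'malignant': 26.9, 'normal_benign': 73.1}
# test: N=156, IR=2.71x, shares={'malignant': 26.9, 'normal_benign': 73.1}
```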
## Citation
```bibtex
@article{medmnistv2,
title={MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification},
author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan
and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing},
journal={Scientific Data},
year={2023}
}
```
Provider:
adrianrm



