chehablab/MiniMSD
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/chehablab/MiniMSD
Resource description:
---
dataset_info:
  features:
  - name: organ
    dtype: string
  - name: image
    dtype: image
  - name: binary_mask
    dtype: image
  - name: classes_mask
    dtype: image
  - name: volume_id
    dtype: int32
  - name: slice_id
    dtype: int32
  splits:
  - name: '244'
    num_bytes: 1287692805
    num_examples: 51891
  - name: '512'
    num_bytes: 4640495381
    num_examples: 51891
  download_size: 5872933096
  dataset_size: 5928188186
configs:
- config_name: default
  data_files:
  - split: '244'
    path: data/244-*
  - split: '512'
    path: data/512-*
license: cc-by-sa-4.0
task_categories:
- image-segmentation
language:
- en
tags:
- medical
- xray
- nii
- ct
- MRI
pretty_name: Mini Medical Segmentation Decathlon
size_categories:
- 100K<n<1M
---
# Processed and Reduced Medical Segmentation Decathlon Dataset
The miniMSD dataset is a medical image segmentation benchmark covering 10 human organs.
It is derived from the [Medical Segmentation Decathlon (MSD)](http://medicaldecathlon.com) by converting volumetric scans
from NIfTI (NII) format into serialised 2D RGB images, alongside their corresponding segmentation masks.
The dataset is provided in two resolution variants, exposed as the splits `244` and `512`,
so it can be used off the shelf and is convenient for experimentation.
## Dataset Details
The dataset covers 10 human body organs, listed below.
Each organ includes up to 40 volumes, with each volume consisting of a variable number of image slices.
Each dataset entry contains the following components: the organ type, the image, a binary mask,
a detailed (multi-class) mask, a volume ID, and a slice ID.
The image, binary mask, and detailed mask are all provided as PIL images.
The binary mask contains two labels: 0 for background and 1 for the target region.
The detailed mask contains multiple labels (0, 1, 2, 3, …), where each label corresponds to a specific
anatomical structure. The mapping of label indices to structures is provided below.
| Organ | Number of Volumes | Total Slices | Avg. Slices per Volume | % of Total Slices |
|----------------|-------------------|--------------|------------------------|-------------------|
| Prostate | 32 | 1204 | 37.625 | 1.26% |
| Heart | 20 | 2271 | 113.550 | 2.38% |
| Hippocampus | 40 | 2754 | 68.850 | 2.89% |
| HepaticVessel | 40 | 5796 | 144.900 | 6.08% |
| BrainTumour | 40 | 6200 | 155.000 | 6.51% |
| Spleen | 40 | 6964 | 174.100 | 7.31% |
| Pancreas | 40 | 7068 | 176.700 | 7.42% |
| Colon | 40 | 7344 | 183.600 | 7.71% |
| Lung | 40 | 22510 | 562.750 | 23.62% |
| Liver | 40 | 33200 | 830.000 | 34.83% |
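The derived columns in the table can be recomputed from the volume and slice counts alone. The following is a minimal sketch that hard-codes the table rows and recomputes the average slices per volume and each organ's share of the table total. The names `TABLE` and `summarize` are illustrative and not part of the dataset.

```python
# Rows copied from the table above: organ -> (number of volumes, total slices).
TABLE = {
    "Prostate": (32, 1204),
    "Heart": (20, 2271),
    "Hippocampus": (40, 2754),
    "HepaticVessel": (40, 5796),
    "BrainTumour": (40, 6200),
    "Spleen": (40, 6964),
    "Pancreas": (40, 7068),
    "Colon": (40, 7344),
    "Lung": (40, 22510),
    "Liver": (40, 33200),
}


def summarize(table):
    """Return the slice total and, per organ, (avg slices per volume, percent of total)."""
    total = sum(slices for _, slices in table.values())
    stats = {
        organ: (slices / volumes, 100.0 * slices / total)
        for organ, (volumes, slices) in table.items()
    }
    return total, stats


total_slices, stats = summarize(TABLE)
for organ, (avg, pct) in stats.items():
    print(f"{organ:<14} avg={avg:8.3f}  share={pct:5.2f}%")
```

The rows sum to 95,311 slices, and Liver and Lung together account for more than half of them, which is worth keeping in mind when sampling batches across organs.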
## Labels Mapping
### BrainTumour
- 0: background
- 1: necrotic / non-enhancing tumor
- 2: edema
- 3: enhancing tumor
### Heart
- 0: background
- 1: left atrium
### Liver
- 0: background
- 1: liver
- 2: tumor
### Hippocampus
- 0: background
- 1: anterior
- 2: posterior
### Prostate
- 0: background
- 1: peripheral zone
- 2: transition zone
### Lung
- 0: background
- 1: nodule
### Pancreas
- 0: background
- 1: pancreas
- 2: tumor
### HepaticVessel
- 0: background
- 1: vessel
- 2: tumor
### Spleen
- 0: background
- 1: spleen
### Colon
- 0: background
- 1: colon
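For use in code, the mapping above can be written as a lookup table. This is only a sketch: the `LABELS` dictionary and the helper names are hypothetical and not shipped with the dataset, and the organ keys are assumed to match the strings stored in the `organ` column.

```python
from collections import Counter

# Label index -> structure name, per organ (list position is the label index).
LABELS = {
    "BrainTumour": ["background", "necrotic / non-enhancing tumor", "edema", "enhancing tumor"],
    "Heart": ["background", "left atrium"],
    "Liver": ["background", "liver", "tumor"],
    "Hippocampus": ["background", "anterior", "posterior"],
    "Prostate": ["background", "peripheral zone", "transition zone"],
    "Lung": ["background", "nodule"],
    "Pancreas": ["background", "pancreas", "tumor"],
    "HepaticVessel": ["background", "vessel", "tumor"],
    "Spleen": ["background", "spleen"],
    "Colon": ["background", "colon"],
}


def label_name(organ, index):
    """Translate a label index from a detailed mask into its structure name."""
    return LABELS[organ][index]


def label_histogram(organ, pixel_values):
    """Count pixels per structure name, given an iterable of label indices."""
    counts = Counter(pixel_values)
    return {label_name(organ, idx): n for idx, n in sorted(counts.items())}


# Toy mask flattened to a list; a real mask would come from `classes_mask`.
print(label_histogram("Liver", [0, 0, 1, 1, 1, 2]))
```

With a PIL mask, `classes_mask.convert("L").tobytes()` yields the pixel values as bytes that can be passed directly as `pixel_values`, assuming the label indices are stored as raw pixel intensities.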
## Uses
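```python
import matplotlib.pyplot as plt
from datasets import load_dataset

# Load the 244 split; the repository id is the one given in the download link.
miniMSD244 = load_dataset("chehablab/MiniMSD", split="244")

# Fetch the row once instead of decoding it again for every field.
sample = miniMSD244[312]
organ = sample["organ"]
image = sample["image"]
binary_mask = sample["binary_mask"]
classes_mask = sample["classes_mask"]

# Show the slice ("gray" is the standard matplotlib colormap name).
plt.imshow(image, cmap="gray")
plt.show()
```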
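Because each row is a single 2D slice, volume-level work (reassembling a scan, or splitting training and validation data without sharing a volume between them) needs the slices grouped by `organ` and `volume_id`. A minimal sketch follows. `group_by_volume` is a hypothetical helper, and it assumes that `volume_id` is unique only within an organ and that `slice_id` gives the slice order, neither of which the card states explicitly.

```python
from collections import defaultdict


def group_by_volume(rows):
    """Map (organ, volume_id) -> row indices ordered by slice_id.

    `rows` is any iterable of dict-like records exposing the `organ`,
    `volume_id` and `slice_id` fields described in this card.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[(row["organ"], row["volume_id"])].append((row["slice_id"], index))
    # Sort each volume by slice_id and keep only the row indices.
    return {key: [i for _, i in sorted(pairs)] for key, pairs in groups.items()}


# Synthetic metadata standing in for the real dataset rows.
rows = [
    {"organ": "Heart", "volume_id": 0, "slice_id": 1},
    {"organ": "Heart", "volume_id": 0, "slice_id": 0},
    {"organ": "Liver", "volume_id": 0, "slice_id": 0},
]
volumes = group_by_volume(rows)
print(volumes[("Heart", 0)])
```

On the real data, iterating over `miniMSD244.select_columns(["organ", "volume_id", "slice_id"])` should avoid decoding the images while grouping, and `miniMSD244.select(volumes[key])` then retrieves one volume's slices in order.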
## Citation
Please acknowledge [chehablab.com](https://chehablab.com) and cite the original authors of the dataset:
```bib
@misc{msd2019,
title={A large annotated medical image dataset for the development and evaluation of segmentation algorithms},
author={Amber L. Simpson and Michela Antonelli and Spyridon Bakas and Michel Bilello and Keyvan Farahani and Bram van Ginneken and Annette Kopp-Schneider and Bennett A. Landman and Geert Litjens and Bjoern Menze and Olaf Ronneberger and Ronald M. Summers and Patrick Bilic and Patrick F. Christ and Richard K. G. Do and Marc Gollub and Jennifer Golia-Pernicka and Stephan H. Heckers and William R. Jarnagin and Maureen K. McHugo and Sandy Napel and Eugene Vorontsov and Lena Maier-Hein and M. Jorge Cardoso},
year={2019},
eprint={1902.09063},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/1902.09063},
}
```
## License
This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License](http://creativecommons.org/licenses/by-sa/4.0/).
[Chehab lab](https://chehablab.com) © 2026
Provider:
chehablab



