gyoeng/IDC-h5patches-radiomics
Source: Hugging Face. Updated 2026-04-15; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/gyoeng/IDC-h5patches-radiomics
# IDC H5 Patches + Radiomics + Cell Segmentation Dataset
This dataset provides multi-modal features extracted from the HEST benchmark (IDC subset), including:
* Patch-level H&E image data (H5 format)
* Radiomics features
* Cell segmentation outputs (CellViT)
* Visualization assets for demo and analysis
---
## 📂 Dataset Structure
```
.
├── h5patches/
├── radiomics/
└── cellsegments/
```
### 1. `h5patches/`
Whole-slide image patches stored in HDF5 format.
* `*.h5`
* Each file contains patch-level image data extracted from a whole-slide image (WSI)
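
The internal layout of the `.h5` files is not documented here, so a reasonable first step is to list what each file contains. The following is a minimal sketch, assuming `h5py` is installed; `describe_h5` is a hypothetical helper name, and no dataset keys are assumed.

```python
import h5py


def describe_h5(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems walks both groups and datasets; keep datasets only
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling `describe_h5(...)` on any file under `h5patches/` should show which arrays are available and their shapes before you decide how to read them.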
---
### 2. `radiomics/`
Radiomics features computed per patch.
For each sample:
```
radiomics/{sample}_features/
├── *_radiomics_features.parquet
├── *_radiomics_features_processed.parquet
├── *_radiomics_features_processed_stats.csv
├── patches_color.tar.gz
├── patches_gray.tar.gz
└── patches_mask.tar.gz
```
* `*.parquet` → main feature tables (recommended for use)
* `*_stats.csv` → feature statistics summary
* `patches_*.tar.gz` → visualization patches (color / grayscale / mask)
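
To make the per-sample layout concrete, here is a small sketch that locates the documented files for one sample using only the standard library. `radiomics_files` is a hypothetical helper, and `root` is assumed to be a local copy of the dataset; file prefixes are matched by glob rather than assumed.

```python
from pathlib import Path


def radiomics_files(root, sample):
    """Locate the documented radiomics files for one sample under `root`."""
    sample_dir = Path(root) / "radiomics" / f"{sample}_features"

    def first(pattern):
        # Return the first match or None, since file prefixes are not assumed
        matches = sorted(sample_dir.glob(pattern))
        return matches[0] if matches else None

    return {
        "features": first("*_radiomics_features.parquet"),
        "processed": first("*_radiomics_features_processed.parquet"),
        "stats": first("*_radiomics_features_processed_stats.csv"),
        "archives": sorted(sample_dir.glob("patches_*.tar.gz")),
    }
```

For example, `radiomics_files(".", "NCBI783")["features"]` should point at the raw feature table for that sample if it has been downloaded.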
---
### 3. `cellsegments/`
Cell-level segmentation outputs generated using CellViT.
For each sample:
```
cellsegments/{sample}_seg/
├── h5_patch_cellvit_seg.parquet
├── h5_patch_cellvit_seg.geojson
├── h5_patch_cellvit_seg_meta.csv
├── overlay.tar.gz
└── summary.json
```
* `*.parquet` → structured segmentation data (recommended)
* `*.geojson` → polygon-based segmentation annotations
* `*_meta.csv` → metadata per patch
* `overlay.tar.gz` → visualization overlays (for demo/UI)
* `summary.json` → dataset summary
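
The `.geojson` file can be inspected with the standard library alone. The sketch below counts geometry types; it is an illustration under assumptions, since the exact structure of the file is not described here. It accepts either a GeoJSON `FeatureCollection` or a bare list of features, and `geometry_counts` is a hypothetical helper name.

```python
import json
from collections import Counter


def geometry_counts(geojson_path):
    """Count geometry types in a GeoJSON file."""
    with open(geojson_path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Accept either a FeatureCollection or a bare list of Feature objects
    features = data.get("features", []) if isinstance(data, dict) else data
    return Counter(
        (feat.get("geometry") or {}).get("type", "none") for feat in features
    )
```

`summary.json` can be read the same way with `json.load`; its keys are not listed here, so inspect them before relying on any particular field.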
---
## 🚀 Usage
### Load radiomics features
```python
import pandas as pd
df = pd.read_parquet("radiomics/NCBI783_features/NCBI783_radiomics_features.parquet")
```
### Load segmentation data
```python
import pandas as pd
seg = pd.read_parquet("cellsegments/NCBI783_seg/h5_patch_cellvit_seg.parquet")
```
### Extract visualization patches
```bash
tar -xzf patches_color.tar.gz
tar -xzf overlay.tar.gz
```
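
If you prefer to stay in Python, the same extraction can be sketched with the standard `tarfile` module. `extract_archive` is a hypothetical helper; the destination directory is your choice, and the archive contents are not assumed.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest):
    """Extract a .tar.gz archive into `dest` and return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject unsafe paths
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, `extract_archive("radiomics/NCBI783_features/patches_color.tar.gz", "patches_color")` would unpack the color patches into a separate folder.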
---
## ⚠️ Notes
* Large files (`.h5`, `.parquet`, `.tar.gz`) are stored using **Git LFS**
* It is recommended to use `.parquet` instead of `.csv` for efficient loading
* Visualization assets are provided for demo purposes (e.g., Hugging Face Spaces)
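
Because the files are large, it may be preferable to fetch a single sample rather than the whole repository. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`; it assumes that library is installed and that the dataset is reachable under the repository ID `gyoeng/IDC-h5patches-radiomics`. `sample_patterns` and `download_sample` are hypothetical helpers, and the `h5patches/` file is left out because its per-sample file name is not documented here.

```python
def sample_patterns(sample):
    """Glob patterns for one sample's files, following the layout described above."""
    return [
        f"radiomics/{sample}_features/*",
        f"cellsegments/{sample}_seg/*",
    ]


def download_sample(sample, local_dir="IDC-h5patches-radiomics"):
    """Download only one sample's radiomics and segmentation folders."""
    # Imported here so the pattern helper works without huggingface_hub installed
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="gyoeng/IDC-h5patches-radiomics",
        repo_type="dataset",
        allow_patterns=sample_patterns(sample),
        local_dir=local_dir,
    )
```

Calling `download_sample("NCBI783")` should place the two folders for that sample under `local_dir`, after which the relative paths in the usage examples resolve from that directory.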
---
## 📜 License
This dataset uses data derived from the HEST dataset.
Original dataset:
https://huggingface.co/datasets/MahmoodLab/hest
License:
CC BY-NC-SA 4.0
This work is also distributed under the same license.
---
## 🙏 Acknowledgements
* Mahmood Lab for the HEST dataset
* CellViT for cell segmentation
* PyRadiomics for feature extraction
Provider: gyoeng



