Histology images from uniform tumor regions in TCGA Whole Slide Images (TCGA-UT)

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/3373438

Resource description:

TCGA-UT Dataset Documentation

Quick Links

- Dataset on Hugging Face: For users interested in benchmarking foundation models or feature extractors, please visit TCGA-UT on Hugging Face
- Original Paper: Universal encoding of pan-cancer histology by deep texture representations

Dataset Overview

The TCGA-UT dataset is a large-scale collection of histopathological image patches from human cancer tissues. It contains 1,608,060 image patches extracted from hematoxylin & eosin (H&E) stained histological samples across 32 different types of solid cancers.

Key Features

- Size: Over 1.6 million image patches
- Resolution: All patches are standardized to 256 x 256 pixels
- Source: Derived from The Cancer Genome Atlas (TCGA) dataset
- Quality: Curated by trained pathologists
- Coverage: 32 different cancer types
- Patient Base: 7,175 patients from 8,736 diagnostic slides

Data Collection Process

1. Image Source: Whole Slide Images (WSI) were downloaded from the GDC legacy database between December 2016 and June 2017
2. Expert Annotation: Two trained pathologists selected at least three representative tumor regions per slide
3. Quality Control: 926 slides were removed due to various quality issues (poor staining, low resolution, focus problems, etc.)
4. Patch Extraction: 10 patches were randomly cropped at 6 different magnification levels from each annotated region

File Structure

Files are organized using the following format:

    [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution].jpg

Resolution Key

- 0: 0.5 μm/pixel
- 1: 0.6 μm/pixel
- 2: 0.7 μm/pixel
- 3: 0.8 μm/pixel
- 4: 0.9 μm/pixel
- 5: 1.0 μm/pixel

License

- Non-Commercial Use: CC-BY-NC-SA 4.0
- Commercial Use: Please contact ishum-prm@m.u-tokyo.ac.jp for licensing

Citation

If you use this dataset in your research, please cite:

    Komura, D., et al. (2022). Universal encoding of pan-cancer histology by deep texture representations. Cell Reports 38, 110424. https://doi.org/10.1016/j.celrep.2022.110424

For Model Benchmarking

If you're interested in using this dataset for benchmarking foundation models or feature extractors, we recommend accessing the dataset through the Hugging Face Hub at dakomura/tcga-ut. The Hugging Face version provides:

- Predefined train/validation/test splits (both internal and external facility-based splits)
- Ready-to-use benchmarking framework for foundation models
- WebDataset format support for efficient data loading
- Example implementations for state-of-the-art model evaluation
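
Parsing patch paths (illustrative sketch)

The path format and resolution key given under File Structure can be turned into a small parser that recovers the metadata of each patch. The following Python sketch is not part of the dataset's official tooling. The names `PatchInfo` and `parse_patch_path` are hypothetical, the example path is made up, and the code assumes that the [number] and [pixel resolution] fields of the file name contain no hyphens.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Resolution key from the dataset description:
# directory index -> micrometres per pixel.
RESOLUTION_UM_PER_PIXEL = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.8, 4: 0.9, 5: 1.0}


@dataclass(frozen=True)
class PatchInfo:
    """Metadata recovered from one patch path (hypothetical helper type)."""
    cancer_type: str
    resolution_index: int
    um_per_pixel: float
    barcode: str
    region: str
    number: str
    pixel_resolution: str


def parse_patch_path(path: str) -> PatchInfo:
    """Split [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution].jpg."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 path components: {path!r}")
    # Only the last four components matter, so any download root is ignored.
    cancer_type, resolution, barcode, filename = parts[-4:]

    try:
        resolution_index = int(resolution)
        um_per_pixel = RESOLUTION_UM_PER_PIXEL[resolution_index]
    except (ValueError, KeyError):
        raise ValueError(f"unknown resolution directory: {resolution!r}") from None

    # Assumption: the last two file-name fields contain no hyphens,
    # so splitting from the right isolates them safely.
    fields = PurePosixPath(filename).stem.rsplit("-", 2)
    if len(fields) != 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    region, number, pixel_resolution = fields

    return PatchInfo(cancer_type, resolution_index, um_per_pixel,
                     barcode, region, number, pixel_resolution)


if __name__ == "__main__":
    # Made-up path for illustration only; not a real file in the dataset.
    print(parse_patch_path("Some_cancer_type/3/TCGA-XX-XXXX-01Z-00-DX1/1-4-256.jpg"))
```

Keeping the file-name fields as strings avoids guessing their exact encoding; callers can convert them once they have checked the actual files.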

Created:

2025-02-07