Dataset of histopathological image crops from GTEx project
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13330658
Resource description:
This dataset of histological slides from the GTEx project is balanced for 3 major factors (organ, sex, and age bracket) and may be useful for training models in supervised or self-supervised modes.
Four datasets are available:
gtex_histology_balanced_3_slides_200_tiles.tar.gz: Conditioned on the 3 factors, 3 slides were selected per group, and 200 tiles were randomly selected per slide from tissue-segmented areas.
gtex_histology_balanced_3_slides_2000_tiles.tar.gz: Conditioned on the 3 factors, 3 slides were selected per group, and 2000 tiles were randomly selected per slide from tissue-segmented areas.
gtex_histology_balanced_10_slides_100_tiles.tar.gz: Conditioned on the 3 factors, 10 slides were selected per group (when possible), and 100 tiles were randomly selected per slide from tissue-segmented areas. This dataset closely matches the "gtex_histology_balanced_3_slides_200_tiles.tar.gz" dataset in total number of tiles.
gtex_histology_balanced_10_slides_800_tiles.tar.gz: Conditioned on the 3 factors, 10 slides were selected per group (when possible), and 800 tiles were randomly selected per slide from tissue-segmented areas. This dataset closely matches the "gtex_histology_balanced_3_slides_2000_tiles.tar.gz" dataset in total number of tiles.
Each archive file contains the following:
slide_annotation.csv: a slide-level annotation of the slides (see below)
train: a directory with image tiles to be used to train a model
valid: a directory with image tiles to be used to validate a model
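As a quick sanity check after downloading, the sketch below counts the JPEG tiles under the train and valid directories of an archive without extracting it. It assumes the archives are gzip-compressed tar files (as the .tar.gz suffix suggests) and makes no assumption about whether train and valid sit at the archive root or under a parent folder. The function name count_tiles_per_split is hypothetical.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_tiles_per_split(archive_path):
    """Count .jpg members that live under a 'train' or 'valid' directory."""
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            path = PurePosixPath(member.name)
            if path.suffix.lower() != ".jpg":
                continue
            # Look at the directory components only, not the file name itself.
            for split in ("train", "valid"):
                if split in path.parts[:-1]:
                    counts[split] += 1
                    break
    return counts
```

For example, count_tiles_per_split("gtex_histology_balanced_3_slides_200_tiles.tar.gz") returns a Counter with one entry per split found in that archive.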
The slide_annotation file contains publicly available information on the slides in addition to 3 columns:
"Tissue_simple": the organ of the slide
"split": whether the slide was assigned to the 'train' or 'valid' split for training. Slides in the validation split have 1/10th as many tiles as slides in the training split.
"n_tiles": the number of image tiles in the dataset for each slide
Example:
| Tissue Sample ID | Tissue | Subject ID | Sex | Age Bracket | Hardy Scale | Pathology Categories | Pathology Notes | Tissue_simple | split | n_tiles |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTEX-1128S-1426 | Esophagus - Mucosa | GTEX-1128S | female | 60-69 | Fast death - natural causes | | 6 pieces, near- total autolysis/mucosa completely sloughed | Esophagus | train | 200 |
| GTEX-113JC-1226 | Stomach | GTEX-113JC | female | 50-59 | Fast death - natural causes | | 6 pieces, well dissected mucosa; some areas are severely autolyzed | Stomach | valid | 20 |
| GTEX-1192W-2526 | Muscle - Skeletal | GTEX-1192W | male | 60-69 | Fast death - natural causes | | 2 pieces, ~10-20% interstitial fat, rep foci delineated | Muscle | train | 200 |
| GTEX-1192X-0426 | Muscle - Skeletal | GTEX-1192X | male | 50-59 | Slow death | | 2 pieces, 5-10% interstitial fat, rep. foci delineated | Muscle | valid | 20 |
| GTEX-11DXX-1326 | Stomach | GTEX-11DXX | female | 60-69 | Ventilator case | gastritis | 6 pieces, mild chronic active gastritis | Stomach | train | 200 |
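The slide-level annotation can be summarized with the standard csv module. The following is a minimal sketch, assuming slide_annotation.csv is comma-delimited with a header row that uses the column names shown in the example; the three inline rows are copied from that example so the snippet runs without the archive. The function name tiles_per_split is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the example; the comma-delimited layout is an assumption.
SAMPLE_CSV = '''Tissue Sample ID,Tissue,Subject ID,Sex,Age Bracket,Hardy Scale,Pathology Categories,Pathology Notes,Tissue_simple,split,n_tiles
GTEX-1128S-1426,Esophagus - Mucosa,GTEX-1128S,female,60-69,Fast death - natural causes,,"6 pieces, near- total autolysis/mucosa completely sloughed",Esophagus,train,200
GTEX-113JC-1226,Stomach,GTEX-113JC,female,50-59,Fast death - natural causes,,"6 pieces, well dissected mucosa; some areas are severely autolyzed",Stomach,valid,20
GTEX-11DXX-1326,Stomach,GTEX-11DXX,female,60-69,Ventilator case,gastritis,"6 pieces, mild chronic active gastritis",Stomach,train,200
'''


def tiles_per_split(csv_file):
    """Sum the n_tiles column separately for each value of the split column."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["split"]] += int(row["n_tiles"])
    return dict(totals)


totals = tiles_per_split(io.StringIO(SAMPLE_CSV))
print(totals)  # {'train': 400, 'valid': 20}
```

To run this on a real archive, pass an open file handle for the extracted slide_annotation.csv instead of the io.StringIO object.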
Inside train and valid are JPEG files named with the following convention: ......jpg, so that the origin of each crop can be traced and the file name can serve as a direct class label if desired.
Examples: "GTEX-ZYT6-1326.Pancreas.male.30-39.47492.16064.jpg", "GTEX-WWYW-2726.Ovary.female.50-59.5024.15008.jpg".
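Because the file name can double as a class label, a small parser is handy. The sketch below is inferred from the two example names only: it assumes six dot-separated fields before the .jpg extension, read as tissue sample ID, organ, sex, age bracket, and two integers. The two integers are presumably the tile position within the slide, but their meaning and order are not stated here, so they are returned under the neutral, hypothetical names coord_1 and coord_2.

```python
from pathlib import PurePath


def parse_tile_name(file_name):
    """Split a tile file name into its dot-separated fields (layout inferred from the examples)."""
    name = PurePath(file_name).name
    if not name.lower().endswith(".jpg"):
        raise ValueError(f"not a .jpg tile name: {file_name!r}")
    fields = name[: -len(".jpg")].split(".")
    if len(fields) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(fields)}: {file_name!r}")
    sample_id, organ, sex, age_bracket, coord_1, coord_2 = fields
    return {
        "sample_id": sample_id,
        "organ": organ,
        "sex": sex,
        "age_bracket": age_bracket,
        # Assumption: the trailing integers locate the tile within the slide.
        "coord_1": int(coord_1),
        "coord_2": int(coord_2),
    }


info = parse_tile_name("GTEX-ZYT6-1326.Pancreas.male.30-39.47492.16064.jpg")
print(info["organ"], info["sex"], info["age_bracket"])  # Pancreas male 30-39
```

Any of the returned fields (for example organ) can then be used directly as the training label for a tile.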
Created:
2025-02-25



