
Dataset of histopathological image crops from GTEx project

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13330658
Description:
This is a dataset of histological slides from the GTEx project, balanced for 3 major factors (organ, sex, and age bracket), that may be useful for training models in supervised or self-supervised modes.

Four datasets are available:

- gtex_histology_balanced_3_slides_200_tiles.tar.gz: Conditioned on the 3 factors, 3 slides were selected per group, and 200 tiles were selected randomly from the tissue-segmented areas of each slide.
- gtex_histology_balanced_3_slides_2000_tiles.tar.gz: Conditioned on the 3 factors, 3 slides were selected per group, and 2000 tiles were selected randomly from the tissue-segmented areas of each slide.
- gtex_histology_balanced_10_slides_100_tiles.tar.gz: Conditioned on the 3 factors, 10 slides were selected per group (when possible), and 100 tiles were selected randomly from the tissue-segmented areas of each slide. This dataset closely matches the "gtex_histology_balanced_3_slides_200_tiles.tar.gz" dataset in total number of tiles.
- gtex_histology_balanced_10_slides_800_tiles.tar.gz: Conditioned on the 3 factors, 10 slides were selected per group (when possible), and 800 tiles were selected randomly from the tissue-segmented areas of each slide. This dataset closely matches the "gtex_histology_balanced_3_slides_200_tiles.tar.gz" dataset in total number of tiles.

Each archive file contains the following:

- slide_annotation.csv: a slide-level annotation of the slides (see below)
- train: a directory with image tiles to be used to train a model
- valid: a directory with image tiles to be used to validate a model

The slide_annotation file contains publicly available information on the slides, plus 3 additional columns:

- "Tissue_simple": the organ of the slide
- "split": whether the slide was assigned to the 'train' or 'valid' split. Slides in the validation split have 1/10th as many tiles as slides in the training split.
- "n_tiles": the number of image tiles in the dataset for each slide

Example:

| Tissue Sample ID | Tissue | Subject ID | Sex | Age Bracket | Hardy Scale | Pathology Categories | Pathology Notes | Tissue_simple | split | n_tiles |
|---|---|---|---|---|---|---|---|---|---|---|
| GTEX-1128S-1426 | Esophagus - Mucosa | GTEX-1128S | female | 60-69 | Fast death - natural causes | | 6 pieces, near- total autolysis/mucosa completely sloughed | Esophagus | train | 200 |
| GTEX-113JC-1226 | Stomach | GTEX-113JC | female | 50-59 | Fast death - natural causes | | 6 pieces, well dissected mucosa; some areas are severely autolyzed | Stomach | valid | 20 |
| GTEX-1192W-2526 | Muscle - Skeletal | GTEX-1192W | male | 60-69 | Fast death - natural causes | | 2 pieces, ~10-20% interstitial fat, rep foci delineated | Muscle | train | 200 |
| GTEX-1192X-0426 | Muscle - Skeletal | GTEX-1192X | male | 50-59 | Slow death | | 2 pieces, 5-10% interstitial fat, rep. foci delineated | Muscle | valid | 20 |
| GTEX-11DXX-1326 | Stomach | GTEX-11DXX | female | 60-69 | Ventilator case | gastritis | 6 pieces, mild chronic active gastritis | Stomach | train | 200 |

Inside train and valid are JPEG files named with the following convention: ......jpg, such that the origin of each crop can be traced and the file name can serve as a direct class label if desired. Examples: "GTEX-ZYT6-1326.Pancreas.male.30-39.47492.16064.jpg", "GTEX-WWYW-2726.Ovary.female.50-59.5024.15008.jpg".
Created:
2025-02-25