CGCI-BLGSP: DICOM converted whole slide images from the Cancer Genome Characterization Initiative (CGCI) - Burkitt Lymphoma Genome Sequencing Project (BLGSP)
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17381396
Description:
This dataset corresponds to a collection of images and/or image-derived data available from the
National Cancer Institute Imaging Data Commons (IDC).
This dataset was converted into DICOM representation and ingested by the IDC team.
You can explore and visualize the corresponding images using the
IDC Portal.
You can use the manifests included in this Zenodo record to download the collection following
the Download instructions below.
The Office of Cancer Genomics at the National Cancer Institute sponsored a series of studies as part
of the Cancer Genome Characterization Initiative (CGCI) to assess novel emerging sequencing
technologies in cancer. The CGCI program included comprehensive characterization of the genetic
aberrations found in different pediatric and/or adult tumors.
The Burkitt Lymphoma Genome Sequencing Project (BLGSP) was a collaborative effort between the
National Cancer Institute and the Foundation for Burkitt Lymphoma Research, coordinated by the
Foundation for the National Institutes of Health. The goal of the BLGSP was to explore potential
genetic changes in patients with Burkitt lymphoma (BL) that could lead to better prevention,
detection, and treatment. BL is an uncommon type of non-Hodgkin lymphoma that occurs most often in
children and young adults. The project characterized the alterations of the tumors' genomes (with
matched normal as control) and transcriptomes by sequencing the DNA and RNA of each case. Consented
subjects include adult and pediatric patients from sub-Saharan Africa, North America, Brazil, and
France.
This collection contains DICOM-converted whole slide images from 388 subjects in the
GDC CGCI-BLGSP project (dbGaP accession phs000527).
The proprietary format whole slide images were obtained from GDC and converted to DICOM Slide
Microscopy (SM) format using
idc-wsi-conversion. The metadata about
the patients and biospecimens were downloaded from the
GDC portal as TSV files. The 1,933 slides include specimens stained with H&E (hematoxylin and
eosin) and various immunohistochemistry markers (BCL2, BCL6, CD10, Ki-67, CD20, CD3, CD5, CD79a),
as well as EBER in situ hybridization and Wright-Giemsa stains. Tissue preparation includes both
formalin-fixed paraffin-embedded (FFPE) and frozen sections. The collection also includes tissue
microarray (TMA) slides.
Diagnoses coded in ICD-O-3 include Burkitt lymphoma NOS (254 patients), diffuse large B-cell
lymphoma NOS (25), clear cell tumor NOS (6), and malignant lymphoma NOS (2).
Data organization: The 388 DICOM PatientIDs comprise 292 individual case IDs
(matching GDC case IDs, e.g. BLGSP-71-06-00001) and 96 TMA array identifiers
(e.g. BLGSP-71-TMA1-HE). Each individual case ID corresponds to one GDC case
and can be used to link to genomic, transcriptomic, and clinical data available in the
GDC portal.
TMA slides contain tissue cores from multiple patients on a single slide and use the array
identifier as PatientID (e.g. BLGSP-71-TMA1-HE for TMA array 1, H&E stain).
A mapping from TMA array numbers to individual patient case IDs is available in the
"TMA Case Lists" tab of the GDC sample matrix (127 patients across 12 TMA arrays), but the
spatial correspondence between tissue cores on the slide and specific patients is not known,
so this mapping is not encoded in the DICOM metadata.
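The split between individual case IDs and TMA array identifiers can be sketched in Python as follows. This is only an illustration: the two regular expressions are inferred from the two example identifiers quoted above (BLGSP-71-06-00001 and BLGSP-71-TMA1-HE), and the function name classify_patient_id is hypothetical. Other PatientIDs in the collection may not follow these patterns, so anything unrecognized is reported as "unknown" rather than guessed.

```python
import re

# Assumption: patterns inferred only from the two example identifiers in the
# description; real PatientIDs in the collection may deviate from them.
CASE_ID = re.compile(r"^BLGSP-\d+-\d+-\d+$")
TMA_ID = re.compile(r"^BLGSP-\d+-TMA(\d+)-(.+)$")


def classify_patient_id(patient_id: str) -> dict:
    """Label a DICOM PatientID as an individual case, a TMA array, or unknown."""
    if CASE_ID.match(patient_id):
        # Individual case IDs match GDC case IDs and can be used for linking.
        return {"kind": "case", "gdc_case_id": patient_id}
    tma = TMA_ID.match(patient_id)
    if tma:
        # TMA identifiers carry an array number and a stain label, but no
        # per-patient information.
        return {"kind": "tma", "array": int(tma.group(1)), "stain": tma.group(2)}
    return {"kind": "unknown"}


if __name__ == "__main__":
    for pid in ["BLGSP-71-06-00001", "BLGSP-71-TMA1-HE"]:
        print(pid, classify_patient_id(pid))
```

Only PatientIDs classified as "case" can be joined directly to GDC case records; TMA identifiers would still need the "TMA Case Lists" tab to recover the set of contributing patients.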
Individual cases may have multiple slides representing different tissue sections (aliquots)
and/or different stains.
Additional metadata about the samples (not used during conversion) is available in the same
GDC sample matrix.
CGCI publications about BLGSP are listed at
GDC.
Files included
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, cgci_blgsp-idc_v22-aws.s5cmd corresponds to the contents of the cgci_blgsp collection introduced in IDC data release v22.
cgci_blgsp-idc_v24-aws.s5cmd: AWS download manifest
cgci_blgsp-idc_v24-gcs.s5cmd: GCS download manifest
cgci_blgsp-idc_v24-dcf.dcf: DCF download manifest
Manifest files ending in -aws.s5cmd reference files in Amazon Web Services (AWS) buckets; those ending in -gcs.s5cmd reference files in Google Cloud Storage (GCS) buckets. The files themselves are identical and mirrored between AWS and GCS.
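The naming convention described above can be unpacked with a short Python sketch. The pattern is an assumption derived from the three file names listed in this record (collection, then idc_v plus release number, then aws, gcs, or dcf); the names MANIFEST_NAME, ManifestInfo, and parse_manifest_name are hypothetical and not part of any IDC tool.

```python
import re
from typing import NamedTuple

# Assumption: pattern derived from the file names listed in this record,
# e.g. cgci_blgsp-idc_v24-aws.s5cmd and cgci_blgsp-idc_v24-dcf.dcf.
MANIFEST_NAME = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)\.(?:s5cmd|dcf)$"
)


class ManifestInfo(NamedTuple):
    collection: str  # collection identifier, e.g. "cgci_blgsp"
    release: int     # IDC data release in which this version was introduced
    target: str      # "aws", "gcs", or "dcf"


def parse_manifest_name(filename: str) -> ManifestInfo:
    """Split a manifest file name into collection, release, and target."""
    match = MANIFEST_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return ManifestInfo(match["collection"], int(match["release"]), match["target"])


if __name__ == "__main__":
    print(parse_manifest_name("cgci_blgsp-idc_v24-aws.s5cmd"))
```

A helper like this could be used to pick the manifest matching a preferred cloud provider when a record contains several of them.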
Download instructions
Each manifest file includes instructions in its header on how to download the included files.
To download the files using .s5cmd manifests:
1. Install idc-index:
   pip install --upgrade idc-index
2. Download the files referenced by a manifest included in this dataset (replace manifest.s5cmd with the name of the manifest file):
   idc download manifest.s5cmd
To download files using a .dcf manifest, see the manifest header.
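Before starting a large download, it can be useful to read a manifest's header and count the entries it references. The Python sketch below does this under two assumptions that are not stated in this record: that header lines begin with "#", and that every other non-empty line is a command containing one storage URL. The sample text is made up for illustration and is not an excerpt from a real manifest; split_manifest is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches any scheme://... token, e.g. an s3:// URL inside a command line.
URL = re.compile(r"[a-z][a-z0-9+.-]*://\S+")


def split_manifest(text: str) -> Tuple[List[str], List[str]]:
    """Return (header comment lines, storage URLs found in command lines).

    Assumption: lines starting with '#' form the instructional header and
    all other non-empty lines are download commands.
    """
    header: List[str] = []
    urls: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            header.append(stripped.lstrip("#").strip())
        else:
            urls.extend(URL.findall(stripped))
    return header, urls


# Made-up sample content; a real manifest may look different.
SAMPLE = """\
# Hypothetical header line with download instructions
cp s3://example-bucket/11111111-aaaa/* .
cp s3://example-bucket/22222222-bbbb/* .
"""

if __name__ == "__main__":
    header, urls = split_manifest(SAMPLE)
    print("\n".join(header))
    print(f"{len(urls)} entries referenced")
```

The header printed this way is the same text the record points to for the authoritative download instructions, so it is worth reading before running any command.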
For questions or help, contact support@canceridc.dev
or post on the IDC Forum.
Provider:
Zenodo
Created:
2026-05-06



