
CGCI-BLGSP: DICOM converted whole slide images from the Cancer Genome Characterization Initiative (CGCI) - Burkitt Lymphoma Genome Sequencing Project (BLGSP)

Source: DataCite Commons. Updated: 2026-05-06. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17381396
Description:
This dataset corresponds to a collection of images and/or image-derived data available from the National Cancer Institute Imaging Data Commons (IDC). This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using the IDC Portal. You can use the manifests included in this Zenodo record to download the collection following the Download instructions below.

The Office of Cancer Genomics at the National Cancer Institute sponsored a series of studies as part of the Cancer Genome Characterization Initiative (CGCI) to assess novel emerging sequencing technologies in cancer. The CGCI program included comprehensive characterization of the genetic aberrations found in different pediatric and/or adult tumors. The Burkitt Lymphoma Genome Sequencing Project (BLGSP) was a collaborative effort between the National Cancer Institute and the Foundation for Burkitt Lymphoma Research, coordinated by the Foundation for the National Institutes of Health. The goal of the BLGSP was to explore potential genetic changes in patients with Burkitt lymphoma (BL), an uncommon type of Non-Hodgkin lymphoma that occurs most often in children and young adults, that could lead to better prevention, detection, and treatment. The project characterized the alterations of the tumors' genomes (with matched normal as control) and transcriptomes by sequencing the DNA and RNA of each case. Consented subjects include adult and pediatric patients from sub-Saharan Africa, North America, Brazil, and France.

This collection contains DICOM converted whole slide images from 388 subjects in the GDC CGCI-BLGSP project (dbGaP accession phs000527). The proprietary format whole slide images were obtained from GDC and converted to DICOM Slide Microscopy (SM) format using idc-wsi-conversion. The metadata about the patients and biospecimens were downloaded from the GDC portal as TSV files.
The 1,933 slides include specimens stained with H&E (hematoxylin and eosin) and various immunohistochemistry markers (BCL2, BCL6, CD10, Ki-67, CD20, CD3, CD5, CD79a), as well as EBER in situ hybridization and Wright-Giemsa stains. Tissue preparation includes both formalin-fixed paraffin-embedded (FFPE) and frozen sections. The collection also includes tissue microarray (TMA) slides. Diagnoses coded in ICD-O-3 include Burkitt lymphoma NOS (254 patients), diffuse large B-cell lymphoma NOS (25), clear cell tumor NOS (6), and malignant lymphoma NOS (2).

Data organization:
The 388 DICOM PatientIDs comprise 292 individual case IDs (matching GDC case IDs, e.g. BLGSP-71-06-00001) and 96 TMA array identifiers (e.g. BLGSP-71-TMA1-HE). Each individual case ID corresponds to one GDC case and can be used to link to genomic, transcriptomic, and clinical data available in the GDC portal. TMA slides contain tissue cores from multiple patients on a single slide and use the array identifier as PatientID (e.g. BLGSP-71-TMA1-HE for TMA array 1, H&E stain). A mapping from TMA array numbers to individual patient case IDs is available in the "TMA Case Lists" tab of the GDC sample matrix (127 patients across 12 TMA arrays), but the spatial correspondence between tissue cores on the slide and specific patients is not known, so this mapping is not encoded in the DICOM metadata. Individual cases may have multiple slides representing different tissue sections (aliquots) and/or different stains. Additional metadata about the samples (not used during conversion) is available in the same GDC sample matrix. CGCI publications about BLGSP are listed at GDC.

Files included:
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, cgci_blgsp-idc_v22-aws.s5cmd corresponds to the contents of the cgci_blgsp collection introduced in IDC data release v22.
cgci_blgsp-idc_v24-aws.s5cmd: AWS download manifest
cgci_blgsp-idc_v24-gcs.s5cmd: GCS download manifest
cgci_blgsp-idc_v24-dcf.dcf: DCF download manifest

Manifest files ending in -aws.s5cmd reference files in Amazon Web Services (AWS) buckets; -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and mirrored between AWS and GCP.

Download instructions:
Each manifest file includes instructions in its header on how to download the included files. To download the files using .s5cmd manifests:
1. Install idc-index: pip install --upgrade idc-index
2. Download the files referenced by a manifest included in this dataset: idc download manifest.s5cmd

To download files using a .dcf manifest, see the manifest header.

For questions or help, contact support@canceridc.dev or post on the IDC Forum.
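
Once the files are downloaded, the PatientID convention described under data organization can be used to separate individual-case slides from TMA slides. The following is a minimal sketch, not part of the IDC tooling: the function name classify_patient_id and both regular expressions are hypothetical, and the patterns are inferred only from the two example IDs given in the description (BLGSP-71-06-00001 and BLGSP-71-TMA1-HE), so real PatientIDs in the collection may deviate from them.

```python
import re

# Assumed patterns, inferred only from the two example IDs in the description.
# Real PatientIDs in the collection may not all follow them.
CASE_ID = re.compile(r"^BLGSP-\d+-\d+-\d+$")        # e.g. BLGSP-71-06-00001
TMA_ID = re.compile(r"^BLGSP-\d+-TMA(\d+)-(.+)$")   # e.g. BLGSP-71-TMA1-HE


def classify_patient_id(patient_id: str) -> dict:
    """Classify a DICOM PatientID as an individual case or a TMA array."""
    tma = TMA_ID.match(patient_id)
    if tma:
        # TMA slides carry the array number and a stain label in the ID.
        return {"kind": "tma", "array": int(tma.group(1)), "stain": tma.group(2)}
    if CASE_ID.match(patient_id):
        # Individual case IDs match GDC case IDs and can be used for linking.
        return {"kind": "case", "gdc_case_id": patient_id}
    # Anything else is left unclassified rather than guessed at.
    return {"kind": "unknown"}


if __name__ == "__main__":
    for pid in ("BLGSP-71-06-00001", "BLGSP-71-TMA1-HE"):
        print(pid, classify_patient_id(pid))
```

IDs classified as "case" can be looked up directly in the GDC portal. IDs classified as "tma" cannot be linked to one patient, since the description states that the core-to-patient correspondence is not known.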
Provider:
Zenodo
Created:
2026-05-06