CGCI-HTMCP-LC: DICOM converted whole slide images from the Cancer Genome Characterization Initiative (CGCI) HIV+ Tumor Molecular Characterization Project (HTMCP) - Lung Cancer
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17381428
Description:
This dataset corresponds to a collection of images and/or image-derived data available from the
National Cancer Institute Imaging Data Commons (IDC).
This dataset was converted into DICOM representation and ingested by the IDC team.
You can explore and visualize the corresponding images using the
IDC Portal.
You can use the manifests included in this Zenodo record to download the collection following
the Download instructions below.
The Office of Cancer Genomics at the National Cancer Institute sponsored a series of studies as part
of the Cancer Genome Characterization Initiative (CGCI) to assess novel emerging sequencing
technologies in cancer. The CGCI program included comprehensive characterization of the genetic
aberrations found in different pediatric and/or adult tumors.
As part of CGCI, the HIV+ Tumor Molecular Characterization Project (HTMCP) was a joint effort of the
Office of Cancer Genomics (OCG) and the Office of HIV and AIDS Malignancy (OHAM). Its goals were to
characterize HIV-associated cancers obtained from HIV-infected patients and compare them to the same
types of cancers from patients without HIV infection. Approximately 34.2 million people are living
with HIV worldwide. People infected with HIV have an elevated risk of cancer and mortality, and
cancer is a leading cause of death among people with HIV/AIDS. The Genome Sciences Center at the British
Columbia Cancer Agency performed whole genome sequencing of 100 cases of paired tumor and germline
DNA, along with transcriptome sequencing of HIV+ tumors.
Lung cancer incidence is significantly increased among HIV-positive patients, including those on
highly active antiretroviral therapy (HAART). Additionally, the tumor spectrum of lung cancers in
HIV-positive patients is quite different from that seen in HIV-negative cases, suggesting a
different biological development.
This collection contains DICOM converted whole slide images from 27 of the 39 cases in the
GDC CGCI-HTMCP-LC project (dbGaP accession phs000530).
The proprietary format whole slide images were obtained from GDC and converted to DICOM Slide
Microscopy (SM) format using idc-wsi-conversion. The 84 slides include specimens stained with
H&E (hematoxylin and eosin, 27 slides), P40 (26), TTF1 (26), Chromogranin (2), Synaptophysin (2),
and P16 (1).
Diagnoses include adenocarcinoma NOS (9 patients), squamous cell carcinoma NOS (4), non-small cell
carcinoma (3), squamous cell carcinoma large cell nonkeratinizing (2), and other rare subtypes
including adenosquamous carcinoma, mucinous adenocarcinoma, large cell neuroendocrine carcinoma,
bronchiolo-alveolar carcinoma, carcinoma undifferentiated, and neoplasm uncertain whether benign
or malignant (1 each).
Data organization: DICOM PatientIDs correspond to GDC case IDs and can be used to link to
genomic, transcriptomic, and clinical data in the GDC portal. Of 39 GDC cases, 27 have Tissue
Slide images; the remaining 12 have no slides and are not represented in this collection. Most
patients have 3 slides (H&E, P40, TTF1), with Chromogranin, Synaptophysin, or P16 for small
subsets.
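The slide and case counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers stated in this description; the variable names are illustrative and do not come from the dataset or from any IDC tool.

```python
# Slide counts per stain, copied from the description above.
stain_counts = {
    "H&E": 27,
    "P40": 26,
    "TTF1": 26,
    "Chromogranin": 2,
    "Synaptophysin": 2,
    "P16": 1,
}

# Total number of slides across all stains.
total_slides = sum(stain_counts.values())

# Slides belonging to the three-stain panel that most patients have.
core_panel = ("H&E", "P40", "TTF1")
core_slides = sum(stain_counts[stain] for stain in core_panel)

# Case counts, also copied from the description above.
gdc_cases = 39
cases_with_slides = 27
cases_without_slides = gdc_cases - cases_with_slides

print(total_slides)          # 84
print(core_slides)           # 79
print(cases_without_slides)  # 12
```

The totals agree with the 84 slides and the 12 cases without slides stated in the text; the five slides outside the core panel are the Chromogranin, Synaptophysin, and P16 stains.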
HTMCP-LC data is accessible at the NCI's Genomic Data Commons (GDC) via the GDC Data Portal.
Please see the CGCI Use and Publication Guidelines for updated details on the sharing of any CGCI
substudy data, including how to cite CGCI.
Files included
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, cgci_htmcp_lc-idc_v22-aws.s5cmd corresponds to the contents of the cgci_htmcp_lc collection introduced in IDC data release v22.
cgci_htmcp_lc-idc_v24-aws.s5cmd: AWS download manifest
cgci_htmcp_lc-idc_v24-gcs.s5cmd: GCS download manifest
cgci_htmcp_lc-idc_v24-dcf.dcf: DCF download manifest
Manifest files ending in -aws.s5cmd reference files in Amazon Web Services (AWS) buckets; those ending in -gcs.s5cmd reference files in Google Cloud Storage (GCS) buckets. The files themselves are identical and are mirrored between AWS and GCS.
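The naming convention described above can be unpacked programmatically, which may help when scripting over several manifests. The following is a minimal sketch, assuming file names follow the pattern seen in this record (collection, then idc_v and the release number, then aws, gcs, or dcf). The function parse_manifest_name and the ManifestName type are hypothetical helpers written for illustration; they are not part of idc-index.

```python
import re
from typing import NamedTuple


class ManifestName(NamedTuple):
    """Parts of a manifest file name (hypothetical helper type)."""
    collection: str  # e.g. "cgci_htmcp_lc"
    release: int     # IDC data release number, e.g. 24
    target: str      # "aws", "gcs", or "dcf"
    extension: str   # "s5cmd" or "dcf"


# Assumed pattern, inferred from the file names listed in this record.
_PATTERN = re.compile(
    r"^(?P<collection>[a-z0-9_]+)-idc_v(?P<release>\d+)"
    r"-(?P<target>aws|gcs|dcf)\.(?P<extension>s5cmd|dcf)$"
)


def parse_manifest_name(filename: str) -> ManifestName:
    """Split a manifest file name into collection, release, and target."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return ManifestName(
        collection=match["collection"],
        release=int(match["release"]),
        target=match["target"],
        extension=match["extension"],
    )


print(parse_manifest_name("cgci_htmcp_lc-idc_v24-aws.s5cmd"))
```

The regular expression is deliberately strict, so a name that does not follow the assumed convention raises ValueError instead of yielding a partial result.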
Download instructions
Each manifest file includes instructions in its header on how to download the included files.
To download the files using .s5cmd manifests:
Install idc-index:
pip install --upgrade idc-index
Download the files referenced by a manifest included in this dataset:
idc download manifest.s5cmd
To download files using a .dcf manifest, see the manifest header.
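For scripted workflows, the idc download command shown above can be wrapped in a small Python helper. This is a sketch under stated assumptions: it only shells out to the same command line given in these instructions, and the helpers build_download_command and download are hypothetical names introduced here, not functions provided by idc-index.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_download_command(manifest: str) -> List[str]:
    """Build the `idc download <manifest>` command from the instructions above."""
    path = Path(manifest)
    if path.suffix != ".s5cmd":
        # .dcf manifests are handled differently; see the manifest header.
        raise ValueError(f"expected a .s5cmd manifest, got: {manifest!r}")
    return ["idc", "download", str(path)]


def download(manifest: str) -> int:
    """Run the download and return the exit code of the `idc` process."""
    if shutil.which("idc") is None:
        raise RuntimeError(
            "the 'idc' command was not found; "
            "install it with: pip install --upgrade idc-index"
        )
    # check=False leaves error handling to the caller via the return code.
    return subprocess.run(build_download_command(manifest), check=False).returncode
```

Calling download("cgci_htmcp_lc-idc_v24-aws.s5cmd") would then start the transfer, provided idc-index is installed and the manifest is in the working directory.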
For questions or help, contact support@canceridc.dev
or post on the IDC Forum.
Provider: Zenodo
Created: 2026-05-06