Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks
Source: DataONE. Updated 2025-10-22; indexed 2025-11-01.
Download link:
https://search.dataone.org/view/sha256:6c87baab9f73796b044aa349d8fa17b4145fc16ef54da90fdb392d550e2e5c59
Resource description:
Evidence before this study
We conducted an extensive literature search using Google Scholar without language restrictions, employing search terms such as "(Predicting OR Classifying OR Annotating) and (cancer hallmarks) AND (Deep OR Machine Learning) OR (Artificial Intelligence OR AI)." Despite notable advances in molecular oncology and computational methodologies, a critical gap remains: no existing machine learning or deep learning framework comprehensively predicts cancer hallmarks from tumor biopsy samples. Current research primarily targets specific molecular pathways associated with individual hallmarks, leaving clinicians without an integrated model to interpret hallmark activity at the level of an individual tumor. Moreover, the absence of wet-lab techniques capable of annotating all cancer hallmarks in biopsy samples has further impeded progress, limiting the clinical utility of hallmark-related insights for precision oncology.
Added value of this study
Thi...
Dataset Collection and Processing
We utilized a large-scale dataset comprising 2.7 million single-cell transcriptomes derived from 14 tumor types, collected from 922 patients across 51 independent studies conducted globally. This dataset was sourced from the Weizmann Institute's 3CA repository.
Quality Control
Before generating synthetic datasets for model training, the raw single-cell transcriptomic data underwent a rigorous quality control (QC) process. Cells with more than 15% mitochondrial transcript content, or with fewer than 200 or more than 6,000 expressed mRNA transcripts, were excluded to ensure data reliability.
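The thresholds above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that each cell is a mapping from gene symbol to count, that mitochondrial genes are recognised by the "MT-" symbol prefix, and that "expressed mRNA transcripts" means the number of genes with a nonzero count. The names `passes_qc` and `make_cell` are hypothetical and the cells are toy data.

```python
from typing import Dict

# Thresholds quoted in the text above.
MAX_MITO_FRACTION = 0.15
MIN_EXPRESSED = 200
MAX_EXPRESSED = 6000


def passes_qc(counts: Dict[str, int], mito_prefix: str = "MT-") -> bool:
    """Return True if one cell's gene -> count mapping passes the QC thresholds.

    Assumptions (not stated in the source): mitochondrial genes carry the
    "MT-" prefix, and "expressed transcripts" is read as the number of
    genes with a nonzero count.
    """
    total = sum(counts.values())
    if total == 0:
        return False  # an empty cell cannot pass
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    n_expressed = sum(1 for c in counts.values() if c > 0)
    if mito / total > MAX_MITO_FRACTION:
        return False  # more than 15% mitochondrial content
    return MIN_EXPRESSED <= n_expressed <= MAX_EXPRESSED


def make_cell(n_genes: int, n_mito: int = 0, mito_count: int = 1) -> Dict[str, int]:
    """Build a toy cell with one count per non-mitochondrial gene (illustration only)."""
    cell = {f"GENE{i}": 1 for i in range(n_genes - n_mito)}
    cell.update({f"MT-G{i}": mito_count for i in range(n_mito)})
    return cell


# Toy cells: only the first one satisfies all three criteria.
cells = {
    "good": make_cell(300, n_mito=10),
    "too_few_genes": make_cell(100),
    "too_many_genes": make_cell(6001),
    "high_mito": make_cell(300, n_mito=10, mito_count=20),
}
kept = [name for name, cell in cells.items() if passes_qc(cell)]
print(kept)
```

On real data the same filter would normally be applied to a cells-by-genes count matrix rather than to per-cell dictionaries. The dictionary form is used here only to keep the sketch dependency-free.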
Gene Set Curation
Gene sets representing cancer hallmarks were compiled from multiple databases, retaining only genes identified in at least two independent sources. This selection was refined through manual literature reviews to exclude genes without direct or indirect roles in hallmark-related pathways.
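The "at least two independent sources" rule can be sketched as a simple support count. This is a sketch under stated assumptions, not the authors' curation code. The database names, gene identifiers, and the function name `genes_in_multiple_sources` are hypothetical placeholders, and the manual literature review step is not modelled.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def genes_in_multiple_sources(
    sources: Dict[str, Iterable[str]], min_sources: int = 2
) -> Set[str]:
    """Keep genes listed by at least `min_sources` of the given sources."""
    support: Counter = Counter()
    for genes in sources.values():
        # set() so a gene repeated within one source is counted once
        support.update(set(genes))
    return {gene for gene, n in support.items() if n >= min_sources}


# Hypothetical candidate lists for a single hallmark (placeholder names).
sources = {
    "database_a": ["GENE1", "GENE2", "GENE3"],
    "database_b": ["GENE1", "GENE3", "GENE3"],
    "database_c": ["GENE2", "GENE4"],
}
consensus = genes_in_multiple_sources(sources)
print(sorted(consensus))
```

In this toy input, GENE4 appears in only one source and is dropped. The repeated GENE3 entry in `database_b` counts once toward its support.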
Digital Scoring
Using the curated ...

# Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks
[https://doi.org/10.5061/dryad.zw3r228jc](https://doi.org/10.5061/dryad.zw3r228jc)
## Description of the data and file structure
### Data Description: Experimental Efforts
This dataset comprises single-cell transcriptomic data from the Weizmann 3CA repository, encompassing 2.7 million single-cell transcriptomes from 14 tumor types, collected from 922 patients across 51 global studies. The primary objective of the experimental efforts was to generate synthetic datasets for training and validating computational models that identify and analyze cancer hallmarks at single-cell resolution.
Single-cell RNA sequencing (scRNA-seq) data underwent a rigorous quality control process to ensure reliability and biological relevance. This included exclusion criteria based on mitochondrial transcript content (>15%) and mRNA transcript counts (<200 or >6,000 transcripts). Gene sets corresponding to 10 established ca...
Created: 2025-10-23



