Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models

DataONE, updated 2024-04-11, indexed 2024-06-08
Download link:
https://search.dataone.org/view/sha256:c7b530cd18e7c628b0cb61ddcff2710ba9e9dc38ffe80b919e2481b914e672e2
Resource description:
Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single-modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient’s gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascad...

# RNA-CDM Generated One Million Synthetic Images

[https://doi.org/10.5061/dryad.6djh9w174](https://doi.org/10.5061/dryad.6djh9w174)

One million synthetic digital pathology images were generated using the RNA-CDM model presented in the paper "RNA-to-image multi-cancer synthesis using cascaded diffusion models".

## Description of the data and file structure

There are ten different h5 files per cancer type (TCGA-CESC, TCGA-COAD, TCGA-KIRP, TCGA-GBM, TCGA-LUAD). Each h5 file contains 20,000 images. The key is the tile number, ranging from 0-20,000 in the first file and from 180,000-200,000 in the last file. The tiles are saved as NumPy arrays.

## Code/Software

The code used to generate this data is available under an academic license at [https://rna-cdm.stanford.edu](https://rna-cdm.stanford.edu).

## Manuscript citation

Carrillo-Perez, F., Pizurica, M., Zheng, Y. *et al.* Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models...
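
The h5 layout described under "Description of the data and file structure" can be navigated with a little arithmetic. The following is a minimal sketch, not the authors' code. It assumes that the key ranges are half-open (the first file holds tiles 0 to 19,999, the last holds 180,000 to 199,999) and that each tile is stored as a top-level dataset keyed by the tile number as a string. The function names `file_index_for_tile` and `load_tile` are hypothetical, and the actual file names in the archive are not given in the description, so the path is left to the caller.

```python
# Constants taken from the dataset description.
TILES_PER_FILE = 20_000
FILES_PER_CANCER = 10
CANCER_TYPES = ["TCGA-CESC", "TCGA-COAD", "TCGA-KIRP", "TCGA-GBM", "TCGA-LUAD"]


def file_index_for_tile(tile_number: int) -> int:
    """Return the 0-based index of the h5 file expected to hold a tile.

    Assumption: key ranges are half-open, so file 0 covers 0..19,999
    and file 9 covers 180,000..199,999.
    """
    tiles_per_cancer = TILES_PER_FILE * FILES_PER_CANCER
    if not 0 <= tile_number < tiles_per_cancer:
        raise ValueError(f"tile_number must be in [0, {tiles_per_cancer})")
    return tile_number // TILES_PER_FILE


def load_tile(h5_path: str, tile_number: int):
    """Read one tile from an h5 file and return it as a NumPy array.

    Assumption: each tile is a top-level dataset whose key is
    str(tile_number); adjust if the archive uses a different layout.
    """
    import h5py  # third-party; imported lazily so the arithmetic above runs without it

    with h5py.File(h5_path, "r") as f:
        return f[str(tile_number)][()]  # [()] reads the whole dataset into memory


# Sanity check: 5 cancer types x 10 files x 20,000 tiles.
total_images = len(CANCER_TYPES) * FILES_PER_CANCER * TILES_PER_FILE
```

Under these assumptions, `total_images` matches the one million images stated in the description, and `load_tile(path, n)` should be called with the path of the file at index `file_index_for_tile(n)` for the chosen cancer type.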
Created:
2025-07-30