MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/records/15043064
Description:
Here we provide pruned TCGA transcriptomics data from the manuscript "MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention". Code is available at GitHub. The TCGA [1] transcriptomics data were collected from Xena [2] and preprocessed with the novel pipeline proposed in MIRROR [3]. For the raw transcriptomics data, we first apply RFE [4] with 5-fold cross-validation to each cohort to identify the best-performing support set (gene subset) for the subtyping task. To improve biological interpretability, we then manually add genes associated with specific cancer subtypes, drawn from the COSMIC database [5], resulting in a one-dimensional transcriptomics feature vector.

[1] K. Tomczak et al., “Review The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge,” Contemporary Oncology/Współczesna Onkologia, vol. 2015, no. 1, pp. 68–77, 2015.
[2] M. J. Goldman et al., “Visualizing and interpreting cancer genomics data via the Xena platform,” Nat. Biotechnol., vol. 38, no. 6, pp. 675–678, 2020.
[3] T. Wang et al., “MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention,” arXiv preprint arXiv:2503.00374, 2025.
[4] I. Guyon et al., “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, pp. 389–422, 2002.
[5] Z. Sondka et al., “COSMIC: a curated database of somatic variants and clinical data for cancer,” Nucleic Acids Research, vol. 52, no. D1, pp. D1210–D1217, 2024.
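The per-cohort selection described above can be sketched as follows. This is a minimal illustration that assumes scikit-learn's `RFECV` with a standardized linear SVM as the ranking estimator (the SVM-RFE setting of Guyon et al. [4]). The function `select_genes`, the synthetic cohort, and the `curated` gene list are hypothetical stand-ins, not the MIRROR code or actual COSMIC annotations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def select_genes(X, y, gene_names, curated_genes, step=1, seed=0):
    """Hypothetical per-cohort selection: SVM-RFE with 5-fold CV, then a union
    with a hand-curated list of subtype-associated genes (assumed, not MIRROR's code)."""
    # Standardize inside the pipeline so each CV fold is scaled on its own
    # training split; the linear SVM's weights then rank the genes.
    model = Pipeline([("scale", StandardScaler()), ("svm", LinearSVC(dual=False))])
    rfecv = RFECV(
        estimator=model,
        step=step,  # genes removed per elimination round
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        scoring="accuracy",
        importance_getter="named_steps.svm.coef_",
    )
    # RFECV picks the gene count with the best mean CV accuracy, then reruns
    # RFE on the full cohort to obtain a support set of that size.
    rfecv.fit(X, y)
    rfe_genes = {g for g, kept in zip(gene_names, rfecv.support_) if kept}
    # Add curated genes that are actually measured in this cohort.
    keep = rfe_genes | (set(curated_genes) & set(gene_names))
    # Keep the original column order so every sample gets the same 1-D layout.
    idx = [i for i, g in enumerate(gene_names) if g in keep]
    return [gene_names[i] for i in idx], X[:, idx]


# Usage on synthetic data standing in for one TCGA cohort (hypothetical).
X, y = make_classification(n_samples=120, n_features=40, n_informative=5, random_state=0)
genes = [f"GENE{i}" for i in range(X.shape[1])]
curated = ["GENE0", "GENE1", "NOT_MEASURED"]  # placeholder, not real COSMIC genes
names, features = select_genes(X, y, genes, curated)
print(len(names), features.shape)
```

Because the curated genes are merged after elimination, they are retained even when RFE would have dropped them, which matches the manual addition step described in the text.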
Created: 2025-03-24