MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/records/15043064

Description:

This record provides the pruned TCGA transcriptomics data from the manuscript "MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention" [3]. Code is available on GitHub.

The TCGA [1] transcriptomics data were collected from Xena [2] and preprocessed with the pipeline proposed in MIRROR [3]. Starting from the raw transcriptomics data, we first apply recursive feature elimination (RFE) [4] with 5-fold cross-validation to each cohort to identify the best-performing support set for the subtyping task. To improve biological interpretability, we then manually add genes associated with specific cancer subtypes according to the COSMIC database [5]. The result is a one-dimensional transcriptomics feature vector.

[1] K. Tomczak et al., “Review The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge,” Contemporary Oncology/Współczesna Onkologia, vol. 2015, no. 1, pp. 68–77, 2015.

[2] M. J. Goldman et al., “Visualizing and interpreting cancer genomics data via the Xena platform,” Nat. Biotechnol., vol. 38, no. 6, pp. 675–678, 2020.

[3] T. Wang et al., “MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention,” arXiv preprint arXiv:2503.00374, 2025.

[4] I. Guyon et al., “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, pp. 389–422, 2002.

[5] Z. Sondka et al., “COSMIC: a curated database of somatic variants and clinical data for cancer,” Nucleic Acids Research, vol. 52, no. D1, pp. D1210–D1217, 2024.
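The two-step gene selection described above (RFE with 5-fold cross-validation, then a manual union with curated genes) can be illustrated with a minimal sketch. This is not the MIRROR authors' code: it assumes scikit-learn's `RFECV` with a linear SVM as the ranking estimator, runs on synthetic data, and uses hypothetical gene names (`GENE_0` ... `GENE_29`) and a hypothetical curated list standing in for COSMIC-derived genes. The function name `select_genes` is likewise invented for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def select_genes(X, y, gene_names, curated_genes, n_splits=5, seed=0):
    """Return sorted column indices: RFE-selected genes plus curated genes."""
    # Step 1: RFE with stratified k-fold CV picks the best-scoring feature subset.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    selector = RFECV(SVC(kernel="linear"), step=1, cv=cv, scoring="accuracy")
    selector.fit(X, y)
    selected = {int(i) for i in np.flatnonzero(selector.support_)}

    # Step 2: add hand-picked genes (assumption: matched by name; unknown names are skipped).
    name_to_idx = {name: i for i, name in enumerate(gene_names)}
    curated = {name_to_idx[g] for g in curated_genes if g in name_to_idx}
    return sorted(selected | curated)


# Synthetic stand-in for one cohort: 120 samples, 30 "genes", 2 subtypes.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
gene_names = [f"GENE_{i}" for i in range(X.shape[1])]

# Hypothetical curated list; in the described pipeline these come from COSMIC.
keep = select_genes(X, y, gene_names, curated_genes=["GENE_28", "GENE_29"])
X_pruned = X[:, keep]  # keep only the chosen gene columns
print(len(keep), X_pruned.shape)
```

Taking the union rather than the intersection means the curated genes are retained even when RFE would have eliminated them, which matches the stated goal of adding biologically relevant genes on top of the data-driven selection.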

Created:

2025-03-24
