Aggregating multimodal cancer data across unaligned embedding spaces maintains tumor of origin signal
DataONE | Updated 2025-12-16 | Indexed 2025-12-20
Download link:
https://search.dataone.org/view/sha256:4cf3ed3d96da0ac5b13272c8436b86f8039a58b8738d658eb69caf96c8dd2170
Description:
AI-based embeddings make it possible to encode complex biological data into low-dimensional spaces, called embedding spaces, that preserve the relationships between entities. Whether embedding spaces created without any coordination are compatible with one another remains an open question. It has been assumed that the signal in such unaligned embedding spaces would be destroyed if their vectors were aggregated by summation. To test this assumption, we trained embedding models on different data modalities and aggregated the resulting values. Our results show that signal from unaligned embeddings is conserved under aggregation and remains usable for learning tasks such as recognizing the data modality and the tumor of origin.
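The aggregation idea in the description can be illustrated with a toy experiment. The following is only a minimal sketch under stated assumptions, not the dataset authors' pipeline: the data are synthetic, two independent random linear projections stand in for separately trained embedding models, and a nearest-centroid classifier stands in for the downstream learning task. All function names and parameters (`make_unaligned_embeddings`, `nearest_centroid_accuracy`, `latent_dim`, `embed_dim`) are hypothetical.

```python
import numpy as np


def make_unaligned_embeddings(n_per_class=100, n_classes=3, latent_dim=5,
                              embed_dim=16, seed=0):
    """Build two synthetic 'modalities' embedded with unrelated projections.

    Assumption: a class-dependent latent vector plays the role of the
    tumor-of-origin signal shared by both modalities.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_classes), n_per_class)
    centers = rng.normal(scale=3.0, size=(n_classes, latent_dim))
    latent = centers[labels] + rng.normal(size=(labels.size, latent_dim))

    # Each modality gets its own random projection, so the two embedding
    # spaces share no coordinate system (they are "unaligned").
    proj_a = rng.normal(size=(latent_dim, embed_dim))
    proj_b = rng.normal(size=(latent_dim, embed_dim))
    emb_a = latent @ proj_a + 0.5 * rng.normal(size=(labels.size, embed_dim))
    emb_b = latent @ proj_b + 0.5 * rng.normal(size=(labels.size, embed_dim))
    return emb_a, emb_b, labels


def nearest_centroid_accuracy(x, y, seed=0):
    """Fit class centroids on a random half of the data, score on the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    split = len(y) // 2
    train, test = idx[:split], idx[split:]

    classes = np.unique(y[train])
    centroids = np.stack(
        [x[train][y[train] == c].mean(axis=0) for c in classes]
    )
    # Distance from every test vector to every class centroid.
    dists = np.linalg.norm(x[test][:, None, :] - centroids[None, :, :], axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == y[test]).mean())


if __name__ == "__main__":
    emb_a, emb_b, labels = make_unaligned_embeddings()
    summed = emb_a + emb_b  # element-wise aggregation across modalities
    print("accuracy on summed embeddings:",
          nearest_centroid_accuracy(summed, labels))
```

In this toy setting the summed vectors still separate the classes because adding two random linear maps generally yields another map that keeps the latent directions. Real embedding models are nonlinear, so the sketch illustrates the question being tested rather than reproducing the reported result.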
Created:
2025-12-19



