Gene Prioritization by Compressive Data Fusion and Chaining

Name: Gene Prioritization by Compressive Data Fusion and Chaining
Creator: figshare.com
Published: 2023-05-30 00:00:00
License: No description available

figshare.com, updated 2023-05-30, indexed 2025-03-26

Download link:

https://figshare.com/articles/dataset/_Gene_Prioritization_by_Compressive_Data_Fusion_and_Chaining_/1575190/1

Resource description:

Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.
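The chaining and seed-based ranking described above can be illustrated with a minimal sketch. This is not the Collage implementation: plain matrix products and cosine similarity stand in for the collective matrix factorization step, and the function names (`chain`, `prioritize`), the toy relation matrices, and the object types (terms, pathways) are all assumptions made for illustration.

```python
import numpy as np

def chain(relations):
    """Multiply a chain of relation matrices (e.g. gene->term, term->pathway)
    so that genes get a profile over the last object type in the chain."""
    profile = relations[0]
    for rel in relations[1:]:
        profile = profile @ rel
    return profile

def prioritize(profiles, seed_idx):
    """Rank non-seed genes by mean cosine similarity to the seed genes."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    scores = (unit @ unit[seed_idx].T).mean(axis=1)
    seeds = set(seed_idx)
    ranking = [int(i) for i in np.argsort(-scores) if int(i) not in seeds]
    return ranking, scores

# Hypothetical toy data: 4 genes, 3 terms, 2 pathways.
gene_term = np.array([[1, 0, 0],
                      [1, 1, 0],
                      [1, 0, 0],
                      [0, 0, 1]], dtype=float)
term_pathway = np.array([[1, 0],
                         [1, 0],
                         [0, 1]], dtype=float)

profiles = chain([gene_term, term_pathway])          # genes x pathways
ranking, scores = prioritize(profiles, seed_idx=[0, 1])
print(ranking)
```

In this toy setup, genes 0 and 1 act as seeds; the candidate whose chained profile points in the same direction as the seeds is ranked ahead of the one whose profile is orthogonal to them.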

Provider:

figshare.com
