
Summary statistics for all SBM blocks.

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Summary_statistics_for_all_SBM_blocks_/26399466
Description:
Finding communities in gene co-expression networks is a common first step toward extracting biological insight from these complex datasets. Most community detection algorithms expect genes to be organized into assortative modules, that is, groups of genes that are more associated with each other than with genes in other groups. While it is reasonable to expect that these modules exist, using methods that assume they exist a priori is risky, as it guarantees that alternative organizations of gene interactions will be ignored. Here, we ask: can we find meaningful communities without imposing a modular organization on gene co-expression networks, and how modular are these communities? For this, we use a recently developed community detection method, the weighted degree corrected stochastic block model (SBM), that does not assume that assortative modules exist. Instead, the SBM attempts to efficiently use all information contained in the co-expression network to separate the genes into hierarchically organized blocks of genes. Using RNAseq gene expression data measured in two tissues derived from an outbred population of Drosophila melanogaster, we show that (a) the SBM is able to find ten times as many groups as competing methods, that (b) several of those gene groups are not modular, and that (c) the functional enrichment for non-modular groups is as strong as for modular communities. These results show that the transcriptome is structured in more complex ways than traditionally thought and that we should revisit the long-standing assumption that modularity is the main driver of the structuring of gene co-expression networks.
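The description defines an assortative module as a group of genes more associated with each other than with genes in other groups. As a minimal sketch of that definition, the Python below compares the mean within-block weight to the mean between-block weight for each block of a toy network. The function names, the toy weights, and the simple "within greater than between" rule are assumptions made for illustration; they are not the dataset's summary statistics or the authors' modularity measure.

```python
from collections import defaultdict


def block_assortativity(weights, labels):
    """Return {block: (mean_within, mean_between)} for a symmetric weight matrix.

    weights: square list of lists of co-expression weights (hypothetical input).
    labels:  block label of each gene, in the same order as the matrix rows.
    """
    n = len(labels)
    within = defaultdict(list)
    between = defaultdict(list)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ignore self-associations
            if labels[i] == labels[j]:
                within[labels[i]].append(weights[i][j])
            else:
                between[labels[i]].append(weights[i][j])
    result = {}
    for block in sorted(set(labels)):
        w, b = within[block], between[block]
        mean_w = sum(w) / len(w) if w else 0.0
        mean_b = sum(b) / len(b) if b else 0.0
        result[block] = (mean_w, mean_b)
    return result


def is_assortative(stats):
    """Flag a block as assortative when within-block weight exceeds between-block weight."""
    return {block: mw > mb for block, (mw, mb) in stats.items()}


# Toy network of six genes in three blocks (illustrative values only).
labels = ["A", "A", "B", "B", "C", "C"]
n = len(labels)
weights = [[0.1] * n for _ in range(n)]  # weak background association


def set_weight(i, j, w):
    weights[i][j] = weights[j][i] = w


set_weight(0, 1, 0.9)  # block A: strong internal association
for i in (2, 3):       # blocks B and C: strongly associated with each other,
    for j in (4, 5):   # but only weakly within themselves
        set_weight(i, j, 0.8)

stats = block_assortativity(weights, labels)
flags = is_assortative(stats)
print(flags)
```

In this toy case, block A is assortative while blocks B and C are not, even though B and C still form a clearly structured pair. A method that only searches for assortative modules would have no way to represent that pair as two distinct groups.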

Created:
2024-07-29