
Fully Bayesian Analysis of RNA-seq Counts for the Detection of Gene Expression Heterosis

DataCite Commons | Updated 2024-08-07 | Indexed 2024-07-27
Download link:
https://tandf.figshare.com/articles/dataset/Fully_Bayesian_analysis_of_RNA-seq_counts_for_the_detection_of_gene_expression_heterosis/6949499/2
Description:
Heterosis, or hybrid vigor, is the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Heterosis is extensively used in agriculture, but the underlying mechanisms are unclear. To investigate the molecular basis of phenotypic heterosis, researchers search tens of thousands of genes for heterosis with respect to expression in the transcriptome. Difficulty arises in the assessment of heterosis due to composite null hypotheses and nonuniform distributions for p-values under these null hypotheses. Thus, we develop a general hierarchical model for count data and a fully Bayesian analysis in which an efficient parallelized Markov chain Monte Carlo algorithm ameliorates the computational burden. We use our method to detect gene expression heterosis in a two-hybrid plant-breeding scenario, both in a real RNA-seq maize dataset and in simulation studies. In the simulation studies, we show our method has well-calibrated posterior probabilities and credible intervals when the model assumed in analysis matches the model used to simulate the data. Although model misspecification can adversely affect calibration, the methodology is still able to accurately rank genes. Finally, we show that hyperparameter posteriors are extremely narrow and an empirical Bayes (eBayes) approach based on posterior means from the fully Bayesian analysis provides virtually equivalent posterior probabilities, credible intervals, and gene rankings relative to the fully Bayesian solution. This evidence of equivalence provides support for the use of eBayes procedures in RNA-seq data analysis if accurate hyperparameter estimates can be obtained. Supplementary materials for this article are available online.
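
To illustrate how a posterior probability can sidestep the composite-null problem mentioned in the description, the following is a minimal sketch, not the authors' model or code. It assumes that MCMC draws of the mean expression of one gene are already available for two inbred parents and one hybrid, and it assumes "high-parent" heterosis (the hybrid mean exceeds both parental means) as the event of interest. The function name, argument names, and the simulated draws are all hypothetical.

```python
import random


def prob_high_parent_heterosis(parent1, parent2, hybrid):
    """Monte Carlo estimate of P(hybrid mean > max(parent means) | data).

    Each argument is a sequence of posterior draws for one gene, aligned so
    that index i in all three sequences comes from the same MCMC iteration.
    (Hypothetical helper; the source does not specify this interface.)
    """
    if not (len(parent1) == len(parent2) == len(hybrid)) or len(hybrid) == 0:
        raise ValueError("need equally long, non-empty sequences of draws")
    # Count the joint draws that fall inside the heterosis region.
    hits = sum(1 for a, b, h in zip(parent1, parent2, hybrid) if h > max(a, b))
    return hits / len(hybrid)


# Simulated stand-ins for posterior draws (assumption: log-scale means).
rng = random.Random(0)
n_draws = 1000
parent1_draws = [rng.gauss(2.0, 0.1) for _ in range(n_draws)]
parent2_draws = [rng.gauss(2.2, 0.1) for _ in range(n_draws)]
hybrid_draws = [rng.gauss(3.0, 0.1) for _ in range(n_draws)]

demo_prob = prob_high_parent_heterosis(parent1_draws, parent2_draws, hybrid_draws)
print(f"Estimated posterior probability of high-parent heterosis: {demo_prob:.3f}")
```

Because the estimate is simply the share of joint draws that land in the heterosis region, no p-value under a composite null is needed, and genes can be ranked by sorting on this probability.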

Provider:
Taylor & Francis
Created:
2018-11-13