DataSheet1_RFtest: A Robust and Flexible Community-Level Test for Microbiome Data Powerfully Detects Phylogenetically Clustered Signals.docx
Source: NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://figshare.com/articles/dataset/DataSheet1_RFtest_A_Robust_and_Flexible_Community-Level_Test_for_Microbiome_Data_Powerfully_Detects_Phylogenetically_Clustered_Signals_docx/18996479
Description:
Random forest is considered one of the most successful machine learning algorithms and has been widely used to construct microbiome-based predictive models. However, its use as a statistical testing method has not been explored. In this study, we propose the “Random Forest Test” (RFtest), a global (community-level) test based on random forest for high-dimensional and phylogenetically structured microbiome data. RFtest is a permutation test that uses the generalization error of random forest as the test statistic. Our simulations demonstrate that RFtest controls the type I error rate, that it is more powerful than competing methods for phylogenetically clustered signals, and that it is robust to outliers and adaptive to interaction effects and non-linear associations. Finally, we apply RFtest to two real microbiome datasets to ascertain whether the microbial communities are associated with the outcome variables.
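The idea of a permutation test built on a model's error can be sketched roughly as follows. This is a minimal illustration, not the RFtest implementation: it assumes a categorical outcome, uses the out-of-bag misclassification rate from scikit-learn's RandomForestClassifier as a stand-in for the generalization error, and the function names `permutation_p_value` and `rf_oob_error` are hypothetical.

```python
import random


def permutation_p_value(X, y, error_fn, n_perm=199, seed=0):
    """Permutation test whose statistic is a model's estimated error.

    A small error means the features predict the outcome well, so the
    p-value counts permutations whose error is as low as the observed one.
    """
    rng = random.Random(seed)
    observed = error_fn(X, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between samples and outcome
        if error_fn(X, y_perm) <= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (1 + hits) / (n_perm + 1)


def rf_oob_error(X, y, n_trees=500, seed=0):
    """Out-of-bag misclassification rate (assumed proxy for generalization error)."""
    from sklearn.ensemble import RandomForestClassifier  # third-party dependency

    forest = RandomForestClassifier(
        n_estimators=n_trees, oob_score=True, random_state=seed
    )
    forest.fit(X, y)
    return 1.0 - forest.oob_score_
```

With a samples-by-taxa abundance table and an outcome vector (hypothetical names `abundance_table` and `outcome`), a call such as `permutation_p_value(abundance_table, outcome, rf_oob_error)` would return the observed out-of-bag error and a permutation p-value. Each permutation refits the forest, so the run time grows linearly with `n_perm`.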
Created:
2022-01-24



