Regression models considered for GC normalization.

NIAID Data Ecosystem (indexed 2026-03-09)
Download link:
https://figshare.com/articles/dataset/Regression_models_considered_for_GC_normalization_/4045509
Description:
We explored several regression models, ranging from simple linear models with a single input variable (genome GC content) to more complex models built by progressively adding terms and using two input variables (read GC content and genome GC content). This strategy initially yielded models with lower RMSE, but it eventually led to overfitting and a significant increase in RMSE (the fourth-degree polynomial model). In contrast, non-linear regression with a Gaussian exponential term significantly improved RMSE (the last model). Complete results of model testing, with estimates of the abundance of each bacterium in the validation sets, are provided in S2 Table. R output with statistics for the tested models is included in S2 File.
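
The model-selection idea in the description (add polynomial terms step by step and watch RMSE on held-out data) can be illustrated with a minimal sketch. This is not the authors' code or data: the sample values are synthetic, the single input is a rescaled genome GC content, and the helper names (`fit_polynomial`, `rmse`, `synthetic_sample`) are hypothetical. It covers only the polynomial part of the comparison, not the two-variable models or the Gaussian exponential model.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - s) / m[i][i]
    return coef


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    n = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve_linear_system(xtx, xty)


def predict(coef, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coef))


def rmse(coef, xs, ys):
    """Root mean square error of the fitted polynomial on (xs, ys)."""
    total = sum((predict(coef, x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))


rng = random.Random(0)


def synthetic_sample(n):
    """Made-up data: a bump-shaped bias plus noise (an assumption, not real data)."""
    xs, ys = [], []
    for _ in range(n):
        gc = rng.uniform(0.3, 0.7)  # hypothetical genome GC fraction
        bias = 1.0 + 0.8 * math.exp(-((gc - 0.5) / 0.08) ** 2) + rng.gauss(0.0, 0.05)
        xs.append((gc - 0.5) / 0.2)  # rescale to roughly [-1, 1] for stability
        ys.append(bias)
    return xs, ys


train_x, train_y = synthetic_sample(12)
valid_x, valid_y = synthetic_sample(12)

# Compare training and validation RMSE as the polynomial degree grows.
for degree in range(1, 5):
    coef = fit_polynomial(train_x, train_y, degree)
    print(degree, round(rmse(coef, train_x, train_y), 4),
          round(rmse(coef, valid_x, valid_y), 4))
```

Training RMSE cannot increase as terms are added, because each model contains the previous one. Overfitting shows up when the validation RMSE starts rising while the training RMSE keeps falling.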

Created:
2016-10-20