Genomic prediction of leaf rust resistance to Arabica coffee using machine learning algorithms

Source: Figshare. Updated 2021-03-01; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/Genomic_prediction_of_leaf_rust_resistance_to_Arabica_coffee_using_machine_learning_algorithms/14305576

Resource description:

ABSTRACT Genomic selection (GS) emphasizes the simultaneous prediction of the genetic effects of thousands of markers scattered across the genome. Several statistical methodologies have been used in GS to predict genetic merit. In general, such methodologies require certain assumptions about the data, such as normally distributed phenotypic values. To circumvent non-normality of phenotypic values, the literature suggests the use of Bayesian Generalized Linear Regression (GBLASSO). Another alternative is models based on machine learning, such as Artificial Neural Networks (ANN), Decision Trees (DT), and DT refinements such as Bagging, Random Forest, and Boosting. This study aimed to use DT and its refinements to predict resistance to orange rust in Arabica coffee. Additionally, DT and its refinements were used to assess the importance of markers related to the trait of interest. The results were compared with those from GBLASSO and ANN. Data on coffee rust resistance from 245 Arabica coffee plants genotyped for 137 markers were used. The DT refinements yielded Apparent Error Rate values equal to or lower than those obtained by DT, GBLASSO, and ANN. Moreover, the DT refinements were able to identify important markers for the trait of interest. Of the 14 most important markers identified by each methodology, 9.3 on average were located in regions of quantitative trait loci (QTLs) associated with disease resistance reported in the literature.
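As a rough illustration of the terms in the abstract, the following minimal sketch shows how an Apparent Error Rate and a marker importance ranking could be computed for a bagged tree model. It is not the authors' implementation. Everything here is an assumption made for illustration: the function names, the use of depth-1 trees (stumps) in place of full decision trees, the importance measure (the share of bootstrap trees that split on each marker), and the small synthetic dataset, which is not the coffee data.

```python
import random
from collections import Counter


def apparent_error_rate(observed, predicted):
    """Share of plants misclassified when predicting the data used for fitting."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(o != p for o, p in zip(observed, predicted)) / len(observed)


def fit_stump(X, y):
    """Depth-1 tree: pick the marker whose genotype classes best separate y."""
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen genotypes
    best = None
    for j in range(len(X[0])):
        votes = {}
        for row, label in zip(X, y):
            votes.setdefault(row[j], Counter())[label] += 1
        # Majority class for each genotype value observed at marker j.
        rule = {g: c.most_common(1)[0][0] for g, c in votes.items()}
        errors = sum(rule[row[j]] != label for row, label in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    return best[1], best[2], default


def predict_stump(stump, row):
    marker, rule, default = stump
    return rule.get(row[marker], default)


def fit_bagging(X, y, n_trees=25, seed=0):
    """Bagging: fit one stump per bootstrap resample of the plants."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Majority vote over all bootstrap stumps."""
    return Counter(predict_stump(s, row) for s in stumps).most_common(1)[0][0]


def marker_importance(stumps, n_markers):
    """Hypothetical importance: fraction of stumps that split on each marker."""
    counts = Counter(s[0] for s in stumps)
    return [counts[j] / len(stumps) for j in range(n_markers)]


# Synthetic stand-in data: 64 plants x 6 binary markers, where marker 3 alone
# determines resistance. These are NOT the coffee genotypes from the study.
X = [[(i >> j) & 1 for j in range(6)] for i in range(64)]
y = [row[3] for row in X]

stumps = fit_bagging(X, y)
predicted = [predict_bagging(stumps, row) for row in X]
aer = apparent_error_rate(y, predicted)
importance = marker_importance(stumps, n_markers=6)
print("Apparent Error Rate:", aer)
print("Marker importance:", importance)
```

Because the error rate is computed on the same plants used to fit the model, it tends to be optimistic. A held-out or cross-validated error would be the usual complement, though the abstract reports only the apparent rate.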

Created:

2021-03-01
