DataSheet1_Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.).docx

Indexed in NIAID Data Ecosystem on 2026-03-13

Download link:

https://figshare.com/articles/dataset/DataSheet1_Classification_and_Regression_Models_for_Genomic_Selection_of_Skewed_Phenotypes_A_Case_for_Disease_Resistance_in_Winter_Wheat_Triticum_aestivum_L_docx/19218078

Description:

Most genomic prediction models are linear regression models that assume continuous, normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded on ordinal scales and as percentages. Disease severity (SEV) and infection type (IT) data from germplasm screening nurseries generally do not meet these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models that place no restriction on the distribution of the response variable and are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared regression and classification prediction models using two training populations: breeding lines phenotyped in 4 years (2016–2018 and 2020) and a diversity panel phenotyped in 4 years (2013–2016). The prediction models used 19,861 genotyping-by-sequencing (GBS) single-nucleotide polymorphism (SNP) markers. Overall, ridge regression best linear unbiased prediction (RR-BLUP) and support vector machine regression models fitted to square-root-transformed phenotypes displayed the best combination of accuracy and relative efficiency across the regression and classification models. Furthermore, a classification system based on support vector machine and ordinal Bayesian models with a 2-class scale for SEV reached the highest class accuracy, 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.
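
To make the best-performing regression route more concrete (square-root-transform a skewed severity score, then fit RR-BLUP on SNP markers), here is a minimal sketch. It is not the authors' pipeline and makes several assumptions. RR-BLUP is written as ridge regression on column-centered SNP codes (-1/0/1) with a fixed, caller-supplied shrinkage `lam`, standing in for the ratio of residual to marker variance; the REML variance estimation that RR-BLUP software performs is omitted. The 30% cut-off for the 2-class scale is hypothetical. All data are simulated, not taken from this dataset. The function names (`sqrt_transform`, `fit_rrblup`, `predict_rrblup`, `to_two_class`, `class_accuracy`) are illustrative only.

```python
import numpy as np


def sqrt_transform(sev_percent):
    """Square-root transform of disease severity recorded on a 0-100 % scale."""
    return np.sqrt(np.asarray(sev_percent, dtype=float))


def fit_rrblup(Z, y, lam):
    """Ridge-regression BLUP of marker effects for a fixed shrinkage `lam`.

    Assumption: `lam` (sigma_e^2 / sigma_u^2) is supplied, not REML-estimated.
    Markers and phenotypes are centered, so the intercept is the phenotype mean.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    z_mean = Z.mean(axis=0)          # training marker means, reused at prediction
    mu = y.mean()
    Zc = Z - z_mean
    p = Zc.shape[1]
    # Solve (Z'Z + lam * I) u = Z'(y - mu) for the marker effects u.
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y - mu))
    return mu, z_mean, u


def predict_rrblup(model, Z_new):
    """Genomic estimated values for new lines, on the (transformed) training scale."""
    mu, z_mean, u = model
    return mu + (np.asarray(Z_new, dtype=float) - z_mean) @ u


def to_two_class(sev_percent, threshold):
    """Collapse severity into 2 classes: 1 if sev >= threshold, else 0."""
    return (np.asarray(sev_percent, dtype=float) >= threshold).astype(int)


def class_accuracy(observed, predicted):
    """Proportion of lines whose predicted class matches the observed class."""
    return float(np.mean(np.asarray(observed) == np.asarray(predicted)))


if __name__ == "__main__":
    # Simulated data only: 200 lines x 500 SNPs coded -1/0/1.
    rng = np.random.default_rng(42)
    n, p = 200, 500
    Z = rng.integers(-1, 2, size=(n, p)).astype(float)
    beta = np.zeros(p)
    beta[rng.choice(p, 20, replace=False)] = rng.normal(0.0, 0.5, 20)
    g = Z @ beta
    # A logistic link centred well below 50 % yields a right-skewed SEV.
    sev = 100.0 / (1.0 + np.exp(-(g - 3.0 + rng.normal(0.0, 1.0, n))))

    train, test = np.arange(150), np.arange(150, n)
    model = fit_rrblup(Z[train], sqrt_transform(sev[train]), lam=100.0)
    pred_sqrt = predict_rrblup(model, Z[test])
    pred_sev = np.clip(pred_sqrt, 0.0, None) ** 2   # back-transform to the % scale

    r = np.corrcoef(pred_sev, sev[test])[0, 1]      # prediction accuracy as Pearson r
    threshold = 30.0                                # hypothetical 2-class cut-off
    acc = class_accuracy(to_two_class(sev[test], threshold),
                         to_two_class(pred_sev, threshold))
    print(f"prediction accuracy r = {r:.2f}, 2-class accuracy = {acc:.2f}")
```

The square root compresses the long right tail of percentage severities, so a Gaussian-error model such as RR-BLUP fits the transformed values more comfortably. Back-transforming by squaring returns predictions to the original percentage scale, where they can be thresholded into classes. The same class-accuracy measure applies whether the classes come from a thresholded regression prediction, as here, or directly from a classifier.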

Created:

2022-02-23