
Multi-generation genomic prediction of maize yield using parametric and non-parametric sparse selection indices

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.qjq2bvqgz
Resource description:
Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulate, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This raises the question of whether all available data or only a subset should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a single subset of the available data that is optimal for a given prediction set. However, this approach does not consider the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, we studied here whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that, when predicting grain yield, the KBLUP outperformed the GBLUP, and that using the SSI with additive relationships (GSSI) led to 5-17% increases in accuracy relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.
Methods
Data consist of 3722 Doubled-Haploid (DH) lines derived from biparental families developed at CIMMYT’s Maize DH facility at the Kenya Agricultural & Livestock Research Organization (KALRO) in Kiboko, Kenya.
The biparental families were obtained by crossing elite inbred lines with drought-tolerant lines. The DH lines were selected from a larger population (based on evaluations of germination, good stand, plant type, low ear placement, and well-filled ears) for stage I multi-location yield trials conducted from 2017 to 2020. Each year, the selected DH lines were crossed with a single-cross tester from the complementary heterotic group to generate three-way hybrids that were evaluated under well-watered (denoted as optimal) and drought conditions. Trials were planted in an alpha-lattice design with two replications and evaluated in two well-watered locations and one managed drought stress location during the 2017, 2018, 2019, and 2020 growing seasons. Grain yield (GY, tons/ha), anthesis date (AD, days), and plant height (PH, cm) were recorded. Plots were manually harvested and GY was corrected to a moisture of 12.5%. AD was measured from planting to the moment at which 50% of the plants shed pollen, and PH was measured between the soil surface and the flag leaf collar on five representative plants in each plot. DNA samples from leaves were sent to the Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA, for genotyping with repetitive sequences (rAmpSeq). A segregation distortion analysis was performed on a total of 5465 dominant markers coded as 0 (absence) and 1 (presence), from which 61 markers were discarded at a 5% FDR. The remaining markers were filtered by minor allele frequency (MAF < 0.05), leaving 4612 markers that were used for analyses. The adjusted means of GY, AD, and PH were obtained using mixed-effects models fitted separately for each combination of trait, environmental condition, and year.
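The minor allele frequency filter described above can be illustrated with a short sketch. It is not the authors' pipeline: it assumes that the frequency of each dominant marker is taken as the mean of its 0/1 codes across DH lines, that markers with MAF below the threshold are the ones removed, and that missing calls (if any) are stored as NaN. The function name `filter_by_maf` is hypothetical.

```python
import numpy as np

def filter_by_maf(X, threshold=0.05):
    """Drop marker columns whose minor allele frequency is below `threshold`.

    X is an (n_lines, n_markers) array of 0/1 presence/absence codes;
    NaN entries are ignored when computing frequencies (an assumption).
    Returns the filtered matrix and the boolean mask of kept columns.
    """
    X = np.asarray(X, dtype=float)
    p = np.nanmean(X, axis=0)        # frequency of the "presence" code
    maf = np.minimum(p, 1.0 - p)     # frequency of the rarer code
    keep = maf >= threshold
    return X[:, keep], keep

# Toy example: the first marker is fixed (MAF = 0) and is dropped.
X_toy = np.array([[1, 0, 1],
                  [1, 0, 0],
                  [1, 0, 1],
                  [1, 1, 0]])
X_kept, mask = filter_by_maf(X_toy, threshold=0.05)
```

Applied to the real marker matrix, the number of `True` entries in `mask` would be expected to match the 4612 markers reported, provided the assumptions above agree with the authors' procedure.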
The Best Linear Unbiased Estimates (BLUEs) of genotypes for the optimal experiments were estimated within year across the two locations by fitting models including Genotype, Location, Replicate, Block, and Genotype-by-Location interaction. Likewise, within each year, the BLUE for each trait for the single-location drought experiment was obtained through a linear model including Genotype, Replicate, and Block only. A total of n = 3527 lines with both marker and phenotypic information remained after quality control. The final numbers of lines in 2017, 2018, 2019, and 2020 are n = 901, n = 1418, n = 722, and n = 486, respectively. Data from the 2017 and 2018 cycles have been previously described and analyzed by Beyene et al. (2019) and Atanda et al. (2020).
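The GBLUP and KBLUP models named in the description differ in the relationship matrix built from the markers: an additive genomic relationship matrix versus a Gaussian kernel. The sketch below shows one common way to compute each from a 0/1 marker matrix. The standardization of markers, the scaling of distances by the number of markers, and the `bandwidth` parameter are assumptions for illustration; the record does not state the exact choices made by the authors, and the function names are hypothetical.

```python
import numpy as np

def additive_relationship(X):
    """Additive genomic relationship matrix G = Z Z' / k.

    Z holds the markers centered and scaled to unit variance;
    monomorphic markers are dropped because they cannot be scaled.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    sd = Z.std(axis=0)
    Z = Z[:, sd > 0] / sd[sd > 0]
    return Z @ Z.T / Z.shape[1]

def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian kernel K_ij = exp(-bandwidth * d_ij^2 / k).

    d_ij^2 is the squared Euclidean distance between lines i and j,
    and k is the number of markers (scaling choice is an assumption).
    """
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    D = np.maximum(D, 0.0)           # guard against tiny negative round-off
    return np.exp(-bandwidth * D / X.shape[1])

# Toy marker matrix: 4 lines by 4 markers.
X_toy = np.array([[1, 0, 1, 0],
                  [0, 1, 1, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1]])
G = additive_relationship(X_toy)
K = gaussian_kernel(X_toy, bandwidth=1.0)
```

Either matrix can then serve as the covariance structure of the genotype effects in a mixed model, or as the relationship input to a sparse selection index. Those steps are not shown here.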
Created:
2021-09-15