Assessing the genotype-by-year effect on training set composition: scripts and dataset

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/pdrx7gdsmr

Description:

This dataset supports the study “Assessing the genotype-by-year effect on training set composition and alternatives to mitigate its impact on genomic selection accuracy.” The research investigates how genotype-by-year (G×Y) interactions influence genomic prediction (GP) accuracy and whether including overlapping checks and progenies in the training set can mitigate these effects. We hypothesized that limited overlap of entries across years would fail to correct for G×Y effects, reducing selection response.

To test this, we conducted stochastic simulations using AlphaSimR, based on the structure of the LSU AgCenter rice breeding program. A total of 36 scenarios were simulated, combining four levels of G×Y interaction (0%, 25%, 50%, 75%) with three levels of overlap (0%, 5%, 10%) for both progenies and checks.

The dataset includes:

• Input parameters and configuration files for all scenarios
• R scripts used for simulation and analysis
• Output data containing genetic parameters: additive variance, population mean, best line performance, and prediction accuracy

Results showed that stronger G×Y interactions consistently reduced GP accuracy and selection gains. Including up to 10% of checks and progenies in the training set did not significantly improve predictive performance. These findings indicate that such overlap is insufficient to account for temporal variation, especially in breeding programs with limited connectivity between cycles.

The dataset enables full reproducibility and can support further research on training set optimization, modeling G×Y interactions, and evaluating selection strategies.
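The count of 36 scenarios is consistent with a full factorial design in which the overlap level is varied independently for progenies and for checks (4 × 3 × 3 = 36). The sketch below enumerates that grid under this assumption. It is written in Python purely for illustration; the dataset's own scripts are in R, and the field names used here (`gxy_pct`, `progeny_overlap_pct`, `check_overlap_pct`) are hypothetical and do not come from the dataset's configuration files.

```python
from itertools import product

# Factor levels as stated in the description (in percent).
GXY_LEVELS = [0, 25, 50, 75]
# Assumption: the three overlap levels are crossed independently
# for progenies and for checks, which is the reading that yields 36.
PROGENY_OVERLAP_LEVELS = [0, 5, 10]
CHECK_OVERLAP_LEVELS = [0, 5, 10]

# Hypothetical field names, chosen for this illustration only.
scenarios = [
    {"gxy_pct": g, "progeny_overlap_pct": p, "check_overlap_pct": c}
    for g, p, c in product(
        GXY_LEVELS, PROGENY_OVERLAP_LEVELS, CHECK_OVERLAP_LEVELS
    )
]

print(len(scenarios))  # 36
```

If the actual design pairs the overlap levels differently, the grid above would need to be adjusted to match the configuration files shipped with the dataset.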

Created:

2025-06-12
