Benchmarking parametric and machine learning models for genomic prediction of complex traits
Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.xksn02vb9
Description:
The usefulness of genomic prediction in crop and livestock breeding
programs has prompted efforts to develop new and improved genomic
prediction algorithms, such as artificial neural networks and gradient
tree boosting. However, the performance of these algorithms has not been
compared in a systematic manner using a wide range of datasets and models.
Using data on 18 traits across six plant species with different marker
densities and training population sizes, we compared the performance of
six linear and six non-linear algorithms. First, we found that
hyperparameter selection was necessary for all non-linear algorithms and
that feature selection prior to model training was critical for artificial
neural networks when markers greatly outnumbered training lines. Across
all species and trait combinations, no single algorithm performed best;
however, predictions based on a combination of
results from multiple algorithms (i.e. ensemble predictions) performed
consistently well. While linear and non-linear algorithms performed best
for a similar number of traits, the performance of non-linear algorithms
varied more between traits. Although artificial neural networks did not
perform best for any trait, we identified strategies (i.e. feature
selection, seeded starting weights) that boosted their performance to near
the level of other algorithms. Our results highlight the importance of
algorithm selection for the prediction of trait values.
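
The idea of an ensemble prediction can be illustrated with a minimal sketch. The description does not say how the results from multiple algorithms were combined or how accuracy was scored, so the unweighted mean and the Pearson correlation used here are assumptions. The model names and all numbers are hypothetical toy values, not taken from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ensemble_mean(predictions):
    """Unweighted mean of per-line predictions across algorithms.

    Assumption: a simple average; the source does not specify the
    combination rule.
    """
    per_model = list(predictions.values())
    return [sum(vals) / len(vals) for vals in zip(*per_model)]


# Hypothetical toy values for five lines (not from the dataset).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = {
    "linear_model": [1.5, 1.5, 3.5, 3.5, 5.5],
    "nonlinear_model": [0.5, 2.5, 2.5, 4.5, 4.5],
}

ensemble = ensemble_mean(predictions)

# Compare each algorithm and the ensemble against the observed values.
for name, pred in predictions.items():
    print(name, round(pearson(observed, pred), 3))
print("ensemble", round(pearson(observed, ensemble), 3))
```

In this toy case the two models err in opposite directions, so averaging cancels their errors. Real predictions will not cancel this cleanly, and a weighted or stacked combination may suit some datasets better.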
Provider:
Dryad
Created:
2019-10-04



