Prediction pipeline for HYPER-PARAMETER OPTIMIZATION OF DEEP LEARNING FOR GENOMIC PREDICTION: CONCEPTS, STRATEGIES, AND INFERENCE

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://figshare.com/articles/dataset/Prediction_pipeline_for_HYPER-PARAMETER_OPTIMIZATION_OF_DEEP_LEARNING_FOR_GENOMIC_PREDICTION_CONCEPTS_STRATEGIES_AND_INFERENCE/32029818

Resource description:

Improving genetic gain for yield in major food-grade crops such as soybean (Glycine max L.) is one of the most sustainable ways to meet growing global demand in the long term. Genomic selection (GS) is a powerful tool in plant breeding for accelerating the development of high-yielding crop varieties. Artificial intelligence (AI) methods such as deep learning (DL) offer the flexibility to capture natural and hidden complex genetic patterns (epistasis, genotype-by-environment (G×E) interaction). This study provides a path to optimizing the DL architecture for predicting agronomic traits in soybean. Comprehensive sets of hyper-parameter values were considered to identify trends in the DL architecture. For a network with two hidden layers, all combinations of the levels considered for epochs, batch size, activation function combination, neurons per hidden layer, regularization parameter, learning rate, dropout, and inner calibrations were systematically assessed (58,320 combinations). Two fivefold cross-validation schemes were considered (CV1: predicting new genotypes; CV2: incomplete field trials). Parametric models composed of the main effects of environments (E), genotypes (L), and genomic markers (G), and the G×E interaction (M1: E+L; M2: E+L+G; M3: E+L+G+G×E) were implemented for comparison. Results showed that the most important factors were the activation function combination, learning rate, number of epochs, number of neurons in the second hidden layer, and, depending on the trait, batch size. For yield, an improvement of ~20% was observed relative to the cornerstone M2 model, while the reaction-norm model (M3) was superior to the best DL pattern by ~2–15%. Additionally, using the best pattern combination, a follow-up study increased the number of nodes in the second hidden layer from 1 to 200, and clear positive trends were observed as the number of nodes increased.
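The two cross-validation schemes differ mainly in how records are assigned to folds. Below is a minimal sketch, assuming each record is a (genotype, environment) pair and using plain random assignment; the study's exact fold construction is not described here, and the function names are hypothetical. Under CV1 every record of a genotype lands in the same fold, so test genotypes are never seen during training; under CV2 individual genotype-environment records are assigned to folds, so a test genotype may still be observed in other environments.

```python
import random


def assign_folds_cv1(records, k=5, seed=0):
    """CV1 (new genotypes): all records of a genotype share one fold.

    Hypothetical helper; records are (genotype, environment) pairs.
    """
    genotypes = sorted({g for g, _ in records})  # sort for reproducibility
    rng = random.Random(seed)
    rng.shuffle(genotypes)
    fold_of_genotype = {g: i % k for i, g in enumerate(genotypes)}
    return [fold_of_genotype[g] for g, _ in records]


def assign_folds_cv2(records, k=5, seed=0):
    """CV2 (incomplete field trials): each record gets its own fold.

    Plain random, balanced assignment at the record level (an assumption),
    so a genotype can appear in training via other environments.
    """
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)
    folds = [0] * len(records)
    for rank, idx in enumerate(order):
        folds[idx] = rank % k
    return folds


def split(folds, test_fold):
    """Return (train_indices, test_indices) for one fold."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test
```

For example, with 10 genotypes tested in 3 environments, both schemes yield five folds of 6 records each, but only CV1 keeps all three records of a genotype together. For CV1, scikit-learn's `GroupKFold` with genotype as the group is an off-the-shelf alternative.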
This study provides guidance on which hyper-parameters to explore in future research aimed at finding the precise combination of factors that outperforms the parametric implementations.
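Assessing "all combinations" of hyper-parameter levels is a full factorial grid search: the number of configurations is the product of the level counts for each factor. Here is a minimal sketch, assuming two levels per factor. The level values and factor names below are hypothetical and are NOT the ones used in the study, and "inner calibrations" is omitted because its levels are not described. The `score` callable is likewise a placeholder, for example a mean predictive correlation across CV folds.

```python
import itertools
import math

# Hypothetical levels for illustration only; NOT the study's actual levels.
GRID = {
    "epochs": [50, 100],
    "batch_size": [16, 32],
    "activations": [("relu", "relu"), ("relu", "sigmoid")],  # layer 1, layer 2
    "neurons_layer1": [32, 64],
    "neurons_layer2": [8, 16],
    "regularization": [0.0, 0.01],
    "learning_rate": [1e-3, 1e-2],
    "dropout": [0.0, 0.2],
}


def grid_size(grid):
    """Number of configurations in a full factorial grid (product of level counts)."""
    return math.prod(len(levels) for levels in grid.values())


def iter_configs(grid):
    """Yield every combination of levels as a dict."""
    names = list(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def best_config(grid, score):
    """Return the configuration with the highest score.

    `score` is a user-supplied callable (hypothetical), e.g. one that trains
    the network and returns its mean cross-validated predictive correlation.
    """
    return max(iter_configs(grid), key=score)
```

With these eight two-level factors the grid has 2^8 = 256 configurations. With the study's actual factors and levels, the same product gives the 58,320 combinations reported, which is why each added level multiplies the training cost.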

Created:

2026-04-15