
Additional file 1 of Machine learning models outperform deep learning models, provide interpretation and facilitate feature selection for soybean trait prediction

Indexed from Research Data Australia on 2024-08-03
Download link:
https://researchdata.edu.au/additional-file-1-trait-prediction/2024528
Resource description:
Additional file 1:
Supplementary Figure 1. P-values of SNP associations for a) flower colour, b) seed coat colour and c) pod colour in the soybean VCF. SNPs coloured red were determined to be significantly associated with the given trait because their p-values are below the -log10(8) significance threshold used for this GWAS.
Supplementary Figure 2. Graphs ranking the top 20 input SNPs by gain, as identified by XGBoost models, for the traits in which XGBoost identified regions of importance. Blue bars mark a region of importance, other colours mark groups of important SNPs on the same chromosome, and black bars mark the remaining SNPs that are unrelated to other SNPs in the ranking. SNP rankings for genome-wide SNP input are shown for A) flower colour, B) seed coat colour, C) pubescence density and D) seed weight.
Supplementary Figure 3. Top 20 ranked SNPs for XGBoost seed oil prediction.
Supplementary Figure 4. Top 20 ranked SNPs for XGBoost pod colour prediction.
Supplementary Figure 5. Top 20 ranked SNPs for XGBoost seed protein prediction.
Supplementary Table 1. Targeted regions of SNPs for reduced-input models.
Supplementary Table 2. List of soybean germplasm in the pangenome, with sequence coverage (ND, not defined).
Supplementary Table 3. Trait data types.

Provided by:
The University of Western Australia