Data for: Machine learning for genomic and pedigree prediction in sugarcane
Source: DataONE. Updated: 2024-05-29. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:ace4b882bafda960b18263980b410cc8b0b0b79c164c8e7922f266465bfc4dd9
Resource summary:
Sugarcane (Saccharum spp.) plays a crucial role in global sugar production; however, the efficiency of breeding programs has been hindered by its heterozygous polyploid genome. Accounting for non-additive genetic effects is essential in genomic prediction (GP) models for crops with highly heterozygous polyploid genomes. This study incorporates non-additive genetic effects and pedigree information using machine learning methods to track sugarcane breeding lines and to improve prediction by assessing the degree of association between genotypes. This study measured the stem biomass and sugar content of 297 clones from 87 families within a breeding population used in the Japanese sugarcane breeding program. We then conducted analyses based on the marker genotypes of 33,149 single-nucleotide polymorphisms (SNPs). To validate the accuracy of GP in the population, we first estimated the prediction accuracy of best linear unbiased prediction (BLUP) based on a genomic relationship matrix. Pre...
# Dataset used in the paper "Machine Learning for Genomic and Pedigree Prediction in Sugarcane"
[https://doi.org/10.5061/dryad.0rxwdbs8p](https://doi.org/10.5061/dryad.0rxwdbs8p)
This dataset was used in the paper to compare different methods for building genomic prediction models. It includes phenotypic data (pheno.csv), marker genotype data (geno.csv), a genomic relationship matrix (grm.csv), a numerator relationship matrix (prm.csv), and family IDs for the lines evaluated in the Japanese sugarcane breeding program. Using these data, a new machine learning method, Simulated Annealing Ensemble (SAE), was compared with Random Forest and with methods commonly used for genomic prediction, such as genomic BLUP (GBLUP) and pedigree-based BLUP (PBLUP).
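The description does not say how grm.csv was computed. As a minimal sketch, the code below builds a genomic relationship matrix with VanRaden's method 1, assuming markers are coded as 0/1/2 allele dosages. This is an assumption: the coding used in geno.csv for polyploid sugarcane is not stated and may differ. It also forms an additive-by-additive kernel as the element-wise (Hadamard) square of G, which is one common way to represent non-additive effects, not necessarily the one used in the paper. `vanraden_grm` and `epistatic_kernel` are hypothetical helper names, and the random markers are synthetic, not the sugarcane data.

```python
import numpy as np


def vanraden_grm(M: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix, VanRaden (2008) method 1.

    M: n_lines x n_markers array of allele dosages, assumed coded 0/1/2.
    """
    p = M.mean(axis=0) / 2.0             # estimated allele frequency per marker
    Z = M - 2.0 * p                      # center each marker column
    scale = 2.0 * np.sum(p * (1.0 - p))  # puts G on a scale comparable to a pedigree matrix
    return (Z @ Z.T) / scale


def epistatic_kernel(G: np.ndarray) -> np.ndarray:
    """Additive-by-additive relationship approximated by the element-wise square of G."""
    return G * G


# Synthetic example: 30 lines x 200 markers (not the sugarcane data).
rng = np.random.default_rng(42)
M = rng.binomial(2, 0.4, size=(30, 200)).astype(float)
G = vanraden_grm(M)
G_aa = epistatic_kernel(G)
print(G.shape, round(float(np.mean(np.diag(G))), 3))
```

Because each marker column is centered, every row of G sums to zero. The scaling by 2·sum(p(1 - p)) is what keeps G roughly comparable to a numerator relationship matrix such as the one in prm.csv.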
## Description of the data and file structure
pheno.csv: Estimated genotypic values calculated from multi-environment data using a mixed model. These values are the response variable (y) for genomic prediction.
Columns: traits,...
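Since pheno.csv supplies y and the family IDs identify whole families, one natural validation scheme holds out one family at a time. The dataset description does not state that the paper used this scheme. The sketch below computes kernel BLUP predictions and reports accuracy as the Pearson correlation between predicted and observed values. It simplifies by fixing the variance ratio `lam` and by using the training mean as the intercept. Passing a genomic relationship matrix as `K` gives GBLUP, and passing a numerator relationship matrix gives PBLUP. `kernel_blup` and `leave_one_family_out` are hypothetical names, and the data are synthetic.

```python
import numpy as np


def kernel_blup(K, y, train, lam):
    """BLUP of genetic values for all lines, fitted on the `train` lines only.

    Assumes y = mu + g + e, g ~ N(0, K * s2_g), e ~ N(0, I * s2_e),
    with lam = s2_e / s2_g known. mu is simplified to the training mean.
    """
    y_tr = y[train]
    mu = y_tr.mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y_tr - mu)
    return mu + K[:, train] @ alpha


def leave_one_family_out(K, y, families, lam=1.0):
    """Predict each family from all other families; return (Pearson r, predictions)."""
    pred = np.empty(len(y))
    for fam in np.unique(families):
        test = np.flatnonzero(families == fam)
        train = np.flatnonzero(families != fam)
        pred[test] = kernel_blup(K, y, train, lam)[test]
    r = np.corrcoef(pred, y)[0, 1]
    return r, pred


# Synthetic stand-in: 8 families x 5 lines (not the sugarcane data).
rng = np.random.default_rng(1)
families = np.repeat(np.arange(8), 5)
fam_means = rng.normal(size=(8, 100))              # shared family signal per marker
M = fam_means[families] + 0.5 * rng.normal(size=(40, 100))
M -= M.mean(axis=0)                                # center marker columns
K = M @ M.T / M.shape[1]                           # linear kernel standing in for G or A
y = M @ rng.normal(0, 0.2, 100) + rng.normal(0, 1.0, 40)
r, pred = leave_one_family_out(K, y, families, lam=1.0)
print(f"leave-one-family-out accuracy r = {r:.3f}")
```

In practice `lam` (residual variance divided by genetic variance) would be estimated, for example by REML, rather than fixed at 1.0.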
Created:
2024-05-30



