Data from: Phased Genotyping-by-Sequencing Enhances Analysis of Genetic Diversity and Reveals Divergent Copy Number Variants in Maize

Source: agdatacommons.nal.usda.gov
Updated: 2024-02-13
Indexed: 2025-03-22
Download link:
https://agdatacommons.nal.usda.gov/articles/dataset/Data_from_Phased_Genotyping-by-Sequencing_Enhances_Analysis_of_Genetic_Diversity_and_Reveals_Divergent_Copy_Number_Variants_in_Maize/24668043/1
Resource description:
High-throughput sequencing (HTS) of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken from heterogeneous populations of heterozygous individuals. This requires that a number of issues encountered with GBS be considered, including the sequencing of nonoverlapping sets of loci across multiple GBS libraries, a common missing data problem that results in low call rates for markers per individual, and a tendency for applicability only in inbred line samples with sufficient linkage disequilibrium for accurate imputation. We addressed these issues while developing and validating a new, comprehensive platform for GBS. This study supports the notion that GBS can be tailored to particular aims, and using Zea mays our results indicate that large samples of unknown pedigree can be genotyped to obtain complete and accurate GBS data. Optimizing size selection to sequence a high proportion of shared loci among individuals in different libraries and using simple in silico filters, a GBS procedure was established that produces high call rates per marker (>85%) with accuracy exceeding 99.4%. Furthermore, by capitalizing on the sequence-read structure of GBS data (stacks of reads), a new tool for resolving local haplotypes and scoring phased genotypes was developed, a feature that is not available in many GBS pipelines. Using local haplotypes reduces the marker dimensionality of the genotype matrix while increasing the informativeness of the data. Phased GBS in maize also revealed the existence of reproducibly inaccurate (apparent accuracy) genotypes that were due to divergent copy number variants (CNVs) unobservable in the underlying single nucleotide polymorphism (SNP) data.

Resources in this dataset:
Resource Title: Supplementary Data.
File Name: Web Page, url: https://academic.oup.com/g3journal/article/7/7/2161/6053605#supplementary-data
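
The description above mentions simple in silico filters and a per-marker call rate above 85%. As a rough illustration only, a call-rate filter over a genotype matrix might look like the sketch below. The function names, the matrix layout (rows are individuals, columns are markers, `None` marks a missing call), and the toy data are assumptions made for this example; they are not taken from the dataset or from the authors' pipeline.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed encoding: a genotype is a string such as "A/G"; None marks a missing call.
Genotype = Optional[str]


def call_rates(matrix: Sequence[Sequence[Genotype]]) -> List[float]:
    """Return, for each marker (column), the fraction of individuals with a call."""
    if not matrix:
        return []
    n_individuals = len(matrix)
    n_markers = len(matrix[0])
    rates = []
    for j in range(n_markers):
        called = sum(1 for row in matrix if row[j] is not None)
        rates.append(called / n_individuals)
    return rates


def filter_markers(
    matrix: Sequence[Sequence[Genotype]],
    min_call_rate: float = 0.85,  # threshold taken from the ">85%" figure in the description
) -> Tuple[List[int], List[List[Genotype]]]:
    """Keep only markers whose call rate is strictly above min_call_rate."""
    rates = call_rates(matrix)
    keep = [j for j, rate in enumerate(rates) if rate > min_call_rate]
    filtered = [[row[j] for j in keep] for row in matrix]
    return keep, filtered


# Toy data (hypothetical): 4 individuals x 3 markers.
toy = [
    ["A/A", "C/T", "G/G"],
    ["A/G", None, "G/G"],
    ["A/A", "C/C", "G/T"],
    ["G/G", "C/T", "G/G"],
]
kept_columns, filtered_matrix = filter_markers(toy)
```

In this toy matrix the second marker is called in only three of four individuals (75%), so it falls below the threshold and is dropped, while the other two markers are retained.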

Provider:
agdatacommons.nal.usda.gov