Data from: Whole genome sequence accuracy is improved by replication in a population of mutagenized sorghum.
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.t80gj
Description:
The accurate detection of induced mutations is critical for both forward
and reverse genetics studies. Experimental chemical mutagenesis induces
relatively few single base changes per individual. In a complex eukaryotic
genome, false positive detection of mutations can occur at or above this
mutagenesis rate. We demonstrate here, using a population of ethyl
methanesulfonate (EMS) treated Sorghum bicolor BTx623 individuals, that
using replication to detect false positive induced variants in
next-generation sequencing data permits higher throughput variant
detection with greater accuracy. We used a lower sequence coverage depth
(average of 7X) from 586 independently mutagenized individuals and
detected 5,399,493 homozygous SNPs. Of these, 76% originated from only
57,872 genomic positions prone to false positive variant calling. These
positions are characterized by high copy number paralogs where the
error-prone SNP positions are at copies containing a variant at the SNP
position. The ability of short stretches of homology to generate these
error-prone positions suggests that incompletely assembled or poorly
mapped repeated sequences are one driver of these error-prone positions.
Removal of these false positives left 1,275,872 homozygous and 477,531
heterozygous EMS-induced SNPs which, congruent with the mutagenic
mechanism of EMS, were greater than 98% G:C to A:T transitions. Through
this analysis we generated a database of sequence indexed mutants of
Sorghum. This collection contains 4,035 high impact homozygous mutations
in 3,637 genes and 56,514 homozygous missense mutations in 23,227 genes.
Each line contains, on average, 2,177 annotated homozygous SNPs per
genome, including seven likely gene knockouts and 96 missense mutations.
The number of mutations in a transcript was linearly correlated with the
transcript length and also the G+C count, but not with the GC/AT ratio.
Analysis of the detected mutagenized positions identified CG-rich patches,
and flanking sequences strongly influenced EMS-induced mutation rates. Our
method for detecting false-positive induced mutations is generally
applicable to any organism, is independent of the choice of in silico
variant-calling algorithm, and is most valuable when the true mutation
rate is likely to be low, such as in laboratory induced mutations or
somatic mutation detection in medicine.
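
The replication idea described above can be illustrated with a minimal sketch. Because the individuals were mutagenized independently, a true induced SNP should rarely recur at the same genomic position in many lines, so positions called in several individuals are candidates for false positives. The code below is an illustration under stated assumptions, not the authors' pipeline: the `(chrom, pos, ref, alt)` tuple layout, the function names, the toy data, and the `max_individuals` threshold are all hypothetical, and the description does not state the cutoff actually used. The second function computes the share of G:C to A:T transitions (G>A or C>T on the reference strand), the signature expected of EMS.

```python
from collections import Counter

def recurrent_positions(calls_by_individual, max_individuals=1):
    """Return positions called as variant in more than `max_individuals` lines.

    `max_individuals` is an assumed threshold chosen for illustration only.
    """
    counts = Counter()
    for calls in calls_by_individual.values():
        # Count each position at most once per individual.
        counts.update({(chrom, pos) for chrom, pos, _ref, _alt in calls})
    return {site for site, n in counts.items() if n > max_individuals}

def filter_calls(calls_by_individual, max_individuals=1):
    """Drop calls that fall on positions shared by too many individuals."""
    flagged = recurrent_positions(calls_by_individual, max_individuals)
    return {
        individual: [c for c in calls if (c[0], c[1]) not in flagged]
        for individual, calls in calls_by_individual.items()
    }

def ems_transition_fraction(calls):
    """Fraction of calls that are G>A or C>T (G:C to A:T transitions)."""
    calls = list(calls)
    if not calls:
        return 0.0
    ems_like = {("G", "A"), ("C", "T")}
    return sum((ref, alt) in ems_like for _c, _p, ref, alt in calls) / len(calls)

# Toy data (hypothetical): one position recurs in all three lines.
calls = {
    "line_001": [("Chr01", 100, "G", "A"), ("Chr02", 500, "A", "C")],
    "line_002": [("Chr01", 200, "C", "T"), ("Chr02", 500, "A", "C")],
    "line_003": [("Chr03", 50, "G", "A"), ("Chr02", 500, "A", "C")],
}

flagged = recurrent_positions(calls)
filtered = filter_calls(calls)
remaining = [c for line_calls in filtered.values() for c in line_calls]
fraction = ems_transition_fraction(remaining)
```

In this toy input the shared position `("Chr02", 500)` is flagged and removed from every line, and the remaining calls are all G>A or C>T. With real data, the calls would typically come from per-sample VCF files, and the threshold would need tuning against population size and coverage.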
Provider:
Dryad
Created:
2018-01-24



