
Supporting data for "NPSV: A simulation-driven approach to genotyping structural variants in whole genome sequencing data"

Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100908
Description:
Structural variants (SVs) play a causal role in numerous diseases but are difficult to detect and accurately genotype (determine zygosity) in whole genome next-generation sequencing (NGS) data. SV genotypers that assume the aligned sequencing data uniformly reflects the underlying SV, or that use existing SV call sets as training data, can only partially account for variant- and sample-specific biases.

We introduce NPSV, a machine learning-based approach for genotyping previously discovered SVs that employs NGS simulation to model the combined effects of the genomic region, sequencer and alignment pipeline on the observed SV evidence. We evaluate NPSV alongside existing SV genotypers on multiple benchmark call sets. We show that NPSV consistently achieves or exceeds state-of-the-art genotyping accuracy across SV call sets, samples and variant types. NPSV can specifically identify putative de novo SVs in a trio context and is robust to offset SV breakpoints.

Growing SV databases and the increasing availability of SV calls from long-read sequencing make stand-alone genotyping of previously identified SVs an increasingly important component of genome analyses. By treating potential biases as a simulate-able black box, NPSV provides a framework for accurately genotyping a broad range of SVs in both targeted and genome-scale applications.
Provider:
GigaScience Database
Created:
2021-06-01