Exome-wide benchmark of difficult-to-sequence regions using short-read next-generation DNA sequencing

Source: NIAID Data Ecosystem (indexed 2026-05-01)

Download link:

https://www.ncbi.nlm.nih.gov/sra/DRP010834

Description:

Short-read next-generation DNA sequencing (NGS) has recently been adopted for genetic testing in various clinical settings. Data accuracy is crucial in these settings, and several reports on quality control of NGS data have been published, focusing mostly on sequence read accuracy. Variant calling is another critical source of NGS errors, yet it remains mostly unexplored despite its established significance. In this study, we used a machine-learning-based method to establish an exome-wide benchmark of difficult-to-sequence regions, using 10 genome sequence features and real-world NGS data accumulated in the Genome Aggregation Database (gnomAD) for the human reference genome sequence (GRCh38/hg38). We used the resulting metric, designated the 'UNMET score', together with other structural information on the human genome to identify genomic regions that are difficult to sequence with conventional NGS. The UNMET score could thus provide appropriate caveats about potential sequencing errors in protein-coding exons of GRCh38/hg38 in clinical sequencing.

Created:

2023-12-07
