
Exome-wide benchmark of difficult-to-sequence regions using short-read next-generation DNA sequencing

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://www.ncbi.nlm.nih.gov/sra/DRP010834
Resource description:
Short-read next-generation DNA sequencing (NGS) has recently been used for genetic testing in various clinical settings. The accuracy of NGS data is crucial in these settings, and several reports on NGS quality control have been published, most of them focused on establishing the accuracy of sequence reads. Variant calling is another critical source of NGS errors, yet it remains largely unexplored despite its established significance. In this study, we used a machine-learning-based method to establish an exome-wide benchmark of difficult-to-sequence regions, using 10 genome sequence features and real-world NGS data accumulated in the Genome Aggregation Database (gnomAD) for the human reference genome sequence (GRCh38/hg38). We used the resulting metric, designated the 'UNMET score,' along with other structural information about the human genome to identify genomic regions that are difficult to sequence with conventional NGS. The UNMET score could thus provide appropriate caveats about potential sequencing errors in protein-coding exons of GRCh38/hg38 in clinical sequencing.
Created:
2023-12-07