DataSheet1_KNNCNV: A K-Nearest Neighbor Based Method for Detection of Copy Number Variations Using NGS Data.PDF

NIAID Data Ecosystem, indexed 2026-03-13

Download link:

https://figshare.com/articles/dataset/DataSheet1_KNNCNV_A_K-Nearest_Neighbor_Based_Method_for_Detection_of_Copy_Number_Variations_Using_NGS_Data_PDF/17361899

Resource description:

Copy number variation (CNV) is a well-known type of genomic mutation that is associated with the development of human cancers. Detecting CNVs in the human genome is a crucial step in the pipeline that runs from mutation analysis to cancer diagnosis and treatment. Next-generation sequencing (NGS) data provides an unprecedented opportunity for CNV detection at base-level resolution, and many methods have been developed for CNV detection using NGS data. However, due to the intrinsic complexity of CNV structures and of NGS data itself, accurate detection of CNVs still faces many challenges. In this paper, we present an alternative method, called KNNCNV (K-Nearest Neighbor based CNV detection), for the detection of CNVs using NGS data. Compared to current methods, KNNCNV has two distinctive features: 1) it assigns an outlier score to each genome segment based solely on the distances to its first k nearest neighbors, which is not only easy to extend to other data types but also improves the power of discovering CNVs, especially local CNVs that are likely to be masked by their surrounding regions; 2) it employs the variational Bayesian Gaussian mixture model (VBGMM) to transform these scores into a series of binary labels without a user-defined threshold. To evaluate the performance of KNNCNV, we conduct experiments on both simulated and real sequencing data and make comparisons with peer methods. The experimental results show that KNNCNV can achieve better performance than the other methods in terms of F1-score.
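To make the first feature concrete, the following is a minimal sketch of a k-nearest-neighbor outlier score, not the authors' implementation. It assumes each genome segment is summarized by a single read-depth value and that the score is the mean of the k smallest distances to other segments; the abstract does not say how the k distances are combined, so that choice, the function name `knn_outlier_scores`, and the sample values are all hypothetical.

```python
from statistics import mean


def knn_outlier_scores(values, k=3):
    """Score each segment by the mean distance to its k nearest neighbors.

    Assumption: one numeric feature per segment and a mean aggregation;
    the source abstract does not specify either detail.
    """
    if not 1 <= k < len(values):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    scores = []
    for i, v in enumerate(values):
        # Absolute distances from segment i to every other segment, ascending.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        # Only the k smallest distances contribute to the score.
        scores.append(mean(dists[:k]))
    return scores


# Hypothetical per-segment read-depth values; index 4 mimics an amplified segment.
read_depth = [100.0, 102.0, 98.0, 101.0, 160.0, 99.0, 103.0, 97.0]
scores = knn_outlier_scores(read_depth, k=3)

# The segment with the largest score is the strongest CNV candidate.
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Segments whose values sit far from all of their nearest neighbors receive large scores. The second step described in the abstract, turning these scores into binary labels with a VBGMM instead of a fixed cutoff, is not shown here.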

Created:

2021-12-22
