Performance comparison of multiple tools and CONGA performance analysis.

Source: NIAID Data Ecosystem (indexed 2026-03-14)

Download link:

https://figshare.com/articles/dataset/Performance_comparison_of_multiple_tools_and_CONGA_performance_analysis_/21727822

Description:

Table A shows the CNV predictions of CONGA, GenomeSTRiP, FREEC, CNVnator and mrCaNaVaR on simulated genomes at depths of 0.05×, 0.1×, 0.5×, 1× and 5×, for deletions and duplications of multiple CNV size intervals including 100 bp–1 kbp (small), 1 kbp–10 kbp (medium) and 10 kbp–100 kbp (large). Here, “T” and “F” refer to correct and incorrect predictions, respectively; “Miss” is the number of missed true events; “Recall” (TPR) is the true positive rate; and “FDR” is the false discovery rate (1 − Precision) for each run. The F-Score is calculated as (2 * Precision * Recall) / (Precision + Recall). For CONGA, performance is reported for both C-score<0.5 and C-score<0.3. The table also contains a comparison between CONGA and GenomeSTRiP predictions.

Table B shows a comparison between CONGA and GenomeSTRiP predictions on simulated genomes at depths of 0.05×, 0.1×, 0.5×, 1× and 5×, for deletions and duplications in the 1 kbp–10 kbp (medium) and 10 kbp–100 kbp (large) CNV size intervals.

Table C shows the copy-number (homozygous or heterozygous) predictions of CONGA on simulated genomes at the same depths and for the same three CNV size intervals as Table A. “T”, “F”, “Miss”, “Recall”, “FDR” and the F-Score are defined as in Table A, and CONGA performance is again reported for both C-score<0.5 and C-score<0.3.

Table D shows deletion and duplication predictions of CONGA using the Mota, Saqqaq and Yamnaya genomes, down-sampled to various depths from their original coverages of 9.6×, 13.1× and 23.3×, respectively. “T”, “F”, “Miss”, “Recall”, “FDR” and the F-Score are defined as in Table A. We calculated “True”, “False”, “Miss”, “Recall”, “Precision”, FDR and F-Score for the down-sampled genomes assuming that our CONGA-based predictions with the original genomes (full data) reflect the ground truth. These predictions, in turn, were made using modern-day CNVs as the candidate CNV list. The purpose of the experiment was to evaluate accuracy at lower coverage relative to the full data.

Table E shows CONGA’s running time and memory consumption on genomes of various depths of coverage, calculated using the down-sampled 23× Yamnaya genome (with coverages between 23× and 0.07×), as well as a comparison of CONGA, GenomeSTRiP, FREEC and CNVnator using a 5× simulated genome.

Table F shows the results of CONGA runs on simulated genomes over a range of parameters, performed to determine the optimum parameters to use. We tested multiple combinations of C-Score, minimum read-pair support, mappability and minimum mapping quality (MAPQ).

(XLSX)
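The metric definitions used in the tables can be reproduced from the raw counts. The following is a minimal sketch, not the authors' code. It assumes that “T” counts true-positive calls, “F” counts false-positive calls, and “Miss” counts false negatives, so that Recall = T / (T + Miss) and Precision = T / (T + F). The names `CallCounts` and `summarize` are hypothetical, and the numbers in the usage lines are illustrative rather than taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Raw counts for one run (field names are illustrative)."""
    true_calls: int   # "T": predictions that match a true event
    false_calls: int  # "F": predictions that match no true event
    missed: int       # "Miss": true events with no matching prediction


def summarize(c: CallCounts) -> dict:
    """Derive Recall, Precision, FDR and F-Score from the raw counts."""
    predicted = c.true_calls + c.false_calls  # all calls made by the tool
    actual = c.true_calls + c.missed          # all true events
    # Convention assumed here: a ratio with a zero denominator is reported as 0.0.
    precision = c.true_calls / predicted if predicted else 0.0
    recall = c.true_calls / actual if actual else 0.0
    fdr = 1 - precision if predicted else 0.0
    denom = precision + recall
    # F-Score as given in the description: 2 * P * R / (P + R).
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "fdr": fdr,
        "f_score": f_score,
    }


# Illustrative counts only (not values from the tables).
metrics = summarize(CallCounts(true_calls=80, false_calls=20, missed=20))
print(metrics)
```

For the Table D setting, the same function applies once the calls made on the full-coverage genome are treated as the set of true events, as the description states; how a down-sampled call is matched to a full-coverage call is not specified here and would need to be taken from the original publication.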

Created:

2022-12-14
