Simulated reads for benchmarking SARS-CoV-2 lineage abundance estimation

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/8298886

Description:

To evaluate the accuracy of lineage abundance estimates from amplicon-based and whole-genome sequencing, we simulated paired-end reads from amplicons determined by AmpliDiff, and reads spanning full genomes. Lineage abundances are based on the relative abundance of each lineage within the dataset.

The data consist of the following 8 independent datasets:

1. Netherlands, 200 bp reads, AmpliDiff amplicons (1, 2, 5 or 10 amplicons), 1000x coverage
2. Netherlands, 400 bp reads, AmpliDiff amplicons (1, 2, 5 or 10 amplicons), 1000x coverage
3. Netherlands, 200 bp reads, whole genome sequencing, 100x coverage
4. Netherlands, 400 bp reads, whole genome sequencing, 100x coverage
5. Texas, 200 bp reads, AmpliDiff amplicons (1, 2, 5 or 10 amplicons), 1000x coverage
6. Texas, 400 bp reads, AmpliDiff amplicons (1, 2, 5 or 10 amplicons), 1000x coverage
7. Texas, 200 bp reads, whole genome sequencing, 100x coverage
8. Texas, 400 bp reads, whole genome sequencing, 100x coverage

Every independent dataset contains 20 sets of reads, each generated with a different random seed.

The genomes used for the Netherlands-based simulations can be obtained via GISAID through accession ID EPI_SET_230825fe, and the genomes used for the Texas-based simulations can be obtained via GISAID through accession ID EPI_SET_230825pe.
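
As an illustration only, the sketch below shows one plausible reading of "relative abundance of a lineage within the dataset": the fraction of genomes in a set that carry a given lineage label. It also enumerates the eight region, read length and strategy combinations described above. The function name, the dictionary keys and the example lineage labels are assumptions made for this sketch. They are not taken from the dataset or from AmpliDiff.

```python
from collections import Counter
from itertools import product


def relative_abundances(lineages):
    """Return each lineage's share of the genomes in one dataset.

    Assumption: abundance is the count of genomes with that lineage
    label divided by the total number of genomes.
    """
    counts = Counter(lineages)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no lineage labels given")
    return {lineage: n / total for lineage, n in counts.items()}


# Hypothetical lineage labels for five genomes (not from the dataset).
example = ["BA.1", "BA.1", "BA.2", "B.1.617.2", "BA.1"]
abundances = relative_abundances(example)

# Coverage per sequencing strategy, as stated in the description.
COVERAGE = {"amplicon": 1000, "whole genome": 100}

# The eight datasets are every combination of region, read length and
# strategy. Key names here are illustrative.
datasets = [
    {"region": region, "read_length_bp": length,
     "strategy": strategy, "coverage": COVERAGE[strategy]}
    for region, length, strategy in product(
        ["Netherlands", "Texas"], [200, 400], COVERAGE
    )
]

if __name__ == "__main__":
    for lineage, share in sorted(abundances.items()):
        print(f"{lineage}\t{share:.2f}")
    print(len(datasets), "dataset configurations")
```

Ground-truth abundances computed this way could then be compared with the estimates produced by an abundance estimation tool on each of the 20 read sets per dataset.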

Created:

2023-08-30
