
PRJEB44932 SARS-CoV-2 Wastewater Sequences Processed

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/9h5cf9wkrk
Description:
The data come from NCBI BioProject PRJEB44932, as analyzed in Jahn et al. (2022). They are short-read sequences from wastewater at 6 locations in a tourist-heavy area of Switzerland, collected from 2021-02-01 to 2021-11-30, with daily sampling at most sites. There are 303 unique dates in these data, but only 227 of them are common to all wastewater treatment plants (WWTPs). For our analysis, we treat consecutive sampling dates as one "unit" apart, even though many adjacent time points are 2 or 3 days apart (up to a maximum of 8 days). See Jahn et al. (2022) for a detailed description of the genomic sequencing process.

Data processing involved aligning the short reads to the Wuhan-1 reference sequence (NC_045512) with `minimap2` v2.28, identifying mutations relative to the reference, and recording the number of times each mutation was observed (counts) and the depth of coverage. The frequency is calculated as the counts divided by the coverage. The mutation pre-processing pipeline is available at https://github.com/DASL-Lab/data-treatment-plant and relies heavily on the GromStole pipeline (https://github.com/PoonLab/gromstole).

After being processed into counts and coverage, the data were filtered to include only mutations relevant to the analysis. Many mutations had either consistently low counts (possibly due to sequencing errors) or low coverage. After filtering the data for sampling dates common to all locations, there are 1,061 unique mutations at each location on each sampling date. We found all mutations that had both a frequency of at least 0.1 and a frequency below 0.9 (with a coverage of at least 40) at two or more time points during the study in any location. This ensures that we capture all mutations that were potentially part of a circulating lineage without relying on lineage definitions.
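
The frequency calculation and filtering described above can be sketched as follows. This is a minimal illustration, not the code from the linked pipeline. The record layout `(location, date, mutation, count, coverage)`, the function names, and the toy values are all hypothetical. The sketch also assumes one reading of the selection rule: a single observation must satisfy 0.1 <= frequency < 0.9 with coverage of at least 40, and a mutation is selected when this happens on at least two dates within any one location.

```python
from collections import defaultdict

# Hypothetical toy records: (location, date, mutation, count, coverage).
records = [
    ("A", "2021-02-01", "m1", 30, 100),
    ("A", "2021-02-03", "m1", 50, 100),
    ("A", "2021-02-01", "m2", 95, 100),
    ("A", "2021-02-03", "m2", 20, 30),
    ("B", "2021-02-01", "m2", 20, 100),
    ("B", "2021-02-03", "m2", 1, 100),
    ("B", "2021-02-04", "m2", 50, 100),
]

def frequency(count, coverage):
    """Counts divided by coverage; None when there is no coverage."""
    return count / coverage if coverage > 0 else None

def common_dates(rows):
    """Dates that appear at every location."""
    dates_by_loc = defaultdict(set)
    for loc, date, *_ in rows:
        dates_by_loc[loc].add(date)
    return set.intersection(*dates_by_loc.values()) if dates_by_loc else set()

def select_mutations(rows, lo=0.1, hi=0.9, min_cov=40, min_points=2):
    """Mutations with lo <= frequency < hi and coverage >= min_cov on at
    least min_points dates within any single location (assumed reading)."""
    hits = defaultdict(set)  # (location, mutation) -> qualifying dates
    for loc, date, mut, count, cov in rows:
        freq = frequency(count, cov)
        if freq is not None and cov >= min_cov and lo <= freq < hi:
            hits[(loc, mut)].add(date)
    return {mut for (_, mut), dates in hits.items() if len(dates) >= min_points}

# Keep only sampling dates shared by all locations.
common = common_dates(records)
kept = [r for r in records if r[1] in common]

# Treat consecutive common dates as one "unit" apart, ignoring real gaps.
# ISO-formatted date strings sort chronologically.
time_index = {date: i for i, date in enumerate(sorted(common))}

selected = select_mutations(kept)
```

In this toy input, `m1` qualifies on both common dates at location A and is selected. `m2` qualifies only once on the common dates, so it is not. Whether the two qualifying time points must fall within the same location is an assumption of this sketch; the description does not settle it.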
Jahn, Katharina, David Dreifuss, Ivan Topolsky, Anina Kull, Pravin Ganesanandamoorthy, Xavier Fernandez-Cassi, Carola Bänziger, et al. 2022. “Early Detection and Surveillance of SARS-CoV-2 Genomic Variants in Wastewater Using COJAC.” Nature Microbiology 7 (8): 1151–60. https://doi.org/10.1038/s41564-022-01185-x.
Created: 2025-01-17