
H5N1 Wastewater Detection Demo Dataset

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14638948
Description:
# H5N1 Wastewater Detection Demo Dataset

## Overview
This dataset combines simulated H5N1 influenza reads with real wastewater metagenome data to create a benchmark for viral detection methods. It simulates a scenario in which a novel H5N1 strain is present in urban wastewater at detectable levels.

## Dataset Composition
- Total reads: 707,830
  - H5N1 reads: 1,120 (0.16%)
  - Wastewater reads: 706,710 (99.84%)

### Viral Content Breakdown
```
Total Viruses (0.34% of all reads):
├── H5N1 (0.16%)
├── Caudovirales (0.18%)
│   ├── Siphoviridae (0.12%)
│   └── Podoviridae (0.06%)
├── Microviridae (0.08%)
└── Other viruses (0.02%)
```

## Data Sources

### H5N1 Component
- Source: Influenza A virus (A/chicken/Egypt/N19604C/2021(H9N2))
- NCBI Accessions:
  - PB2: ON374267.1
  - PB1: ON374268.1
  - PA: ON374269.1
  - HA: ON374270.1 (Modified with mutations)
  - NP: ON374271.1
  - NA: ON374272.1
  - M: ON374273.1
  - NS: ON374266.1

#### Modifications
- Mutation rate: 0.1% (introduced using wgsim)
- Error rate: 0.1%
- Coverage: 10x
- Read length: 150 bp
- Sequencing profile: HiSeq 2500

### Wastewater Component
- Source: Global Urban Virome Project
- Accession: ERR2734409
- Original composition preserved
- Represents typical urban wastewater viral diversity

## Directory Structure
```
combined_data/
├── input/
│   ├── fasta/
│   │   ├── h5n1.fasta
│   │   └── wastewater.fasta
│   ├── h5n1/
│   │   ├── final_reads.fq
│   │   └── h5n1_complete.fasta
│   ├── uncompressed/
│   │   └── ERR2734409.fastq
│   └── wastewater/
│       └── ERR2734409.fastq.gz
├── scripts/
│   ├── combine_segments.py
│   ├── convert_fastq.py
│   └── workflow.sh
└── README.md
```

## Dataset Creation Method

### 1. H5N1 Sequence Preparation
```bash
# Introduce mutations in HA segment
# -N: number of read pairs, -e: base error rate, -r: mutation rate, -R: fraction of indels
# wgsim expects two output FASTQ paths; HA_mutated_2.fq is a placeholder name for the second mate file
wgsim -N 100000 -e 0.001 -r 0.001 -R 0.0 ON374266.1.fasta HA_mutated_1.fq HA_mutated_2.fq

# Convert mutated sequence to FASTA
python3 convert_fastq.py

# Combine all segments
python3 combine_segments.py
```

### 2. Read Simulation
```bash
# Generate Illumina reads using ART
# -l: read length, -ss HS25: HiSeq 2500 profile, -f: fold coverage, -o: output prefix
art_illumina -i h5n1_complete.fasta -l 150 -ss HS25 -f 10 -sam -nf 0 -o final_reads
```

### 3. Dataset Combination
```bash
# Combine H5N1 and wastewater reads
cat final_reads.fq ERR2734409.fastq > combined_reads.fastq
```

## Validation Results
```
Dataset Statistics:
==================
Total reads: 707,830
H5N1: 1,120 reads (0.16%)
Wastewater: 706,710 reads (99.84%)

Read Lengths:
Mean: 150.0
Std dev: 0.0

GC Content:
Mean: 47.2%
Std dev: 8.4%
```

## Citations
- Nieuwenhuijse, D.F., Oude Munnink, B.B., Phan, M.V.T. et al. Setting a baseline for global urban virome surveillance in sewage. Sci Rep 10, 13748 (2020). https://doi.org/10.1038/s41598-020-69869-0
- Li H. wgsim - Read simulator for next generation sequencing. (2011).
- Huang W, Li L, Myers JR, Marth GT. ART: a next-generation sequencing read simulator. Bioinformatics. 2012 Feb 15;28(4):593-4.
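
The figures reported under Validation Results (read count, read-length statistics, GC content, and the H5N1 share of reads) could be recomputed from `combined_reads.fastq` with a short script along the following lines. This is a minimal sketch, not the script used to build the dataset: the function names `read_fastq`, `gc_percent`, `summarize`, and `percent` are hypothetical, the parser assumes plain four-line FASTQ records, and it uses the population standard deviation, which the README does not specify.

```python
import io
import statistics
from typing import Dict, Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) pairs, assuming four lines per record."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file (or a blank line) ends the stream
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality string, ignored
        yield header[1:], sequence


def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C bases in one read."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


def summarize(handle: TextIO) -> Dict[str, float]:
    """Compute read count plus length and GC statistics for a FASTQ stream."""
    lengths = []
    gc_values = []
    for _, sequence in read_fastq(handle):
        lengths.append(len(sequence))
        gc_values.append(gc_percent(sequence))
    if not lengths:
        return {"total_reads": 0}
    return {
        "total_reads": len(lengths),
        "mean_length": statistics.fmean(lengths),
        "sd_length": statistics.pstdev(lengths),  # population SD (assumption)
        "mean_gc": statistics.fmean(gc_values),
        "sd_gc": statistics.pstdev(gc_values),
    }


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


if __name__ == "__main__":
    # Toy two-read input; for the real file, use open("combined_reads.fastq").
    demo = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
    print(summarize(demo))
    # Read shares from the counts stated in the README.
    print(percent(1120, 707830), percent(706710, 707830))
```

Streaming the file record by record keeps memory use low for the sequence data itself; only one length and one GC value per read are retained, which is manageable at roughly 700,000 reads.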
Created:
2025-01-13