Dataset for Multiplex-PCR detection and Nanopore-based genotyping of fish pathogens

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/7860848

Description:

This zip file contains scripts, initial fastq files, assembled genomes (public and from this study), and the intermediate bioinformatics files used for this study.

File Structure and Descriptions

├── 01_process.sh: primer trimming, length-based filtering, read alignment, alignment filtering (unique hits) and extraction of uniquely mapped reads for consensus generation
├── 02_consensus.sh: [needs the artic conda env] generation of consensus sequences from uniquely mapped reads; a minimum read depth of 20x is required to call a variant (otherwise the position is masked)
├── 03_cleanup.sh: general reorganization of folders and intermediate files
├── 04_filter.sh: [needs the quast conda env] statistics for the generated consensus sequences and removal of any consensus with one or more ambiguous bases (N), which are not suitable for haplotyping
├── 05_cluster.sh: clustering of consensus sequences at a 100% identity threshold to generate putative haplotypes
├── Amplicon_FastQ folder: uniquely mapped fastq files for consensus generation
├── BAM: alignment files generated by minimap2, used as input for the artic pipeline to identify variants
├── Cluster_Rep.txt: consensus sequences chosen to represent each haplotype
├── Consensus folder: consensus fasta files generated for each sample, containing sequences for each specific pathogen
├── Coverage folder: coverage and base-level read depth for each sample and each pathogen reference gene
├── Filter: individual fasta sequences (only 1 sequence per file) for each pathogen and each sample, without any ambiguous base, for subsequent clustering analysis
├── Full_Haplotype.fasta: all possible haplotype sequences generated for each pathogen
├── Gap_Analysis.tsv: table with the percentage of gaps (0-100%) for each consensus sequence generated (used for filtering)
├── Haplotype folder: intermediate files and sample-level haplotypes used to infer the final haplotypes and generate the haplotype summary
├── Haplotype_summary.tsv: table with sample IDs and their respective pathogen haplotypes
├── Minimap2_PAF: intermediate alignments generated by minimap2, used to generate the count table
├── Original: fastq files with their original names, prior to renaming based on sampleID; a script (rename.sh) is included to show the renaming scheme
├── primer.fasta: primer sequences used for identifying and trimming reads with flanking primer sequences
├── primer.fasta.fai: the index file for primer.fasta
├── PrimerTrim folder: primer-trimmed reads
├── quast_results: consensus statistics generated by quast
├── RawCount.tsv: count table that can be used as input to generate the figure
├── RawFastq folder: raw reads that have been renamed to reflect sample information
├── readme.md: the current readme file
├── ref_full_latest.fasta: reference sequences (gene segments) of the 4 pathogens: TiLV, ISKNV, SAG (Streptococcus agalactiae) and FNO (Francisella noatunensis subsp. orientalis)
├── ref_full_latest.primer.fasta: same as above, but with the primer binding sequences trimmed, as in the processed reads
├── ref_full_latest.primer.fasta.fai
├── RenameHaplotype: script to reorganize the cdhit output
├── Seq.stat.tsv: sequencing statistics
└── VCF: VCF files from medaka variant calling, used to generate the final consensus
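
The masking, filtering and clustering steps described for 02_consensus.sh, 04_filter.sh and 05_cluster.sh can be illustrated with a minimal Python sketch. It is not the code shipped in the archive: the function names, the sample headers and the toy sequences are all hypothetical, and exact string equality stands in for the 100% identity clustering, which a dedicated tool such as cdhit may handle differently for sequences of unequal length.

```python
from collections import defaultdict

MIN_DEPTH = 20  # depth threshold taken from the 02_consensus.sh description


def mask_low_depth(sequence, depths, min_depth=MIN_DEPTH):
    """Replace every base whose read depth is below min_depth with N."""
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(sequence.upper(), depths)
    )


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_ambiguous(records):
    """Keep only consensus sequences without any ambiguous base (N)."""
    return [(header, seq) for header, seq in records if "N" not in seq]


def cluster_identical(records):
    """Group headers by identical sequence; each group is one putative haplotype."""
    clusters = defaultdict(list)
    for header, seq in records:
        clusters[seq].append(header)
    return dict(clusters)


# Hypothetical toy input: four consensus sequences for one pathogen gene segment.
example = """>sampleA_TiLV
ACGTACGT
>sampleB_TiLV
ACGTACGT
>sampleC_TiLV
ACGTNCGT
>sampleD_TiLV
ACGTACGA
"""

masked = mask_low_depth("ACGTACGT", [35, 40, 12, 50, 50, 8, 22, 20])
clean = drop_ambiguous(parse_fasta(example))
haplotypes = cluster_identical(clean)

for index, (seq, members) in enumerate(haplotypes.items(), start=1):
    print(f"haplotype_{index}\t{len(members)}\t{','.join(members)}")
```

In this toy input, sampleC_TiLV is discarded because it contains an N, and sampleA_TiLV and sampleB_TiLV collapse into the same putative haplotype because their sequences are identical.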

Created:

2023-05-10
