Dataset for Multiplex-PCR detection and Nanopore-based genotyping of fish pathogens
Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/7860848
Resource description:
This zip file contains the scripts, initial FASTQ files, assembled genomes (public and from this study), and the intermediate bioinformatics files used in this study.
File Structure and Descriptions
├── 01_process.sh: primer trimming, length-based filtering, read alignment, alignment filtering (unique hits only) and extraction of uniquely mapped reads for consensus generation
├── 02_consensus.sh: [requires the artic conda env] Generation of consensus sequences from uniquely mapped reads; a minimum read depth of 20x is required to call a variant (otherwise the position is masked)
├── 03_cleanup.sh: General reorganization of folders and intermediate files
├── 04_filter.sh: [requires the quast conda env] Statistics for the generated consensus sequences, and removal of consensus sequences containing one or more ambiguous bases (N), which are not suitable for haplotype analysis
├── 05_cluster.sh: clustering of consensus sequences at a 100% identity threshold to generate putative haplotypes
├── Amplicon_FastQ folder: uniquely mapped FASTQ files used for consensus generation
├── BAM: alignment files generated by minimap2, used as input to the artic pipeline for variant identification
├── Cluster_Rep.txt: Consensus sequences chosen to represent each haplotype
├── Consensus folder: consensus FASTA files for each sample, containing the sequence for each specific pathogen
├── Coverage folder: coverage and per-base read depth for each sample and each pathogen reference gene
├── Filter: individual FASTA sequences (one sequence per file) for each pathogen and each sample, containing no ambiguous bases, for subsequent clustering analysis
├── Full_Haplotype.fasta: all possible haplotype sequences generated for each pathogen
├── Gap_Analysis.tsv: Table of the gap percentage (0-100%) for each generated consensus sequence (used for filtering)
├── Haplotype folder: Intermediate files and sample-level haplotypes used to infer the final haplotypes and generate the haplotype summary
├── Haplotype_summary.tsv: Table of sample IDs and their respective pathogen haplotypes
├── Minimap2_PAF: Intermediate alignments generated by minimap2, used to generate the count table
├── Original: FASTQ files with their original names, prior to renaming by sample ID; a script (rename.sh) is included to show the renaming scheme
├── primer.fasta: Primer sequences used for identifying and trimming reads with flanking primer sequence
├── primer.fasta.fai: the index file for primer.fasta
├── PrimerTrim folder: Primer-trimmed reads
├── quast_results: consensus statistics generated by quast
├── RawCount.tsv: Count table that can be used as input to generate figures
├── RawFastq folder: Raw reads that have been renamed to reflect sample information
├── readme.md: The current readme file
├── ref_full_latest.fasta: Reference sequences (gene segments) of the 4 pathogens, namely TiLV, ISKNV, SAG (Streptococcus agalactiae) and FNO (Francisella noatunensis subsp. orientalis)
├── ref_full_latest.primer.fasta: Same as above, but with the primer-binding sequences trimmed in the same way as the processed reads
├── ref_full_latest.primer.fasta.fai: the index file for ref_full_latest.primer.fasta
├── RenameHaplotype: Script to reorganize the CD-HIT output
├── Seq.stat.tsv: Sequencing statistics
└── VCF: VCF files from medaka variant calling used to generate the final consensus
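The length-based filtering step in 01_process.sh can be pictured with a minimal Python sketch. It assumes standard four-line FASTQ records. The function names and the 5-20 bp bounds in the example are hypothetical, because the actual length cutoffs used by the script are not stated in this readme.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def length_filter(handle: TextIO, min_len: int, max_len: int) -> Iterator[Record]:
    """Keep reads whose length lies within [min_len, max_len], inclusive.

    The bounds are hypothetical parameters; 01_process.sh's real cutoffs are not given.
    """
    for record in read_fastq(handle):
        if min_len <= len(record[1]) <= max_len:
            yield record


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n"
    kept = list(length_filter(io.StringIO(fastq), min_len=5, max_len=20))
    print([header for header, _, _ in kept])  # ['@r2']
```

Streaming the records through generators keeps memory use flat, which matters for Nanopore runs with many reads per sample.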
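01_process.sh keeps only reads with a unique alignment hit, and RawCount.tsv is derived from the minimap2 PAF output. A minimal sketch of that idea counts such reads per reference target. It assumes that "unique" means a read produced exactly one PAF record; the script's actual criterion (for example MAPQ or secondary-alignment handling) may differ. The function name and target names such as TiLV_seg1 are hypothetical. Only the column positions (query name in column 1, target name in column 6) come from the PAF format.

```python
from collections import Counter
from typing import Dict, Iterable, List


def unique_hit_counts(paf_lines: Iterable[str]) -> Counter:
    """Count reads per reference target, keeping only reads with a single PAF record."""
    hits: Dict[str, List[str]] = {}
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query, target = fields[0], fields[5]  # PAF col 1 = query name, col 6 = target name
        hits.setdefault(query, []).append(target)
    # Assumption: a read counts as "unique" if it produced exactly one alignment record
    return Counter(targets[0] for targets in hits.values() if len(targets) == 1)


if __name__ == "__main__":
    paf = [
        "read1\t500\t0\t500\t+\tTiLV_seg1\t520\t10\t510\t480\t500\t60",
        "read2\t480\t0\t480\t-\tTiLV_seg1\t520\t20\t500\t470\t480\t60",
        "read3\t450\t0\t450\t+\tISKNV_gene\t470\t5\t455\t440\t450\t0",
        "read3\t450\t0\t450\t+\tSAG_gene\t460\t3\t453\t430\t450\t0",
    ]
    print(dict(unique_hit_counts(paf)))  # {'TiLV_seg1': 2}
```

Running this once per sample and joining the resulting counters by target name would give a sample-by-target table. The exact layout of RawCount.tsv is not described here.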
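The 20x depth rule in 02_consensus.sh can be illustrated by masking every consensus position whose read depth falls below the threshold. This simplified sketch is not the artic/medaka implementation. It assumes a per-position depth list (such as one derived from the Coverage folder) that is already aligned base-for-base to the consensus. The function name mask_low_depth is hypothetical.

```python
from typing import List

MIN_DEPTH = 20  # minimum read depth stated for 02_consensus.sh


def mask_low_depth(consensus: str, depths: List[int], min_depth: int = MIN_DEPTH) -> str:
    """Replace every base whose read depth is below min_depth with 'N'.

    Assumes depths[i] is the read depth at consensus position i.
    """
    if len(consensus) != len(depths):
        raise ValueError("consensus and depth profile must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


if __name__ == "__main__":
    seq = "ACGTACGT"
    depth = [35, 40, 19, 22, 0, 20, 50, 18]
    print(mask_low_depth(seq, depth))  # ACNTNCGN
```

A position exactly at 20x is kept, matching the "minimal read depth of 20x" wording. Masked positions are what later cause a consensus to be dropped by 04_filter.sh.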
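04_filter.sh discards consensus sequences that contain any ambiguous base, and Gap_Analysis.tsv reports a gap percentage for each consensus. The sketch reports that percentage per record and keeps only records with 0% gaps. It assumes that "gap" means an N or '-' character, which is not a documented definition. It does not reproduce the QUAST statistics, and all function and sample names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

GAP_CHARS = set("Nn-")  # assumption: masked bases (N) and alignment gaps (-) both count as gaps


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_percentage(seq: str) -> float:
    """Percentage (0-100) of positions in seq that are gap or ambiguous characters."""
    if not seq:
        return 100.0  # treat an empty consensus as entirely missing
    return 100.0 * sum(base in GAP_CHARS for base in seq) / len(seq)


def filter_complete(text: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return per-record gap percentages and the records that contain no gaps."""
    gaps: Dict[str, float] = {}
    kept: Dict[str, str] = {}
    for name, seq in read_fasta(text):
        gaps[name] = gap_percentage(seq)
        if gaps[name] == 0.0:
            kept[name] = seq
    return gaps, kept


if __name__ == "__main__":
    example = ">S01_TiLV\nACGTACGTAC\n>S02_TiLV\nACGTNNGTAC\n"
    gaps, kept = filter_complete(example)
    print(gaps)        # {'S01_TiLV': 0.0, 'S02_TiLV': 20.0}
    print(list(kept))  # ['S01_TiLV']
```

The kept records correspond to what the Filter folder holds: one gap-free sequence per pathogen and sample, ready for clustering.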
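05_cluster.sh groups consensus sequences at 100% identity, and RenameHaplotype reorganizes the CD-HIT output into haplotypes. As a simplified stand-in, the sketch groups sequences that are exactly identical and assigns haplotype labels in order of first appearance. This is not CD-HIT. Identity-threshold tools such as CD-HIT can also place a shorter sequence into the cluster of a longer sequence it matches, which exact-match grouping does not do. The labels (Hap1, Hap2, ...) and sample IDs are hypothetical.

```python
from typing import Dict, Tuple


def cluster_identical(
    records: Dict[str, str], prefix: str = "Hap"
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Group sequences that are 100% identical over their full length.

    Returns (sample -> haplotype label, haplotype label -> representative sample);
    the first sample seen with each distinct sequence becomes the representative.
    """
    seq_to_label: Dict[str, str] = {}
    assignment: Dict[str, str] = {}
    representative: Dict[str, str] = {}
    for sample, seq in records.items():
        key = seq.upper()  # compare case-insensitively
        if key not in seq_to_label:
            label = f"{prefix}{len(seq_to_label) + 1}"
            seq_to_label[key] = label
            representative[label] = sample
        assignment[sample] = seq_to_label[key]
    return assignment, representative


if __name__ == "__main__":
    consensus = {"S01": "ACGTACGT", "S02": "ACGAACGT", "S03": "acgtacgt"}
    assignment, reps = cluster_identical(consensus)
    print(assignment)  # {'S01': 'Hap1', 'S02': 'Hap2', 'S03': 'Hap1'}
    print(reps)        # {'Hap1': 'S01', 'Hap2': 'S02'}
```

The two returned mappings roughly mirror Haplotype_summary.tsv (sample to haplotype) and Cluster_Rep.txt (haplotype to representative).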
Created:
2023-05-10



