Dataset for Multiplex-PCR detection and Nanopore-based genotyping of fish pathogens

Indexed in NIAID Data Ecosystem: 2026-05-01
Download link:
https://zenodo.org/record/7860848
Resource description:
This zip file contains the scripts, initial FASTQ files, assembled genomes (public and from this study), and bioinformatics intermediate files used in this study.

File structure and descriptions:
├── 01_process.sh: primer trimming, length-based filtering, read alignment, alignment filtering (unique hits only), and extraction of uniquely mapped reads for consensus generation
├── 02_consensus.sh: [requires the artic conda environment] consensus generation from uniquely mapped reads; a minimum read depth of 20x is required to call a variant (positions below this depth are masked)
├── 03_cleanup.sh: general reorganization of folders and intermediate files
├── 04_filter.sh: [requires the quast conda environment] statistics for the generated consensus sequences, and removal of consensus sequences containing one or more ambiguous bases (N), which are unsuitable for haplotype analysis
├── 05_cluster.sh: clustering of consensus sequences at a 100% identity threshold to generate putative haplotypes
├── Amplicon_FastQ folder: uniquely mapped FASTQ files used for consensus generation
├── BAM: alignment files generated by minimap2, used as input to the artic pipeline for variant identification
├── Cluster_Rep.txt: consensus sequences chosen to represent each haplotype
├── Consensus folder: consensus FASTA files for each sample, containing the sequences for each pathogen
├── Coverage folder: coverage and base-level read depth for each sample against each pathogen reference gene
├── Filter: individual FASTA sequences (one sequence per file) for each pathogen and sample, free of ambiguous bases, used for the subsequent clustering analysis
├── Full_Haplotype.fasta: all haplotype sequences generated for each pathogen
├── Gap_Analysis.tsv: table of the percentage of gaps (0-100%) in each generated consensus sequence (used for filtering)
├── Haplotype folder: intermediate files and sample-level haplotypes used to infer the final haplotypes and generate the haplotype summary
├── Haplotype_summary.tsv: table of sample IDs and their respective pathogen haplotypes
├── Minimap2_PAF: intermediate alignments generated by minimap2, used to generate the count table
├── Original: FASTQ files with their original names, prior to renaming by sample ID; a script (rename.sh) is included to show the renaming scheme
├── primer.fasta: primer sequences used to identify and trim reads with flanking primer sequences
├── primer.fasta.fai: index file for primer.fasta
├── PrimerTrim folder: primer-trimmed reads
├── quast_results: consensus statistics generated by QUAST
├── RawCount.tsv: count table that can be used as input to generate figures
├── RawFastq folder: raw reads renamed to reflect sample information
├── readme.md: this readme file
├── ref_full_latest.fasta: reference sequences (gene segments) of the four pathogens: TiLV (tilapia lake virus), ISKNV (infectious spleen and kidney necrosis virus), SAG (Streptococcus agalactiae), and FNO (Francisella noatunensis subsp. orientalis)
├── ref_full_latest.primer.fasta: same as above, but with the primer-binding sequences trimmed in the same way as the processed reads
├── ref_full_latest.primer.fasta.fai: index file for ref_full_latest.primer.fasta
├── RenameHaplotype: script that reorganizes the cd-hit output
├── Seq.stat.tsv: sequencing statistics
└── VCF: VCF files from medaka variant calling, used to generate the final consensus
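The length-based filtering step in 01_process.sh can be illustrated with a minimal Python sketch. The read-length thresholds are not stated in this description, so the `min_len`/`max_len` values below are hypothetical, as are the function and class names; the actual script may rely on a dedicated tool rather than custom code.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Parse strict 4-line FASTQ records (hypothetical helper; no multi-line records)."""
    lines = text.splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        # Keep only the read ID, dropping any description after the first space.
        yield FastqRecord(header[1:].split()[0], seq, qual)


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int, max_len: int
) -> list[FastqRecord]:
    """Keep reads whose length lies within [min_len, max_len], inclusive."""
    return [r for r in records if min_len <= len(r.seq) <= max_len]


if __name__ == "__main__":
    # Hypothetical bounds; the real thresholds depend on the amplicon sizes.
    example = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACG\n+\nIII\n"
    kept = filter_by_length(parse_fastq(example), min_len=5, max_len=500)
    print([r.name for r in kept])  # ['read1']
```

Parsing by fixed 4-line blocks avoids mistaking a quality line that starts with "@" for a new header.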
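The 20x rule applied by 02_consensus.sh amounts to a per-position mask. The sketch below is a simplification that assumes a per-base depth list aligned one-to-one with the consensus (similar in spirit to the base-level depths in the Coverage folder, with no indels shifting coordinates); the artic pipeline performs its own depth masking, and `mask_low_depth` is a hypothetical name.

```python
def mask_low_depth(consensus: str, depths: list[int], min_depth: int = 20) -> str:
    """Replace each base covered by fewer than `min_depth` reads with 'N'.

    Assumes depths[i] is the read depth at consensus position i.
    """
    if len(consensus) != len(depths):
        raise ValueError("consensus and depth track must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


if __name__ == "__main__":
    # Positions with depth 5 and 19 fall below 20x and are masked.
    print(mask_low_depth("ACGTAC", [25, 30, 5, 20, 19, 40]))  # ACNTNC
```

A depth of exactly 20 is kept, matching the "minimum read depth of 20x" wording.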
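Gap_Analysis.tsv and 04_filter.sh both depend on how many ambiguous bases a consensus contains. A minimal sketch, assuming that "gap" means the fraction of N positions (the description does not define it precisely); the function names and sample record IDs are hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}."""
    chunks: dict[str, list[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def gap_percentage(seq: str) -> float:
    """Percentage (0-100) of positions that are ambiguous 'N' bases."""
    if not seq:
        return 100.0  # treat an empty consensus as entirely missing
    return 100.0 * seq.upper().count("N") / len(seq)


def keep_unambiguous(records: dict[str, str]) -> dict[str, str]:
    """Drop any consensus containing one or more 'N' bases."""
    return {name: seq for name, seq in records.items() if gap_percentage(seq) == 0.0}


if __name__ == "__main__":
    fasta = ">S1_TiLV\nACGTACGTAC\n>S2_TiLV\nACGTNNGTAC\n"
    recs = read_fasta(fasta)
    for name, seq in recs.items():
        print(f"{name}\t{gap_percentage(seq):.1f}")  # S1_TiLV 0.0, S2_TiLV 20.0
    print(sorted(keep_unambiguous(recs)))  # ['S1_TiLV']
```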
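05_cluster.sh clusters consensus sequences at 100% identity, and RenameHaplotype reorganizes the cd-hit output. The sketch below only groups byte-identical sequences, which is a simplification: cd-hit chooses the longest sequence of each cluster as its representative and can also place a shorter sequence that is identical to part of a longer one in the same cluster. It assumes the input holds N-free sequences for a single pathogen; the "Hap" labels and the function name are hypothetical.

```python
from collections import defaultdict


def cluster_identical(records: dict[str, str]) -> dict[str, list[str]]:
    """Group identical sequences; return {haplotype_label: sorted member IDs}.

    Haplotypes are numbered by decreasing cluster size (ties broken by the
    smallest member ID); the first member of each list is the representative.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, seq in records.items():
        groups[seq.upper()].append(name)
    ordered = sorted(groups.values(), key=lambda names: (-len(names), min(names)))
    return {f"Hap{i}": sorted(names) for i, names in enumerate(ordered, start=1)}


if __name__ == "__main__":
    seqs = {"S1": "ACGTAC", "S2": "ACGTAC", "S3": "ACGAAC", "S4": "acgtac"}
    for label, members in cluster_identical(seqs).items():
        print(label, "representative:", members[0], "members:", members)
```

Sequences are upper-cased before grouping so that soft-masked (lowercase) bases do not split otherwise identical haplotypes.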
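RawCount.tsv is derived from the minimap2 PAF alignments after the unique-hit filter in 01_process.sh. A minimal sketch, assuming "unique hit" means that all of a read's alignments fall on a single reference sequence (the exact criterion used in the scripts is not given here); the reference names and function name are hypothetical, and a full count table would repeat this per sample. It reads PAF column 1 (query name) and column 6 (target name).

```python
from collections import Counter, defaultdict
from typing import Iterable


def count_unique_hits(paf_lines: Iterable[str]) -> Counter:
    """Count reads per reference, keeping only reads that hit exactly one reference."""
    targets_per_read: dict[str, set[str]] = defaultdict(set)
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError("PAF lines need at least 12 tab-separated columns")
        targets_per_read[fields[0]].add(fields[5])
    counts: Counter = Counter()
    for targets in targets_per_read.values():
        if len(targets) == 1:  # reads hitting several references are discarded
            counts[next(iter(targets))] += 1
    return counts


if __name__ == "__main__":
    def paf(read: str, ref: str) -> str:
        # Hypothetical 12-column PAF record; only columns 1 and 6 matter here.
        return "\t".join([read, "500", "0", "500", "+", ref,
                          "600", "50", "550", "480", "500", "60"])

    lines = [paf("r1", "TiLV_seg"), paf("r2", "ISKNV_gene"),
             paf("r3", "TiLV_seg"), paf("r3", "SAG_gene"),
             paf("r4", "TiLV_seg"), paf("r4", "TiLV_seg")]
    for ref, n in sorted(count_unique_hits(lines).items()):
        print(f"{ref}\t{n}")  # ISKNV_gene 1, TiLV_seg 2 (r3 is ambiguous)
```

Using a set of targets per read means that several alignments of one read to the same reference still count as a single unique hit.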
Created: 2023-05-10