
Nemabiome ITS Database

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Nemabiome_ITS_Database/28013753
Description:
18S, ITS1 and ITS2, 28S Full Nematode Database: building a NanoCLUST database for parasitic nematodes using 18S rRNA, 28S rRNA, ITS1, 5.8S and ITS2.

Nematoda Taxonomy ID: 6231 (hence the search must use txid6231).

Key words:
18S ribosomal RNA, 18S rRNA, 18S
28S ribosomal RNA, 28S rRNA, 28S
5.8S ribosomal RNA, 5.8S rRNA, 5.8S
Ribosomal RNA
SSU rRNA, LSU rRNA, SSU ribosomal RNA, LSU ribosomal RNA
Internal transcribed spacer, Internal transcribed spacer 1, Internal transcribed spacer 2
ITS, ITS1, ITS2

Final NCBI GenBank search term:
(((((((((((((((((((((((((18S ribosomal RNA[Title]) OR 18S rRNA[Title]) OR 18S[Title]) OR 28S ribosomal RNA[Title]) OR 28S rRNA[Title]) OR 28S[Title]) OR 5.8S ribosomal RNA[Title]) OR 5.8S rRNA[Title]) OR 5.8S[Title]) OR ribosomal RNA[Title]) OR SSU rRNA[Title]) OR LSU rRNA[Title]) OR SSU ribosomal RNA[Title]) OR LSU ribosomal RNA[Title]) OR Internal transcribed spacer[Title]) OR Internal transcribed spacer 1[Title]) OR Internal transcribed spacer 2[Title]) OR ITS[Title]) OR ITS1[Title]) OR ITS2[Title]) AND txid6231[Organism])) AND 200:10000[Sequence Length])) AND nuccore pubmed[Filter]) NOT unverified[Keyword]

The search results were downloaded as a fasta file. Next, a list of clade III and V parasitic nematodes (i.e. the Ascarids, Ancylostomatids, etc.) was obtained, and these were downloaded as a fasta file. The sequence titles in this fasta file were then changed to 'sham' titles with nondescript accession numbers, e.g. "Unidentified nematode 18S ribosomal RNA, partial sequence".

# Simplify the headers of your database fasta file
$ awk '{if($0~/^>/){print $1} else {print $0}}' Nemabiome_rRNA_fasta_v5_sequences.fasta > Nematoda_rRNA-ITS-5.8S_v5_30.04.24.fasta

# Make a text file of all the accession numbers in the database fasta file
$ awk '{if ($1~/^>/) print substr($1,2)}' Nematoda_rRNA-ITS-5.8S_v5_30.04.24.fasta > Nematoda_rRNA_v5_accession_ids.txt

# Create a mapping table of each accession to its taxa id
# (takes about 10 minutes, as it has to read each of the 300 million lines of nucl_gb.accession2taxid)
$ awk -F"\t" 'BEGIN{while(getline<"Nematoda_rRNA_v5_accession_ids.txt") hash[$1]=1} {if ($2 in hash) print $2,$3}' nucl_gb.accession2taxid > Nematode_rRNA_v5_tax_map.txt

# Make the blast database using the database fasta file, for example:
$ makeblastdb -in Nematoda_rRNA-ITS-5.8S_v5_30.04.24.fasta -parse_seqids -blastdb_version 5 -taxid_map Nematode_rRNA_v5_tax_map.txt -title "Nemabiome_rRNA database_v5" -out Nemabiome_rRNA_v5_db -dbtype nucl

Final database files produced: 10, for example Nemabiome_rRNA_v5_db.ndb, Nemabiome_rRNA_v5_db.nhr and Nemabiome_rRNA_v5_db.nin.

These can be used by NanoCLUST, e.g. in the command:

$ nextflow run main.nf -profile docker --reads '/home/Public/Ps1/Lucas_Workspace/MinION_Nemabiome_ECR_Project_SUPdata/100_Sample_Comparison_96-well_Trial-4/pass/barcodes01-96/ITS-reads-amended-filtered/barcode30.tmp.inverse.pblat.fix.fastq-filt.fastq.gz' --db "db/Nemabiome_rRNA_v5_db" --tax "db" --min_read_length 700 --max_read_length 1800 --min_cluster_size 100 --polishing_reads 100 --cluster_sel_epsilon 1 --max_memory '84.GB' --max_cpus 12 --outdir ./Nemabiome_trial
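
Before running makeblastdb it may be worth confirming that every accession in the database fasta file received a taxa id in the mapping table, since accessions absent from nucl_gb.accession2taxid are silently skipped by the awk lookup. The following is a minimal sketch of such a check, not part of the original workflow. It runs on toy files whose names, accessions and sequences are made up for illustration; with real data, the database fasta file and the tax map file named in the commands would take their place.

```shell
#!/bin/sh
# Sketch: list fasta accessions that have no entry in the taxid map.
# All file names and accessions below are hypothetical toy data.
export LC_ALL=C                      # keep sort and comm ordering consistent
workdir=$(mktemp -d)

# Toy database fasta with simplified (accession-only) headers.
printf '>AB000001.1\nACGT\n>AB000002.1\nGGCC\n>AB000003.1\nTTAA\n' > "$workdir/toy.fasta"

# Toy mapping table in the "accession taxid" layout written by the awk lookup.
printf 'AB000001.1 6231\nAB000003.1 6231\n' > "$workdir/toy_tax_map.txt"

# Accessions present in the fasta headers (first token, without ">").
awk '/^>/ {print substr($1, 2)}' "$workdir/toy.fasta" | sort > "$workdir/fasta_ids.txt"

# Accessions present in the mapping table (first column).
awk '{print $1}' "$workdir/toy_tax_map.txt" | sort > "$workdir/mapped_ids.txt"

# comm -23 keeps lines found only in the first file: accessions without a taxid.
comm -23 "$workdir/fasta_ids.txt" "$workdir/mapped_ids.txt" > "$workdir/unmapped_ids.txt"

echo "Accessions without a taxid:"
cat "$workdir/unmapped_ids.txt"
```

In the toy data only AB000002.1 lacks a mapping, so it is the single accession listed. An empty list on real data would suggest the tax map covers the whole database.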

Created: 2024-12-12