ApicomplexanDB

Name: ApicomplexanDB
Creator: University of Melbourne
Published: 2023-03-24 03:47:28
License: no description available

DataCite Commons. Updated 2023-03-24; indexed 2025-04-17.

Download link:

https://melbourne.figshare.com/articles/dataset/ApicomplexanDB/22153529

Description:

Construction of Filarial Worm and Apicomplexan Haemoparasite Databases for NanoCLUST:

The filarial worm COI gene database (Db) was constructed from NCBI Nucleotide using the search terms:

(((((((((((cytochrome c oxidase subunit 1[Title]) OR cytochrome c oxidase subunit I) OR cytochrome oxidase subunit 1) OR cytochrome oxidase subunit I) OR COX1) OR CO1) OR COI)) AND txid6295[Organism:exp])) AND 100:100000[Sequence Length])

The NCBI accession NR_029255.1 (Aliivibrio fischeri), required for identification of our positive control, was added to this set. A second filarial worm Db was constructed from the same downloaded sequences with the inclusion of the dog genome GCF_014441545.1 (Canis lupus familiaris).

The apicomplexan 18S rRNA gene Db was constructed using the search terms:

((((((18S ribosomal RNA[Title]) OR 18S rRNA[Title]) OR ribosomal RNA[Title]) OR SSU rRNA[Title]) OR SSU ribosomal RNA[Title]) AND txid5794[Organism]) AND 200:10000[Sequence Length]

NR_029255.1 (Aliivibrio fischeri) was again added for positive control identification.

The selected sequences were downloaded from NCBI as a FASTA file. The accession numbers were then extracted from the FASTA headers to produce a single-column text file.

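
The description does not say how the accession numbers were pulled out of the FASTA headers. One possible way is sketched below with awk, under the assumption that each header starts with ">" immediately followed by the accession.version identifier. The file name example.fasta, the function name extract_accessions, and the two records are made up for illustration; only accession_ids.txt is a name used in the description.

```shell
# Toy FASTA standing in for the NCBI download. The accessions and
# sequences are invented placeholders, not real records.
cat > example.fasta <<'EOF'
>AB000001.1 placeholder description one
ACGTACGTACGT
>AB000002.1 placeholder description two
TTGACATTGACA
EOF

# For every header line, strip the leading ">" from the first
# whitespace-delimited token and print that token only.
extract_accessions() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Write the single-column accession list.
extract_accessions example.fasta > accession_ids.txt
cat accession_ids.txt
```

The version suffix (".1") is kept on purpose in this sketch: the second column of nucl_gb.accession2taxid holds accession.version, so a mapping step that matches on that column needs the versioned form.
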
The large NCBI accession2taxid database, a text file, was downloaded from:

ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz

A table mapping each accession to its taxon ID was created from the decompressed nucl_gb.accession2taxid file with the awk command:

awk -F"\t" 'BEGIN{while(getline<"accession_ids.txt") hash[$1]=1} {if ($2 in hash) print $2,$3}' nucl_gb.accession2taxid > [Db_name]_map.txt

This takes the list of accession numbers in "accession_ids.txt" and the downloaded accession2taxid database and produces a two-column mapping file called [Db_name]_map.txt.

The BLAST database was then built with the makeblastdb command, downloadable from https://blast.ncbi.nlm.nih.gov/Blast.cgi:

makeblastdb -in Filaria_AllCOI_species.fasta -parse_seqids -blastdb_version 5 -taxid_map [Db_name]_map.txt -title "[Db_name] database" -out [Db_name] -dbtype nucl

This produces the BLAST database, consisting of 10 files, required by NanoCLUST.
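
To see what the accession-to-taxid mapping step does, the awk command from the description can be run against a tiny stand-in table, as in the sketch below. The table mimics the four tab-separated columns of nucl_gb.accession2taxid (accession, accession.version, taxid, gi). All accessions and taxids here are invented, and toy.accession2taxid, toy_map.txt and unmapped.txt are hypothetical file names. The final check for unmapped accessions is an addition for illustration, not part of the described workflow.

```shell
# Tiny stand-in for nucl_gb.accession2taxid, including a header line.
# Every value below is made up.
printf 'accession\taccession.version\ttaxid\tgi\n'  > toy.accession2taxid
printf 'AB000001\tAB000001.1\t1111\t1\n'           >> toy.accession2taxid
printf 'AB000002\tAB000002.1\t2222\t2\n'           >> toy.accession2taxid
printf 'AB000003\tAB000003.1\t3333\t3\n'           >> toy.accession2taxid

# Accession list; AB000009.1 is deliberately absent from the table.
printf 'AB000001.1\nAB000003.1\nAB000009.1\n' > accession_ids.txt

# The awk command from the description, pointed at the toy table:
# load the accession list into a hash, then print accession.version
# and taxid for every table row whose second column is in the hash.
awk -F"\t" 'BEGIN{while(getline<"accession_ids.txt") hash[$1]=1} {if ($2 in hash) print $2,$3}' toy.accession2taxid > toy_map.txt
cat toy_map.txt

# Extra check (not in the description): list requested accessions
# that received no taxid.
awk 'FILENAME == ARGV[1] { seen[$1] = 1; next } !($1 in seen)' toy_map.txt accession_ids.txt > unmapped.txt
cat unmapped.txt
```

An accession that ends up in unmapped.txt would have no entry in the taxid map passed to makeblastdb, so a check of this kind may be worth running before building the database.
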

Provider:

University of Melbourne

Created:

2023-03-24
