Microeukaryotic Protein Database (VDB_Microeukaryotic_v1)
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://figshare.com/articles/dataset/Microeukaryotic_Protein_Database/19668855
Description:
Version: VDB_Microeukaryotic_v1
Contains 4 files:
-rw-r--r-- 1 jespinoz staff 10G Apr 18 19:46 reference.rmdup.iupac.relabeled.no_deprecated.complete_lineage.faa.gz
-rw-r--r-- 1 jespinoz staff 167M Apr 18 19:40 target_to_source.dict.pkl.gz
-rw-r--r-- 1 jespinoz staff 605K Apr 18 19:40 source_to_lineage.dict.pkl.gz
-rw-r--r-- 1 jespinoz staff 542K Apr 18 19:42 source_taxonomy.tsv.gz
* reference.rmdup.iupac.relabeled.no_deprecated.complete_lineage.faa.gz is the main FASTA protein file: the dereplicated combination of NR (protists and fungi only), MMETSP, EukZoo, and EukProt. Only sequences with complete lineages are included because the database is partially used for classification.
* Files ending in .pkl.gz are gzipped Python pickled dictionaries.
* target_to_source.dict.pkl.gz maps identifiers in the FASTA file to their original source identifiers.
* source_to_lineage.dict.pkl.gz maps source identifiers to lineage strings (e.g., c__Aconoidasida;o__Haemosporida;f__Haemoproteidae;g__Haemoproteus;s__Haemoproteus sp. hCWT4).
* source_taxonomy.tsv.gz contains the taxonomy for each source identifier.
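The two dictionaries described above can be chained to go from a FASTA identifier to a lineage. The following is a minimal sketch, not code from the database authors: the helper names (`load_gzipped_pickle`, `parse_lineage`, `lineage_for_target`) are hypothetical, and it assumes that each dictionary maps plain strings to plain strings and that every lineage field follows the `prefix__name` pattern shown in the example lineage.

```python
import gzip
import pickle


def load_gzipped_pickle(path):
    """Load a gzip-compressed pickle file and return the stored object."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


def parse_lineage(lineage):
    """Split a lineage string such as 'c__X;o__Y' into a {rank_prefix: name} dict.

    Assumes fields are separated by ';' and each field is 'prefix__name'.
    """
    ranks = {}
    for field in lineage.split(";"):
        prefix, _, name = field.partition("__")
        ranks[prefix] = name
    return ranks


def lineage_for_target(target_id, target_to_source, source_to_lineage):
    """Resolve a FASTA identifier to a lineage string via its source identifier."""
    source_id = target_to_source[target_id]
    return source_to_lineage[source_id]


# Lineage string taken from the description above.
example = (
    "c__Aconoidasida;o__Haemosporida;f__Haemoproteidae;"
    "g__Haemoproteus;s__Haemoproteus sp. hCWT4"
)
print(parse_lineage(example)["g"])  # Haemoproteus
```

With the downloaded files in the working directory, `load_gzipped_pickle("target_to_source.dict.pkl.gz")` and `load_gzipped_pickle("source_to_lineage.dict.pkl.gz")` would supply the two dictionaries passed to `lineage_for_target`. Because unpickling can execute arbitrary code, load these files only from the official figshare record.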
Citation:
* Espinoza, J.L., Dupont, C.L. VEBA: a modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes. BMC Bioinformatics 23, 419 (2022). https://doi.org/10.1186/s12859-022-04973-8
* Espinoza, Josh (2022): Microeukaryotic Protein Database. figshare. Dataset. https://doi.org/10.6084/m9.figshare.19668855.v1
Created:
2022-07-07



