VEBA Microeukaryotic Protein Database (MicroEuk100/90/50, Version 3)

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10139450
Description:
Microeukaryotic protein database consisting of protists and fungi for VEBA.

Number of sequences:
* MicroEuk100 = 79,920,431 (19 GB)
* MicroEuk90 = 51,767,730 (13 GB)
* MicroEuk50 = 29,898,853 (6.5 GB)

Number of source organisms per dataset:
* MycoCosm = 2503
* PhycoCosm = 174
* EnsemblProtists = 233
* MMETSP = 759
* TARA_SAGv1 = 8
* EukProt = 366
* EukZoo = 27
* TARA_SMAGv1 = 389
* NR_Protists-Fungi = 48217

Files:
MicroEuk_v3.tar.gz = 25 GB
* -rw-rw---- 1 jespinoz jcl110 19G Nov 15 14:57 MicroEuk100.faa.gz - Main fasta file with 79,920,431 protein sequences from 52,676 source organisms. Uses md5 hash identifiers.
* -rw-rw---- 1 jespinoz jcl110 2.0G Nov 15 14:59 identifier_mapping.proteins.tsv.gz - Protein identifier mappings between datasets, original identifiers, source organisms, and md5 hash identifiers.
* -rw-rw---- 1 jespinoz jcl110 1.7G Nov 15 16:10 MicroEuk90_clusters.tsv.gz - MMSEQS2 clustering MicroEuk100
* -rw-rw---- 1 jespinoz jcl110 1.5G Nov 15 14:57 MicroEuk100.list.gz - List of md5 hash protein identifiers in MicroEuk100
* -rw-rw---- 1 jespinoz jcl110 1.1G Nov 15 16:10 MicroEuk50_clusters.tsv.gz - MMSEQS2 clustering MicroEuk90
* -rw-rw---- 1 jespinoz jcl110 13M Nov 15 23:39 MicroEuk100.eukaryota_odb10.list.gz - MicroEuk100 protein identifier hits to BUSCO's eukaryota_odb10 marker using the provided score thresholds
* -rw-rw---- 1 jespinoz jcl110 1.5M Nov 15 14:58 source_taxonomy.tsv.gz - Source taxonomy, lineage, dataset, and notes for each source organism

For more information and citations, please visit the main GitHub repository: https://github.com/jolespin/veba
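The counts in the description can be sanity-checked after download. The sketch below is illustrative only and is not part of VEBA: it sums the per-dataset source organism counts listed above (they add up to the 52,676 source organisms quoted for MicroEuk100.faa.gz) and defines a hypothetical helper, `count_fasta_records`, that streams a gzip-compressed FASTA file and counts header lines. It assumes the file is a standard FASTA in which every record starts with `>`.

```python
import gzip

# Per-dataset source organism counts, copied from the description above.
SOURCE_ORGANISMS = {
    "MycoCosm": 2503,
    "PhycoCosm": 174,
    "EnsemblProtists": 233,
    "MMETSP": 759,
    "TARA_SAGv1": 8,
    "EukProt": 366,
    "EukZoo": 27,
    "TARA_SMAGv1": 389,
    "NR_Protists-Fungi": 48217,
}


def count_fasta_records(path):
    """Count FASTA records in a gzip-compressed file.

    Hypothetical helper (not a VEBA function). Streams the file line by
    line, so memory use stays small even for a 19 GB archive member.
    Assumes each record begins with a '>' header line.
    """
    n_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n_records += 1
    return n_records


if __name__ == "__main__":
    # Total source organisms across all nine datasets.
    print(sum(SOURCE_ORGANISMS.values()))  # 52676
```

Once the archive is extracted, `count_fasta_records("MicroEuk100.faa.gz")` would be expected to return 79,920,431 if the file matches the description; a full pass over a file of that size will take a while.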
Created:
2023-12-12