A billion-scale microbial enzymes atlas
Source: DataCite Commons. Updated 2026-05-04; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20021021
Description:
More than one billion microbial enzymes with Enzyme Commission (EC) class annotations, predicted using RAMER, a multimodal pretrained model for enzymes (https://github.com/Ming-Ni-Group/RAMER). The enzymes were extracted from over three billion protein sequences downloaded from databases including MGnify, the Global Catalogue of Metagenomics (gcMeta), the Global Ocean Microbiome Genomic Catalogue (GOMC), and a recently reported microbiome dataset from Mariana Trench sediment (DeepSea).
gcMeta (env-24) (https://gcmeta.wdcm.org/) covers the following environments: wetland sediment, bean rhizosphere, Arabidopsis rhizosphere, wheat rhizosphere, rice rhizosphere, agricultural soil, freshwater riverine, groundwater, permafrost, acid mine drainage, hot spring, hot habitat, acid habitat, saline-alkaline habitat, drinking water, hydrothermal vent, saline lake, pressure habitat, cold habitat, marine sediment, freshwater lake water, wastewater, and freshwater sediment. It contains enzymes (.fasta), the EC annotation (.pkl), ID mapping (id.csv), and enzyme classifier results (enzyme.json).
DeepSea contains enzymes (.fasta), the EC annotation (.pkl), and enzyme classifier results (enzyme.json).
MGnify contains the EC annotation (.pkl) of enzymes that we predicted. If you need the sequences, you can download them from MGnify (https://www.ebi.ac.uk/metagenomics).
GOMC contains the EC annotation (.pkl) for a subset of the catalogue (1,120,919,867 proteins), comprising all proteins over 200 amino acids and part of the proteins under 200 amino acids, as well as enzyme classifier results (enzyme.json). If you need the sequences, you can download them from GOMC (https://ftp.cngb.org/pub/SciRAID/microbiomics/MDB0000002/GOPC.geneset.pep.fa.gz).
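Because the GOMC subset is defined around a 200-amino-acid cutoff, it can help to check how a downloaded sequence file splits at that length. The following is a minimal sketch, not part of the dataset's tooling: the function names are hypothetical, and it assumes the file is a standard gzip-compressed protein FASTA.

```python
import gzip


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, threshold=200):
    """Count records longer than `threshold` residues versus all others."""
    longer, other = 0, 0
    for _header, seq in records:
        # Ignore a trailing '*' stop marker, which some protein FASTA files carry.
        if len(seq.rstrip("*")) > threshold:
            longer += 1
        else:
            other += 1
    return longer, other


def count_gzipped_fasta(path, threshold=200):
    """Stream a .fa.gz file line by line so it is never fully loaded in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return split_by_length(parse_fasta(handle), threshold)
```

Calling `count_gzipped_fasta("GOPC.geneset.pep.fa.gz")` on a local copy of the file would return a `(longer, other)` pair of counts. Streaming matters here because the catalogue holds over a billion sequences.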
In addition to the above data, NMPFamDB (https://pavlopoulos-lab.org/NMPFamsDB/) contains two subsets: NMPFamDB_Bacteria and NMPFamDB_Viruses. It provides enzymes (.fasta) and EC annotation (.pkl).
Actinomycetes contains three datasets. NCBI_genomes contains EC annotation (.pkl) and enzyme classifier results (.json). UniprotKB_250-350_tax201174 contains EC annotation (.pkl) and ID mapping (id.csv), and UniprotKB_EMBLWGS_250-350 contains EC annotation (.pkl).
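For subsets that ship both sequences (.fasta) and EC annotation (.pkl), the two files can be paired by sequence ID. The sketch below is illustrative only. The record does not document the internal layout of the .pkl files, so the code ASSUMES a dict mapping sequence ID to a list of EC numbers and raises an error otherwise. All function names are hypothetical. Unpickle only files you trust, since pickle can execute arbitrary code on load.

```python
import pickle


def read_fasta(path):
    """Yield (sequence_id, sequence) pairs from a plain-text FASTA file."""
    seq_id, chunks = None, []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                # Take the first whitespace-delimited token of the header as the ID.
                parts = line[1:].split()
                seq_id, chunks = (parts[0] if parts else ""), []
            elif seq_id is not None:
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def load_ec_annotation(path):
    """Load a .pkl annotation file.

    ASSUMPTION: the pickled object is a dict of sequence ID -> list of EC
    numbers. The real layout is not documented here, so fail loudly if not.
    """
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    if not isinstance(obj, dict):
        raise TypeError(f"expected a dict, got {type(obj).__name__}")
    return obj


def annotate(fasta_path, pkl_path):
    """Return (sequence_id, length, ec_numbers) for every FASTA record."""
    ec_by_id = load_ec_annotation(pkl_path)
    return [
        (seq_id, len(seq), ec_by_id.get(seq_id, []))
        for seq_id, seq in read_fasta(fasta_path)
    ]
```

If the TypeError is raised on a real file, inspecting `type(obj)` and a few entries is the quickest way to adapt the lookup to the actual structure.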
Provider:
Zenodo
Created:
2026-05-04



