Proteome database of 36 million proteins from 4,351 species, including marine microbial sequences
Source: DataONE | Updated: 2023-02-27 | Indexed: 2024-06-08
Download link:
https://search.dataone.org/view/sha256:29cb94d4a84dd58a873c12956002a552df91985e94453f47705d221e09c6e3b2
Resource description:
A FASTA-formatted database of 36,866,870 predicted proteins representing 4,351 unique species from 117 phyla (see table below). It was constructed using the UniProt Reference Proteome (RP) at the 35% co-membership threshold, including 4,295 Representative Proteome Groups (RPGs) (Chen et al. 2011), in addition to all taxonomically identifiable transcriptomes of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) (Keeling et al. 2014) that were processed through WinstonCleaner (https://github.com/kolecko007/WinstonCleaner). The database also included proteins inferred from the annotated and assembled genomes of Aurantiochytrium limacinum ATCC MYA-1381, Schizochytrium aggregatum ATCC 28209, and Aplanochytrium kerguelensis PBS07 from the U.S. Department of Energy's Joint Genome Institute (JGI), all PFAM PF00494 Aurantiochytrium sp. KH105 proteome hits from the Okinawa Institute of Science and Technology Marine Genomics Unit genome browser, all of UniProt's annotated Hondaea fermentalgiana pr...
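Because the database is distributed as a single FASTA file with tens of millions of records, a quick sanity check after download is to stream through it and count the entries rather than load it into memory. The following is a minimal sketch, assuming a plain-text FASTA layout in which each record starts with a `>` header line; the function name and the example sequences are hypothetical and are not taken from the database itself.

```python
import io


def count_fasta_records(handle):
    """Count records and total residues in a FASTA stream, one line at a time."""
    n_records = 0
    n_residues = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_records += 1  # each header line starts a new record
        else:
            n_residues += len(line)  # sequence lines may wrap across rows
    return n_records, n_residues


# Tiny in-memory example with made-up sequences (not from the database).
example = io.StringIO(
    ">seq1 hypothetical protein\nMKTAYIAK\nQRQISFVK\n>seq2\nMSDNE\n"
)
records, residues = count_fasta_records(example)
print(records, residues)
```

For the real file, the same function could be called on an opened file handle; if the download is compressed, it would need to be opened with the matching decompression module first. The record count can then be compared with the 36,866,870 proteins stated in the description.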
Created:
2025-07-17



