
Proteome database of Escherichia coli K-12

Source: Figshare. Updated: 2019-04-26. Indexed: 2026-04-29.
Download link:
https://figshare.com/articles/dataset/Proteome_database_of_Escherichia_coli_K-12/8046290
Description:
The ensemble of proteins in a bacterial species is relevant for understanding the biochemical and metabolic activities of a cell at a global level. To this end, proteomics technologies have opened a window into the inner workings of cells, allowing rational design changes to be implemented at the cellular level, for example to overproduce a specified metabolite through heterologous expression of particular genes or pathways. Such omics technologies generally generate large amounts of information, which require bioinformatic approaches to infer biological meaning. For example, useful information such as protein name and amino acid sequence needs to be extracted automatically from a proteome data file with bioinformatic tools. Hence, this work uses an in-house MATLAB function to construct a proteome database of Escherichia coli K-12. The database records, for each protein, its name, amino acid sequence, number of residues, molecular weight, and nucleotide sequence. In particular, the protein name and amino acid sequence are extracted from the original FASTA proteome file, while the number of amino acid residues, molecular weight, and nucleotide sequence of each protein are calculated using built-in MATLAB functions. Collectively, the proteome database of E. coli K-12 should find use in diverse biology and biotechnology applications, ranging from looking up the molecular weight of individual proteins to synthesizing a gene in a molecular cloning workflow.
Created:
2019-04-26