Final Putative Homologues of Spore Coat/Exosporium Proteins in Clostridia (GCM Thesis; Appendix 7.3)

DataCite Commons. Updated 2024-12-13; indexed 2024-08-25.
Download link:
https://www.repository.cam.ac.uk/handle/1810/349120
Description:
Appendix 7.3 is a PDF file listing the final putative homologues, in Clostridia species, of spore coat and exosporium proteins established in the literature. For each entry, the list gives the protein name, the bacterial species, the accession number of the protein in that species, and the Markov Cluster number for that protein. Refer to Appendix 7.1 and 7.2 for the first- and second-pass protocols.

To verify the BLASTp homology search results, all putative proteins were analysed using Markov clustering (MCL). Edges were determined using the reverse BLASTp protocol, with SCPS (spectral clustering of protein sequences) edge-weight conversion of e-values. In addition to the raw homologue data, Clostridia coat proteins on UniProt that were not identified in this analysis were also included in the clustering dataset, in order to verify their supposed identities.

The inflation value was set at the recommended value of 0.2, and the edge-weight threshold was maximized. Edges were treated as undirected so that only positive weights would be calculated. The clusterMaker2 application in the Cytoscape software was used to create the networks; the weak edge weight pruning threshold, number of iterations, maximum residual value, and maximum number of threads were set to their previously established values of 1E−15, 16, 0.001, and 0, respectively. The MCL e-value threshold was set to 1E−10 to minimize the number of proteins in each cluster. The resulting Q value served as verification of the methodology; the overall Q value was 0.952. Refer to Appendix 7.4 for analysis of verified homologues.
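The clustering step described above can be illustrated with a minimal, self-contained sketch of Markov clustering on a toy similarity graph. It is not the clusterMaker2 implementation and does not reproduce the thesis pipeline. The protein names and e-values are invented, the `-log10` weighting is only a stand-in for the SCPS edge-weight conversion (whose exact formula is not given in the description), and the inflation value of 2.0 is an assumption chosen so that the toy example converges, not a value taken from the description. The iteration count (16), pruning threshold (1E−15), and e-value cut-off (1E−10) mirror the settings quoted above; the maximum residual setting is not modelled.

```python
import math

def evalue_to_weight(evalue, floor=1e-200):
    """Assumed stand-in for the SCPS conversion: weight = -log10(e-value)."""
    return -math.log10(max(evalue, floor))

def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl_clusters(edges, inflation=2.0, iterations=16,
                 prune=1e-15, max_evalue=1e-10, read_threshold=1e-6):
    """Cluster an undirected graph given as (node_a, node_b, evalue) triples."""
    nodes = sorted({p for a, b, _ in edges for p in (a, b)})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for a, b, evalue in edges:
        if evalue > max_evalue:          # drop hits weaker than the cut-off
            continue
        w = evalue_to_weight(evalue)
        m[index[a]][index[b]] = w        # undirected: store both directions
        m[index[b]][index[a]] = w
    for i in range(n):                   # self-loops stabilise the iteration
        m[i][i] = max(max(m[i]), 1.0)
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                 # expansion
        m = [[x ** inflation if x > prune else 0.0 for x in row]
             for row in m]               # pruning, then inflation
        m = normalize_columns(m)
    # Read clusters out as connected components of the remaining entries.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(n):
            if m[i][j] > read_threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(nodes[i])
    return sorted(sorted(g) for g in groups.values())

# Hypothetical toy data: two tight groups joined by one weak hit (C-D),
# plus one hit (A-E) that fails the e-value cut-off.
EXAMPLE_EDGES = [
    ("A", "B", 1e-50), ("B", "C", 1e-40), ("A", "C", 1e-45),
    ("D", "E", 1e-60), ("C", "D", 1e-12), ("A", "E", 1e-3),
]

if __name__ == "__main__":
    for number, members in enumerate(mcl_clusters(EXAMPLE_EDGES), start=1):
        print(number, members)
```

In this sketch each returned group corresponds to one cluster, analogous to the Markov Cluster number reported for each protein in the appendix.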

Provider:
Apollo - University of Cambridge Repository
Created:
2023-04-06