Protein Sequence Data for ESM2 Landscapes
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/amirgroup-codes/InteractionRecovery
Description:
This dataset contains protein sequence data for three proteins: green fluorescent protein (GFP), tumor protein P53 (TP53), and the B1 domain of protein G (GB1). It focuses on the ESM2 landscapes of these proteins and the interactions within them. For each protein, the dataset covers several distinct regions, chosen according to the protein's 3D structure and experimental significance, and analyzes how amino acid mutations affect the ESM2 landscape. In total it comprises 228 experiments across the three proteins, which differ in sparsity and ruggedness. The research task is to analyze the ESM2 landscapes of these proteins in order to extract and interpret interactions between mutations.



