Wavelet-Enhanced Data-Driven Collective Variables for Efficient Sampling of Protein Folding Landscapes

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://figshare.com/articles/dataset/Wavelet-Enhanced_Data-Driven_Collective_Variables_for_Efficient_Sampling_of_Protein_Folding_Landscapes/31927624


Description:

The development of collective variables (CVs) capable of distinguishing important (meta)stable states and describing slow degrees of freedom (DOFs) is widely recognized as a prerequisite for the success of CV-based enhanced sampling methods, particularly in biomolecular systems where the underlying free-energy landscapes are extremely complex. For complex biochemical processes, selecting suitable geometric CVs for enhanced sampling often relies on chemical intuition, which is challenging and system-dependent. Data-driven CVs have emerged as a promising alternative and have achieved significant success in recent years, but studies indicate that their performance is highly susceptible to fast stochastic fluctuations in molecular simulation trajectories, which obscure the identification of slow DOFs. To overcome this limitation, we adopt a feature extraction strategy that uses the discrete wavelet transform to filter out fast-mode motions from molecular dynamics (MD) trajectories, followed by dimensionality reduction of the descriptors. The refined feature subset provides an optimal input for constructing data-driven low-dimensional CVs, allowing biologically relevant slow modes to be identified with high fidelity. We validate the efficacy of this strategy through a comparative analysis of chignolin and further showcase its applicability to the structurally more complex BBA protein system, which features both helical and antiparallel β-sheet motifs. The results reveal that the CVs derived from this strategy facilitate rapid transitions between folded and unfolded states, thereby significantly accelerating the exploration of protein folding landscapes across systems of varying scales.
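The wavelet filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the number of decomposition levels, the synthetic two-state "feature", and all function names (`haar_step`, `wavelet_lowpass`, `synthetic_feature`) are assumptions made for illustration, since the description does not state which wavelet family, level, or descriptors were used. The sketch decomposes a single descriptor time series, discards the detail (fast-mode) coefficients, and reconstructs a smoothed series. The subsequent dimensionality reduction and CV construction are not shown.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the Haar DWT: pairwise scaled sums (approximation)
    and pairwise scaled differences (detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def wavelet_lowpass(signal, levels):
    """Remove fast fluctuations by zeroing all detail coefficients
    up to `levels` (an assumed, user-chosen cutoff) and reconstructing."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    detail_lengths = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        detail_lengths.append(len(detail))
    # Reconstruct from the coarsest level, with zeros replacing the details.
    for length in reversed(detail_lengths):
        approx = haar_inverse_step(approx, [0.0] * length)
    return approx


def synthetic_feature(n_frames=1024, seed=0):
    """Hypothetical descriptor: slow two-state switching plus fast noise."""
    rng = random.Random(seed)
    slow = [1.0 if (t // 256) % 2 == 0 else -1.0 for t in range(n_frames)]
    noisy = [s + rng.gauss(0.0, 0.5) for s in slow]
    return slow, noisy


def mse(a, b):
    """Mean squared error between two equal-length series."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


slow, noisy = synthetic_feature()
filtered = wavelet_lowpass(noisy, levels=4)
print("MSE vs slow mode, raw:     ", mse(noisy, slow))
print("MSE vs slow mode, filtered:", mse(filtered, slow))
```

In this toy setting the filtered series tracks the slow switching much more closely than the raw one. With real trajectories, the same filter would be applied to each descriptor column before dimensionality reduction, and a smoother wavelet family may be preferable to Haar.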


Created:

2026-04-02
