Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations

Name: Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations
Creator: figshare
Published: 2022-10-14 14:25:31
License: No description available

DataCite Commons: updated 2022-10-14, indexed 2024-07-29

Download link:

https://figshare.com/articles/dataset/Forces_are_not_Enough_Benchmark_and_Critical_Evaluation_for_Machine_Learning_Force_Fields_with_Molecular_Simulations/21331245

Resource description:

The preprocessed datasets described in the paper "Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations."

Paper: https://arxiv.org/abs/2210.07237
Code: https://github.com/kyonofx/MDsim/

Dataset Description: The MD17 dataset and the LiPS dataset are adapted from previous work. The source data can be found in the hyperlinks. We include the source data for our alanine dipeptide dataset (alanine_dipeptide.npy) and water dataset (water.npy), along with preprocessed versions of all datasets in the paper: MD17, water, alanine dipeptide, and LiPS (mdsim_data.tar.gz). Please refer to the paper for details on each dataset.

Paper Abstract: Molecular dynamics (MD) simulation techniques are widely used for various natural science applications. Increasingly, machine learning (ML) force field (FF) models begin to replace ab-initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such techniques are primarily benchmarked by their force/energy prediction errors, even though the practical use case would be to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for ML MD simulation. We curate representative MD systems, including water, organic molecules, peptide, and materials, and design evaluation metrics corresponding to the scientific objectives of respective systems. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, along with offering directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate further work.

If you find this dataset useful, please consider citing it in your paper:

<pre><code>@article{fu2022forces,
  title={Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations},
  author={Xiang Fu and Zhenghao Wu and Wujie Wang and Tian Xie and Sinan Keten and Rafael Gomez-Bombarelli and Tommi Jaakkola},
  journal={arXiv preprint arXiv:2210.07237},
  year={2022}
}</code></pre>

For the MD17 dataset, cite:

<pre><code>@article{chmiela2017machine,
  title={Machine learning of accurate energy-conserving molecular force fields},
  author={Chmiela, Stefan and Tkatchenko, Alexandre and Sauceda, Huziel E and Poltavsky, Igor and Sch{\"u}tt, Kristof T and M{\"u}ller, Klaus-Robert},
  journal={Science advances},
  volume={3},
  number={5},
  pages={e1603015},
  year={2017},
  publisher={American Association for the Advancement of Science}
}</code></pre>

For the LiPS dataset, cite:

<pre><code>@article{batzner20223,
  title={E (3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials},
  author={Batzner, Simon and Musaelian, Albert and Sun, Lixin and Geiger, Mario and Mailoa, Jonathan P and Kornbluth, Mordechai and Molinari, Nicola and Smidt, Tess E and Kozinsky, Boris},
  journal={Nature communications},
  volume={13},
  number={1},
  pages={1--11},
  year={2022},
  publisher={Nature Publishing Group}
}</code></pre>
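The dataset description names three files: alanine_dipeptide.npy, water.npy, and mdsim_data.tar.gz. Their internal layout is not documented on this page, so the following is only a minimal sketch for inspecting them after download, not the authors' loading code. It assumes the files sit in the current directory and that each .npy file holds either a plain array or a pickled dictionary of arrays; the helper names `describe_npy` and `list_archive` are hypothetical.

```python
import tarfile
from pathlib import Path

import numpy as np


def describe_npy(path):
    """Summarize the contents of a .npy file as {name: shape}.

    Assumption: the file stores either a plain array or a pickled dict.
    allow_pickle=True is needed for the dict case; only use it on files
    from a source you trust.
    """
    obj = np.load(path, allow_pickle=True)
    # np.save wraps a Python object in a 0-d object array; unwrap it.
    if isinstance(obj, np.ndarray) and obj.dtype == object and obj.ndim == 0:
        obj = obj.item()
    if isinstance(obj, dict):
        # Values without a .shape attribute are reported as None.
        return {str(key): getattr(value, "shape", None) for key, value in obj.items()}
    return {"array": getattr(obj, "shape", None)}


def list_archive(path, limit=10):
    """Return the first `limit` member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()[:limit]


if __name__ == "__main__":
    # File names come from the dataset description; locations are assumed.
    for name in ("alanine_dipeptide.npy", "water.npy"):
        if Path(name).exists():
            print(name, describe_npy(name))
    if Path("mdsim_data.tar.gz").exists():
        print(list_archive("mdsim_data.tar.gz"))
```

Printing the keys and shapes first is a cheap way to confirm what a file contains before writing any training or simulation code against it; the paper and the MDsim repository remain the authoritative description of the format.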


Provider:

figshare

Created:

2022-10-14

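Collected summary

Dataset introduction

Background and challenges

Background overview

This dataset provides preprocessed data for benchmarking machine learning force fields in molecular dynamics simulations, covering representative systems: water, organic molecules, a peptide, and materials. Its central feature is a set of evaluation metrics designed around scientific objectives to assess how force field models perform in simulation. It highlights the gap between force prediction error and the realism of simulated trajectories, and encourages improvements in model stability. The dataset supports the accompanying paper and aims to advance the application of machine learning in computational physics.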

The content above was collected and summarized by 遇见数据集.