timewarp
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-08-16.
Download link: https://modelscope.cn/datasets/microsoft/timewarp
# Timewarp datasets
This dataset contains molecular dynamics simulation data that was used to train the neural networks in the NeurIPS 2023 paper [Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics](https://arxiv.org/abs/2302.01170) by Leon Klein, Andrew Y. K. Foong, Tor Erlend Fjelde, Bruno Mlodozeniec, Marc Brockschmidt, Sebastian Nowozin, Frank Noé, and Ryota Tomioka.
Please see the [accompanying GitHub repository](https://github.com/microsoft/timewarp).
This dataset consists of many molecular dynamics trajectories of small peptides (2-4 amino acids) simulated with an implicit water force field.
For each protein, two files are available:
* `protein-state0.pdb`: contains the topology and initial 3D XYZ coordinates.
* `protein-arrays.npz`: contains trajectory information.
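The names of the arrays stored in each `.npz` file are not listed here, so a reasonable first step is to inspect a file rather than assume its layout. The following is a minimal sketch using NumPy; the helper name `summarize_npz` is hypothetical and not part of the Timewarp code base.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    summary = {}
    # np.load on an .npz file returns a lazy NpzFile; `.files` lists the array names.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

Calling `summarize_npz("protein-arrays.npz")` on a downloaded file (with `protein` replaced by the actual file prefix) shows which arrays are present and their shapes before any of them is used.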
The datasets are split into the following directories:
# 2AA-1-big "Two Amino Acid" data set
This folder contains a data set of all-atom molecular dynamics trajectories for 380
of the 400 dipeptides, i.e. small proteins composed of two amino acids.
This dataset was originally created without 20 of the 400 possible dipeptides.
The `2AA-1-complete` dataset fills this gap by including all 400.
Each peptide is simulated using classical molecular dynamics and the
water is simulated using an implicit water model.
The trajectories are saved only every 10000 MD steps. Unlike the other datasets
of the Timewarp project, no frames at intermediate spacings are stored.
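To relate the saving interval to simulated time, multiply the number of MD steps per saved frame by the integration timestep. The timestep is not stated in this description, so the 0.5 fs used below is an assumed value for illustration only; check the simulation configuration in the repository for the actual one.

```python
def frame_spacing_ps(steps_per_frame, timestep_fs):
    """Simulated time between two saved frames, in picoseconds."""
    return steps_per_frame * timestep_fs / 1000.0  # 1 ps = 1000 fs


# 10000 steps per saved frame comes from the dataset description;
# the 0.5 fs timestep is an assumption, not a documented value.
spacing = frame_spacing_ps(steps_per_frame=10_000, timestep_fs=0.5)
print(f"{spacing} ps between saved frames")
```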
# 2AA-1-complete "Two Amino Acid" data set
This folder contains a data set of all-atom molecular dynamics trajectories for all 400
dipeptides, i.e. small proteins composed of two amino acids.
This also includes the peptides missing from the other 2AA datasets.
Each peptide is simulated using classical molecular dynamics and the
water is simulated using an implicit water model.
# 4AA-huge "Four Amino Acid" data set, tetrapeptides
This folder contains a data set of all-atom molecular dynamics trajectories for
tetrapeptides, i.e. small proteins composed of four amino acids.
The data set contains mostly validation and test trajectories, as it was used mainly
for validation and testing.
The training trajectories are usually shorter.
Each peptide is simulated for 1 microsecond using classical molecular dynamics and the
water is simulated using an implicit water model.
# 4AA-large "Four Amino Acid" data set, tetrapeptides
This folder contains a data set of all-atom molecular dynamics trajectories for
2333 tetrapeptides, i.e. small proteins composed of four amino acids.
The data set is split into 1500 tetrapeptides in the train set, 400 in validation, and 433 in test.
Each peptide in the train set is simulated for 50 ns using classical molecular dynamics and the
water is simulated using an implicit water model. Each peptide in the validation and test sets is simulated for 500 ns.
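The split sizes and simulation lengths above imply the following totals. This is a small arithmetic sketch that uses only the numbers stated in this section, reading "each other peptide" as the validation and test peptides; the names `SPLITS`, `total_peptides`, and `aggregate_time_us` are illustrative.

```python
# Peptide counts and per-peptide simulation lengths as stated in this section.
SPLITS = {
    "train": {"peptides": 1500, "ns_per_peptide": 50},
    "validation": {"peptides": 400, "ns_per_peptide": 500},
    "test": {"peptides": 433, "ns_per_peptide": 500},
}


def total_peptides(splits):
    """Total number of peptides across all splits."""
    return sum(s["peptides"] for s in splits.values())


def aggregate_time_us(splits):
    """Aggregate simulated time per split in microseconds (1 µs = 1000 ns)."""
    return {
        name: s["peptides"] * s["ns_per_peptide"] / 1000
        for name, s in splits.items()
    }


print(total_peptides(SPLITS))     # 2333
print(aggregate_time_us(SPLITS))  # {'train': 75.0, 'validation': 200.0, 'test': 216.5}
```

Although the train set holds most of the peptides, most of the simulated time sits in the validation and test sets because their trajectories are ten times longer.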
# AD-3 Alanine dipeptide data set
This folder contains a minimal data set of two long MD trajectories for alanine
dipeptide, the simplest dipeptide (22 atoms).
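The stated atom count can be checked against a downloaded structure file. The sketch below counts atom records in a PDB file using only the standard library. It assumes the AD-3 folder follows the same `protein-state0.pdb` naming pattern as the other folders, which is not confirmed here, and the helper name `count_pdb_atoms` is hypothetical.

```python
def count_pdb_atoms(path):
    """Count atom records (ATOM/HETATM) in the first model of a PDB file."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                count += 1
            elif line.startswith("ENDMDL"):
                break  # stop after the first model
    return count
```

For an alanine dipeptide state file, this function should return 22 if the file matches the description above.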
## Model training and checkpoints
Model checkpoints and config files are also included, and source code for training the model can be found [here](https://github.com/microsoft/timewarp).
## Responsible AI FAQ
- What is Timewarp?
- Timewarp is a neural network that predicts the future 3D positions of a small peptide (2-4 amino acids) based on its current state. It is a research project that investigates using deep learning to accelerate molecular dynamics simulations.
- What can Timewarp do?
- Timewarp can be used to sample from the equilibrium distribution of small peptides.
- What is/are Timewarp’s intended use(s)?
- Timewarp is intended for machine learning and molecular dynamics research purposes only.
- How was Timewarp evaluated? What metrics are used to measure performance?
- Timewarp was evaluated by comparing the speed of molecular dynamics sampling with standard molecular dynamics systems that rely on numerical integration. Timewarp is sometimes faster than these standard systems.
- What are the limitations of Timewarp? How can users minimize the impact of Timewarp’s limitations when using the system?
- As a research project, Timewarp has many limitations. The main ones are that it only works for very small peptides (2-4 amino acids), and that it does not lead to a wall-clock speed up for many peptides.
- What operational factors and settings allow for effective and responsible use of Timewarp?
- Timewarp should be used for research purposes only.
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.
Provider: maas
Created: 2025-07-22



