SCFbench
Source: ModelScope community (updated 2025-12-04, indexed 2025-12-06)
Download link: https://modelscope.cn/datasets/ByteDance-Seed/SCFbench
<div align="center">
👋 Hi, everyone!
<br>
We are <b>ByteDance Seed team.</b>
</div>
<p align="center">
You can get to know us better through the following channels👇
<br>
<a href="https://seed.bytedance.com/">
<img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
<a href="https://github.com/user-attachments/assets/5793e67c-79bb-4a59-811a-fcc7ed510bd4">
<img src="https://img.shields.io/badge/WeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white"></a>
<a href="https://www.xiaohongshu.com/user/profile/668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D&xsec_source=pc_search">
<img src="https://img.shields.io/badge/Xiaohongshu-%23FF2442?style=for-the-badge&logo=xiaohongshu&logoColor=white"></a>
<a href="https://www.zhihu.com/org/dou-bao-da-mo-xing-tuan-dui/">
<img src="https://img.shields.io/badge/zhihu-%230084FF?style=for-the-badge&logo=zhihu&logoColor=white"></a>
</p>

# Towards A Universally Transferable Acceleration Method for Density Functional Theory
Zhe Liu, Yuyan Ni, Zhichen Pu, Qiming Sun, Siyuan Liu & Wen Yan
https://arxiv.org/abs/2509.25724
# TL;DR
We propose a framework for accelerating DFT calculations.
We train E(3)-equivariant neural networks to predict the expansion coefficients of the electron density in an auxiliary basis, and use the prediction to construct an initial guess for the SCF process. This approach exhibits superior transferability in various aspects.
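
To make the TL;DR concrete, here is a sketch of what "expansion coefficients of the electron density in an auxiliary basis" means in standard density-fitting notation. The symbols ($\chi_k$, $c_k$, $\phi_\mu$, $J_{\mu\nu}$) are our own notation, not necessarily the paper's, and the Coulomb-matrix expression is shown only as one standard way such coefficients enter an SCF calculation. The README does not spell out how the initial guess is actually assembled.

```latex
% Electron density expanded in N_aux atom-centred auxiliary functions \chi_k
\tilde{\rho}(\mathbf{r}) = \sum_{k=1}^{N_{\mathrm{aux}}} c_k \, \chi_k(\mathbf{r})

% Coulomb matrix in the orbital basis \{\phi_\mu\}, built directly from the coefficients
J_{\mu\nu} = \sum_{k} c_k \, (\mu\nu \,|\, k), \qquad
(\mu\nu \,|\, k) = \iint \frac{\phi_\mu(\mathbf{r})\,\phi_\nu(\mathbf{r})\,\chi_k(\mathbf{r}')}
                              {|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}'
```

The coefficient vector $c$ has one entry per auxiliary function, so its length grows linearly with the number of atoms, whereas a density or Fock matrix grows quadratically.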
# Contents
The repo currently contains:
* The full SCFbench dataset.
* The data pipeline for the SCFbench dataset.
* A PyTorch `nn.Module` implementing the species-wise linear layer used to predict the electron density coefficients.
* The NequIP model architecture with the species-wise linear layer.
* Example code for computing the density coefficients from a density matrix.
We will also release the following items soon:
* The training code for models.
* The full evaluation code.
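
To give a feel for the species-wise linear layer listed above, the following is a minimal sketch and not the repo's implementation. It assumes plain feature vectors rather than e3nn irreps, and the class name `SpeciesWiseLinear` and its arguments are hypothetical. The idea it illustrates is that each chemical element gets its own weight matrix, which is presumably useful because each element has its own set of auxiliary basis functions.

```python
import torch
from torch import nn


class SpeciesWiseLinear(nn.Module):
    """Toy species-wise linear layer: one weight matrix and bias per element.

    Hypothetical sketch; it acts on plain feature vectors, not e3nn irreps.
    """

    def __init__(self, num_species: int, in_features: int, out_features: int):
        super().__init__()
        scale = in_features ** -0.5
        self.weight = nn.Parameter(
            torch.randn(num_species, in_features, out_features) * scale
        )
        self.bias = nn.Parameter(torch.zeros(num_species, out_features))

    def forward(self, x: torch.Tensor, species: torch.Tensor) -> torch.Tensor:
        # x: (num_atoms, in_features); species: (num_atoms,) integer indices
        w = self.weight[species]  # (num_atoms, in_features, out_features)
        b = self.bias[species]    # (num_atoms, out_features)
        return torch.einsum("ni,nio->no", x, w) + b


torch.manual_seed(0)
layer = SpeciesWiseLinear(num_species=3, in_features=4, out_features=5)
x = torch.randn(6, 4)                           # features for 6 atoms
species = torch.tensor([0, 1, 2, 0, 1, 2])      # element index of each atom
out = layer(x, species)                         # shape (6, 5)
```

Atoms of the same species share parameters, while atoms of different species are mapped by independent weights.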
# Requirements
* torch
* e3nn
* pyscf
* lmdb
* numpy>1.26
* nequip (if you want to use the NequIP model)
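
A quick way to check the requirements above before running anything is sketched below. It uses only the Python standard library; the helper names (`missing_packages`, `release_tuple`, `numpy_version_ok`) are ours, and the version comparison is a simplified reading of `numpy>1.26` that ignores pre-release and post-release ordering.

```python
import re
from importlib import metadata, util

REQUIRED = ["torch", "e3nn", "pyscf", "lmdb", "numpy"]
OPTIONAL = ["nequip"]  # only needed for the NequIP model


def missing_packages(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if util.find_spec(name) is None]


def release_tuple(version):
    """Parse the leading numeric release of a version string, dropping trailing zeros."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def numpy_version_ok(version, minimum="1.26"):
    """True if `version` is strictly newer than `minimum`, as in numpy>1.26."""
    return release_tuple(version) > release_tuple(minimum)


if __name__ == "__main__":
    print("missing required:", missing_packages(REQUIRED))
    print("missing optional:", missing_packages(OPTIONAL))
    if util.find_spec("numpy") is not None:
        print("numpy version ok:", numpy_version_ok(metadata.version("numpy")))
```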
# Dataset Usage
The sample dataset consists of the `main` dataset (used for training, validation, and in-distribution testing) and the `ood-test` dataset (used for out-of-distribution testing).
Each dataset is split into several `parts`, each holding one kind of information. The parts are:
* `base`: the basic information of the molecule, including atomic numbers, coordinates, etc.
* `dm`: the density matrix of the molecule.
* `fock`: the Hamiltonian (Fock) matrix of the molecule.
* `auxdensity.denfit`: the density coefficients in the def2-universal-jfit auxiliary basis.
* `auxdensity.denfit.etb2.0`: the density coefficients in the even-tempered (ETB) auxiliary basis generated from def2-svp with $\beta=2.0$.
* `auxdensity.denfit.etb1.5`: the density coefficients in the even-tempered (ETB) auxiliary basis generated from def2-svp with $\beta=1.5$.
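
The $\beta$ parameter of the two ETB parts controls how densely the Gaussian exponents of the auxiliary basis are spaced. The toy sketch below illustrates this with a plain geometric series; the exponent range is made up, and the function `even_tempered_exponents` is hypothetical. In PySCF, even-tempered auxiliary bases are generated by `pyscf.df.aug_etb`, which derives the exponent range per angular momentum from the orbital basis; whether the dataset was built with exactly that routine is an assumption.

```python
def even_tempered_exponents(alpha_min, alpha_max, beta):
    """Geometric series alpha_min * beta**k covering [alpha_min, alpha_max]."""
    if beta <= 1.0:
        raise ValueError("beta must be greater than 1")
    exponents = []
    alpha = alpha_min
    # Small relative tolerance so the upper bound itself is included.
    while alpha <= alpha_max * (1 + 1e-12):
        exponents.append(alpha)
        alpha *= beta
    return exponents


coarse = even_tempered_exponents(1.0, 16.0, beta=2.0)  # [1, 2, 4, 8, 16]
fine = even_tempered_exponents(1.0, 16.0, beta=1.5)    # 1, 1.5, 2.25, ...
print(len(coarse), len(fine))  # prints "5 7"
```

A smaller $\beta$ therefore gives a larger auxiliary basis over the same exponent range, so the `etb1.5` coefficient vectors are longer than the `etb2.0` ones.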
Example:
```python
from dataset import SCFBenchDataset

# Load base info (atomic numbers, coordinates, etc.), the density matrix,
# the Hamiltonian (Fock) matrix, and the density coefficients on def2-universal-jfit.
parts_to_load = ['base', 'dm', 'fock', 'auxdensity.denfit']
dataset = SCFBenchDataset(data_root='dataset/main', parts_to_load=parts_to_load)
print(dataset[0].keys())

# Load base info and the density coefficients on the ETB basis of def2-svp with beta = 1.5.
parts_to_load = ['base', 'auxdensity.denfit.etb1.5']
dataset = SCFBenchDataset(data_root='dataset/ood-test', parts_to_load=parts_to_load, auxbasis='etb:def2-svp:1.5')
print(dataset[0].keys())

# For the raw data, use the underlying dataset.
print(dataset.dataset[0].keys())
```
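
The `dm` and `auxdensity.denfit` parts are related by density fitting: the coefficients are obtained by projecting the density matrix onto the auxiliary basis. The sketch below shows the standard Coulomb-metric fit with NumPy, using random stand-in tensors so that it runs without PySCF. It is not the repo's example code, and the choice of Coulomb metric is an assumption. With PySCF, the two-centre integrals would come from `auxmol.intor('int2c2e')` and the three-centre integrals from `pyscf.df.incore.aux_e2(mol, auxmol, 'int3c2e')`.

```python
import numpy as np


def fit_density_coefficients(dm, int3c, int2c):
    """Coulomb-metric density fitting.

    Solves sum_Q (P|Q) c_Q = sum_{mu,nu} (mu nu|P) D_{mu nu} for c.
    dm: (nao, nao), int3c: (nao, nao, naux), int2c: (naux, naux).
    """
    rhs = np.einsum("uvp,uv->p", int3c, dm)
    return np.linalg.solve(int2c, rhs)


# Synthetic stand-ins for real integrals (assumption: shapes only are realistic).
rng = np.random.default_rng(0)
nao, naux = 4, 6
a = rng.normal(size=(naux, naux))
int2c = a @ a.T + naux * np.eye(naux)             # symmetric positive definite, like (P|Q)
int3c = rng.normal(size=(nao, nao, naux))
int3c = 0.5 * (int3c + int3c.transpose(1, 0, 2))  # symmetric in the orbital pair
dm = rng.normal(size=(nao, nao))
dm = 0.5 * (dm + dm.T)                            # symmetric, like a density matrix

coeffs = fit_density_coefficients(dm, int3c, int2c)
print(coeffs.shape)
```

Using `np.linalg.solve` avoids forming the inverse of the two-centre matrix explicitly, which is the usual numerically safer choice.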
# Citing SCFbench
If you use SCFbench in your research, please cite:
```latex
@misc{liu2025universallytransferableaccelerationmethod,
title={Towards A Universally Transferable Acceleration Method for Density Functional Theory},
author={Zhe Liu and Yuyan Ni and Zhichen Pu and Qiming Sun and Siyuan Liu and Wen Yan},
year={2025},
eprint={2509.25724},
archivePrefix={arXiv},
primaryClass={physics.chem-ph},
url={https://arxiv.org/abs/2509.25724},
}
```
## License
Models are licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
The dataset is a derivative of [ChEMBL](https://www.ebi.ac.uk/chembl/), used under [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/).
Our modified version, the SCFbench dataset, is also licensed under [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/).
## About [ByteDance Seed Team](https://seed.bytedance.com/)
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
Provider: maas
Created: 2025-12-02



