microsoft/msr-acc-tae25
Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/microsoft/msr-acc-tae25
Description:
The Microsoft Research Accurate Chemistry Collection (MSR-ACC) provides a collection of accurate coupled-cluster labels for training machine-learning functionals. MSR-ACC/TAE25 comprises 73,040 total atomization energies (TAEs) at the CCSD(T)/CBS level, obtained with the W1-F12 thermochemical protocol. The dataset is constructed to exhaustively cover the chemical space of closed-shell, charge-neutral, covalently bound equilibrium molecular structures that contain up to 5 non-hydrogen atoms, are drawn from elements up to argon, and lack significant multireference character. MSR-ACC/TAE25 opens the way for developing and evaluating generally applicable machine-learning, density functional theory (DFT), and semi-empirical methods, and it has already been used to train the first exchange-correlation functional to reach chemical accuracy for atomization energies. In contrast to smaller or more specialized TAE databases, MSR-ACC/TAE25 provides a large and chemically diverse test set for identifying systematic errors and validating approximate electronic structure methods. The size, diversity, and accuracy of the dataset make it useful not only for developing deep-learning DFT methods, but also for training and validating models such as graph neural networks, and for creating highly specific benchmarks to answer practical questions.
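For readers unfamiliar with the quantity being labeled, the following is a minimal sketch of how a total atomization energy is usually defined (the sum of the isolated-atom energies minus the molecular energy) and how a method might be scored against reference TAEs. It does not use the dataset's actual schema or files. All function names, the energies in the demo, and the 1 kcal/mol threshold commonly quoted for "chemical accuracy" are assumptions made for illustration, not values taken from this listing.

```python
from typing import Mapping, Sequence

# Approximate conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095

# Commonly quoted threshold for "chemical accuracy" (assumption, not from the listing).
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0


def total_atomization_energy(
    molecule_energy: float,
    atom_counts: Mapping[str, int],
    atomic_energies: Mapping[str, float],
) -> float:
    """Return TAE = sum(isolated-atom energies) - molecular energy.

    All energies must be in the same unit; the result is in that unit.
    """
    atoms_total = sum(n * atomic_energies[el] for el, n in atom_counts.items())
    return atoms_total - molecule_energy


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute deviation between predicted and reference values."""
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


if __name__ == "__main__":
    # Made-up energies in Hartree for a water-like molecule (illustration only).
    atomic_energies = {"H": -0.5, "O": -75.0}
    tae_hartree = total_atomization_energy(-76.35, {"H": 2, "O": 1}, atomic_energies)
    print("TAE (kcal/mol):", tae_hartree * HARTREE_TO_KCAL_PER_MOL)

    # Made-up predicted vs. reference TAEs in kcal/mol.
    mae = mean_absolute_error([219.0, 100.5], [219.6, 100.0])
    print("MAE (kcal/mol):", mae)
    print("Within chemical accuracy:", mae <= CHEMICAL_ACCURACY_KCAL_PER_MOL)
```

In practice the reference list would be filled with the dataset's CCSD(T)/CBS labels and the predicted list with values from the method under evaluation.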
Provider:
microsoft



