Graph neural network dataset and training logs for the publication "Broken neural scaling laws in materials science"

Source: Figshare. Updated: 2026-01-21. Indexed: 2026-04-28.

Download link:

https://figshare.com/articles/dataset/Graph_neural_network_dataset_and_training_logs_for_the_publication_Broken_neural_scaling_laws_in_materials_science_/31112554

Description:

This repository contains a ready-to-use machine learning dataset and training logs associated with the publication "Broken neural scaling laws in materials science".

The code implementing the graph neural network models and the training workflow using this dataset is available at: https://github.com/MaxGrossmann/optimetal

The data are provided in the form of compressed archives that are organized as follows.

Dataset archive:
The archive "dataset.zip" contains the following three HDF5 files: "train.h5", "val.h5", and "test.h5". These files store the training, validation, and test splits used for the reported models. Details on the data format and parsing are provided in the associated code repository (see above). The underlying raw dataset was generated using a high-throughput, ab initio workflow and is distributed across two repositories due to its size: https://doi.org/10.6084/m9.figshare.31111798 and https://doi.org/10.6084/m9.figshare.31112491.

Graph archive:
The archive "graph.zip" contains three corresponding PyTorch files: "train.pt", "val.pt", and "test.pt". These files store the corresponding graph representations used as input to the graph neural networks, including node features, edge (two-body) indices, and angle (three-body) indices. Parsing and usage of these files are documented in the associated code repository (see above).

Training logs archive:
The archive "training_logs.zip" contains directories corresponding to specific groups of training runs, including:
- "Scaling_Base", containing training runs used to extract one-dimensional neural scaling laws,
- "Ablation", containing training runs used for architecture optimization and ablation studies,
- and more ...

Within each directory, the subdirectories are named according to the model configuration, random seed, and (where applicable) additional hyperparameters. These subdirectories contain TensorBoard log files, job submission scripts (LSF files), and other outputs produced during training.

Together, these data enable reproduction of the entire training, evaluation, and scaling-law analysis process of the reported model.
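
As a quick sanity check after downloading, the archive layout described above can be verified with the Python standard library alone. The following is a minimal sketch, not part of the published code: the helper names `missing_members` and `find_run_dirs` and the extraction directory "training_logs" are hypothetical, and the folder structure inside each archive is assumed rather than documented, so only base file names are compared. Reading the actual HDF5 and PyTorch contents should follow the parsers in the associated code repository.

```python
import zipfile
from pathlib import Path

# File names exactly as listed in the dataset description.
EXPECTED_MEMBERS = {
    "dataset.zip": ["train.h5", "val.h5", "test.h5"],
    "graph.zip": ["train.pt", "val.pt", "test.pt"],
}


def missing_members(archive_path, expected):
    """Return the expected file names that are absent from a zip archive.

    Only base names are compared, because the folder layout inside the
    archives is an assumption and is not stated in the description.
    """
    with zipfile.ZipFile(archive_path) as zf:
        present = {Path(name).name for name in zf.namelist()}
    return [name for name in expected if name not in present]


def find_run_dirs(log_root):
    """Return the sorted directories under log_root holding TensorBoard logs.

    TensorBoard event files have names that begin with
    "events.out.tfevents", so any directory containing one is treated
    as a single training run.
    """
    root = Path(log_root)
    if not root.is_dir():
        return []
    runs = {p.parent for p in root.rglob("events.out.tfevents*") if p.is_file()}
    return sorted(runs)


if __name__ == "__main__":
    for archive, expected in EXPECTED_MEMBERS.items():
        if Path(archive).exists():
            print(archive, "missing:", missing_members(archive, expected))
    # "training_logs" is a hypothetical path to the extracted log archive.
    for run in find_run_dirs("training_logs"):
        print(run)
```

An empty "missing" list for both archives indicates that all six split files are present. The run directories printed afterwards can be passed to TensorBoard for inspection.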

Created:

2026-01-21