Makino et al., Supplementary Data for "Anomaly-Detection-Driven Screening of Synthesizability from Composition Descriptors Alone"

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://figshare.com/articles/dataset/Makino_et_al_Supplementary_Data_for_Anomaly-Detection-Driven_Screening_of_Synthesizability_from_Composition_Descriptors_Alone_/30575660

Description:

Supplementary data for the paper by Makino et al., "Anomaly-Detection-Driven Screening of Synthesizability from Composition Descriptors Alone". The zip file contains the following data and code.

01_training_dataset
The training dataset used to train the autoencoder in this study (CSV format).

02_best_hyparams_log.csv
A CSV table summarizing the results of the hyperparameter search performed by 01_PyTorch_optuna.py and 02_PyTorch_bestmodel.py. It lists the objective value of the best trial, the corresponding mini-batch size, learning rate, and number of epochs, and the coefficients of determination (R2) for the training and test data under these conditions.

03_model_best.pth
A representative trained autoencoder model, trained with the hyperparameters obtained from the optimization (PyTorch .pth format).

04_requirements.txt
A requirements file listing the Python environment (main libraries and their versions) used to train the autoencoder in this study (.txt format).

01_PyTorch_optuna.py
A Python script that reads a compositional descriptor dataset (01_training_dataset.csv) and uses Optuna to search the training hyperparameters (batch size and learning rate) for an autoencoder. The output is written in CSV format. To use this script with a different dataset or model architecture, modify the "Load data" and "Autoencoder model" sections of the script.

02_Pytorch_bestmodel.py
A Python script that reads the CSV file output by 01_PyTorch_optuna.py, loads the optimal mini-batch size and learning rate, and retrains an autoencoder on a compositional descriptor dataset (01_training_dataset.csv). Using 5-fold cross-validation, it records the training and validation losses, selects as the “best epoch” the one with a small validation loss and a minimal gap between training and validation losses, and then retrains the model for that number of epochs. The final model is saved in the PyTorch .pth format, and prediction plots for the training and test data are also generated. To use this script with a different dataset or model architecture, modify the "Load data" and "Autoencoder model" sections of the script.

03_calculate_RMSE.py
A Python script that loads the .pth file output by 02_PyTorch_bestmodel.py and computes the sample-wise reconstruction error (RMSE) for an arbitrary compositional descriptor dataset (datasets.csv). The results are written in CSV format. To use this script with a different dataset, modify the "Load data" section of the script.
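The sample-wise reconstruction error mentioned for 03_calculate_RMSE.py can be illustrated with a minimal sketch. This is not the authors' script: the function name samplewise_rmse and the toy values are hypothetical, and the sketch assumes the RMSE is taken over the descriptor columns of each sample, which the description does not spell out.

```python
import math


def samplewise_rmse(originals, reconstructions):
    """Return one RMSE per sample (row), averaged over its descriptor columns.

    Hypothetical helper for illustration; assumes each row is one sample.
    """
    if len(originals) != len(reconstructions):
        raise ValueError("inputs must have the same number of samples")
    errors = []
    for x, x_hat in zip(originals, reconstructions):
        if len(x) != len(x_hat):
            raise ValueError("a sample and its reconstruction must have equal length")
        # Mean squared error over the descriptors of this single sample.
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        errors.append(math.sqrt(mse))
    return errors


# Toy data: two samples with three descriptors each (values are made up).
descriptors = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
reconstructed = [[1.0, 2.0, 3.0], [3.0, 4.0, 0.0]]
rmse = samplewise_rmse(descriptors, reconstructed)
```

With PyTorch tensors x and x_hat of shape (n_samples, n_features), the same quantity can be written as ((x - x_hat) ** 2).mean(dim=1).sqrt(). In an anomaly-detection setting, a larger value would indicate a sample that the autoencoder reconstructs poorly.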
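The "best epoch" rule described for 02_Pytorch_bestmodel.py (small validation loss, minimal gap between training and validation losses) could be sketched as follows. The exact criterion used by the authors is not given in this description, so the scoring formula below (validation loss plus a weighted absolute gap) is an assumption, and the names select_best_epoch, mean_over_folds, and gap_weight are hypothetical.

```python
def mean_over_folds(per_fold_losses):
    """Average loss curves across cross-validation folds, epoch by epoch."""
    n_folds = len(per_fold_losses)
    return [sum(epoch_values) / n_folds for epoch_values in zip(*per_fold_losses)]


def select_best_epoch(train_losses, val_losses, gap_weight=1.0):
    """Return the 1-based epoch minimizing val + gap_weight * |val - train|.

    ASSUMPTION: this score is one plausible reading of "small validation
    loss and minimal train/validation gap", not the authors' formula.
    """
    if not train_losses or len(train_losses) != len(val_losses):
        raise ValueError("loss curves must be non-empty and of equal length")
    scores = [v + gap_weight * abs(v - t) for t, v in zip(train_losses, val_losses)]
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return best_index + 1


# Toy fold-averaged curves (made-up values): validation loss starts rising
# after epoch 3 while training loss keeps falling, i.e. overfitting.
train_curve = [0.90, 0.50, 0.30, 0.20, 0.10]
val_curve = [0.95, 0.55, 0.35, 0.40, 0.60]
best_epoch = select_best_epoch(train_curve, val_curve)

# Averaging two toy folds, each with two epochs.
averaged = mean_over_folds([[1.0, 0.5], [0.0, 0.5]])
```

In this toy case the rule picks epoch 3, the last epoch before the validation loss diverges from the training loss. With real 5-fold results, the per-fold curves would first be averaged with mean_over_folds before the selection.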

Created:

2026-02-06
