
Data-Error Scaling in Machine Learning on Natural Discrete Combinatorial Mutation-prone Sets: Case Studies on Peptides and Small Molecules

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11148307
Description:
Data

This folder contains the raw data used during this work. `out_seq_total.txt` contains information on the sequences used (mutations, number of mutations, etc.). `output_energies_total.txt` contains the response variables, which include:

a) unrelaxed EvoEF energies (peptides);
b) relaxed EvoEF energies (peptides, `*_repaired.txt`); and
c) solvation energies (molecules).

3D structures are provided in `.xyz` format in the subfolder `XYZ` (molecules).

Results

This folder contains the results (outputs) of the ML models trained using the provided scripts (see the GitHub repository). The results are included as `.npy` files. To load them, pass the option `allow_pickle=True`.

Each `.npy` file contains the following keys:

* `initial_parameters`: script inputs.
* `d_encoder`: encoder used (not always included).
* `ns_train`: number of training points used for the learning curves (LCs), rounded to integers.
* `ns_train_float`: number of training points used for the LCs (not rounded, float).
* `ns_train_norm`: number of training points used for the LCs (normalized, float).
* `res`: test MAEs.
* `res_tot`: (train, validation, test) MAEs.
* `res_tot_mut`: (train, validation, test) MAEs sorted by mutation number.
* `l_opt`: optimal kernel length used during the test.
* `ls`: kernel lengths used for the grid search.
* `idx_seeds`: indices used to reshuffle the data. To rebuild the initial order, use `np.argsort(idx_seeds)`.
* `alpha_opt`: optimal regression parameters used to calculate the test error. To sort the data, use `alpha_opt[i][ii][np.argsort(idx_seeds[ii,0:arg_train_max].astype(int)[:ns_train[i]])]`, where `i` is the replicate number (0-99) and `ii` is the index in the LC.
* `valid_errs`: validation error (MAE) calculated for each point in the hyperparameter (kernel scale) optimisation.
* `test_errs`: test error (MAE) calculated for each point in the hyperparameter (kernel scale) optimisation.
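The loading step and the `np.argsort` trick described above can be illustrated with a minimal sketch. It assumes each results file stores a single pickled dictionary (which NumPy returns as a 0-d object array) and that the shuffle was applied as `shuffled = original[idx]`; neither detail is confirmed by the description. The helper names `load_results` and `restore_order`, the file name `demo_results.npy`, and the synthetic data are hypothetical stand-ins, not part of the dataset.

```python
import os
import tempfile

import numpy as np


def load_results(path):
    """Load a results file, assuming it holds one pickled dict (an assumption)."""
    arr = np.load(path, allow_pickle=True)
    # A dict written with np.save comes back as a 0-d object array;
    # .item() unwraps it into a plain Python dict.
    return arr.item() if arr.shape == () else arr


def restore_order(shuffled, idx):
    """Undo a shuffle that was applied as shuffled = original[idx]."""
    return shuffled[np.argsort(idx)]


# Synthetic stand-in for a real results file (hypothetical contents).
rng = np.random.default_rng(0)
original = np.arange(10) * 1.5
idx = rng.permutation(10)
demo = {
    "ns_train": np.array([2, 4, 8]),
    # Stored as floats here, since the description casts with .astype(int).
    "idx_seeds": idx[None, :].astype(float),
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_results.npy")
    np.save(demo_path, demo, allow_pickle=True)
    results = load_results(demo_path)

print(sorted(results.keys()))

# Recover the original ordering from the shuffled data and the stored indices.
recovered = restore_order(original[idx], results["idx_seeds"][0].astype(int))
print(np.array_equal(recovered, original))
```

With a real file, `load_results` would be pointed at one of the `.npy` files in the Results folder; if the loaded object turns out not to be a 0-d array, it is returned unchanged so its structure can be inspected first.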
Created:
2024-05-08