Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure–Property Relationships
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Resolving_Transition_Metal_Chemical_Space_Feature_Selection_for_Machine_Learning_and_Structure_Property_Relationships/5603017
Description:
Machine learning (ML) of quantum mechanical properties shows promise for
accelerating chemical discovery. For transition metal chemistry, where
accurate calculations are computationally costly and available training
data sets are small, the molecular representation becomes a critical
ingredient in ML model predictive accuracy. We introduce a series of revised
autocorrelation functions (RACs) that encode relationships between heuristic
atomic properties (e.g., size, connectivity, and electronegativity) on a
molecular graph. We alter the starting point, scope, and nature of the
quantities evaluated in standard autocorrelations (ACs) to make these RACs
amenable to inorganic chemistry. On an organic molecule set, we first
demonstrate that standard ACs outperform other presently available
topological descriptors for ML model training, with mean unsigned errors
(MUEs) for atomization energies on set-aside test molecules as low as
6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on
set-aside test molecules for spin-state splitting, compared with 15–20×
higher errors for feature sets that encode whole-molecule structural
information. Systematic feature selection methods, including univariate
filtering, recursive feature elimination, and direct optimization (e.g.,
random forest and LASSO), are compared. Random-forest- or LASSO-selected
subsets 4–5× smaller than the full RAC set produce sub- to 1 kcal/mol
spin-splitting MUEs, with good transferability to metal–ligand bond length
prediction (0.004–0.005 Å MUE) and to redox potential on a smaller data set
(0.2–0.3 eV MUE). Evaluation of feature selection results across property
sets reveals the relative importance of local, electronic descriptors (e.g.,
electronegativity, atomic number) in spin-splitting and of distal, steric
effects in redox potential and bond lengths.
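As a rough illustration of the descriptor family described above, the sketch below computes a standard autocorrelation on a molecular graph: the sum of products of an atomic property over all atom pairs separated by a given number of bonds. It is a minimal sketch, not the authors' implementation. The function names, the adjacency-list input format, the ordered-pair counting convention, and the optional `starts` argument (a hypothetical way to restrict the starting atoms, loosely mirroring the "starting point" change mentioned in the description) are all assumptions made for illustration.

```python
from collections import deque


def bond_distances(adjacency, source):
    """Breadth-first search: number of bonds from `source` to each reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def autocorrelation(adjacency, prop, depth, starts=None):
    """Sum prop[i] * prop[j] over ordered atom pairs (i, j) exactly `depth` bonds apart.

    `starts` (hypothetical, for illustration) limits the starting atoms i;
    by default every atom is a starting point.
    """
    if starts is None:
        starts = range(len(adjacency))
    total = 0.0
    for i in starts:
        for j, d in bond_distances(adjacency, i).items():
            if d == depth:
                total += prop[i] * prop[j]
    return total


# Toy graph: water, with O at index 0 bonded to H at indices 1 and 2.
adjacency = [[1, 2], [0], [0]]
atomic_number = [8, 1, 1]

# Full-molecule autocorrelations at bond depths 0, 1, and 2.
full_ac = [autocorrelation(adjacency, atomic_number, d) for d in range(3)]

# Same quantity at depth 1, but starting only from the oxygen atom.
oxygen_centered = autocorrelation(adjacency, atomic_number, 1, starts=[0])
```

Stacking such values for several properties and depths gives a fixed-length feature vector per molecule, which is the kind of input that the feature selection methods named in the description would then prune.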
Created:
2017-11-15



