Automatic Prediction of Band Gaps of Inorganic Materials Using a Gradient Boosted and Statistical Feature Selection Workflow

Name: Automatic Prediction of Band Gaps of Inorganic Materials Using a Gradient Boosted and Statistical Feature Selection Workflow
Creator: ACS Publications
Published: 2024-02-06 00:00:00
License: No description available

Source: acs.figshare.com
Updated: 2024-02-06
Indexed: 2025-01-21

Download link:

https://acs.figshare.com/articles/dataset/Automatic_Prediction_of_Band_Gaps_of_Inorganic_Materials_Using_a_Gradient_Boosted_and_Statistical_Feature_Selection_Workflow/25155221/1

Description:

Machine learning (ML) methods can train a model to predict material properties by exploiting patterns in materials databases that arise from structure–property relationships. However, the importance of ML-based feature analysis and selection is often neglected when creating such models. Such analysis and selection are especially important when dealing with multifidelity data because such data afford a complex feature space. This work shows how a gradient-boosted statistical feature-selection workflow can be used to train predictive models that classify materials by their metallicity and predict their band gap against experimental measurements, as well as computational data that are derived from electronic-structure calculations. These models are fine-tuned via Bayesian optimization, using solely the features that are derived from the chemical compositions of the materials data. We test these models against experimental data, computational data, and a combination of the two. We find that the multifidelity modeling option can reduce the number of features required to train a model. We benchmark the workflow against state-of-the-art algorithms; the results show that our approach is comparable or superior to them. The classification model realized an accuracy score of 0.943, a macro-averaged F1-score of 0.940, an area under the receiver operating characteristic curve of 0.985, and an average precision of 0.977, while the regression model achieved a mean absolute error of 0.246, a root-mean-square error of 0.402, and an R² of 0.937. This illustrates the efficacy of our modeling approach and highlights the importance of thorough feature analysis and judicious selection over a “black-box” approach to feature engineering in ML-based modeling.
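For readers unfamiliar with the scores quoted in the description, the following is a minimal sketch of how the mean absolute error, root-mean-square error, R², accuracy, and macro-averaged F1-score are conventionally defined. It is not the authors' code: the function names `regression_metrics` and `classification_metrics` are hypothetical, and the toy values are invented purely for illustration and are not taken from the dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return mae, math.sqrt(mse), r2


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Macro averaging: unweighted mean of the per-class F1 scores.
    return accuracy, sum(f1_scores) / len(f1_scores)


# Invented toy values (not from the dataset): band gaps in arbitrary units.
gaps_true = [0.0, 1.0, 2.0, 3.0]
gaps_pred = [0.5, 1.0, 1.5, 3.0]
mae, rmse, r2 = regression_metrics(gaps_true, gaps_pred)

# Invented metallicity labels: 1 = metal, 0 = non-metal.
is_metal_true = [1, 1, 0, 0]
is_metal_pred = [1, 0, 0, 0]
accuracy, macro_f1 = classification_metrics(is_metal_true, is_metal_pred)
```

Macro averaging weights each class equally regardless of its size, so reporting it alongside accuracy can expose poor performance on a minority class.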

Provided by:

ACS Publications
