Data-Driven Approach Considering Imbalance in Data Sets and Experimental Conditions for Exploration of Photocatalysts
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Data-Driven_Approach_Considering_Imbalance_in_Data_Sets_and_Experimental_Conditions_for_Exploration_of_Photocatalysts/28774015
Description:
In data-driven materials development, data sets are often imbalanced,
with data points concentrated in certain regions, which makes it
difficult to build regression models with machine learning methods.
Photocatalysts are one class of inorganic functional materials that
faces this difficulty. Advanced data-driven approaches are therefore
expected to help develop novel photocatalytic materials efficiently
even when the data sets are imbalanced.
We propose a two-stage machine learning model aimed at handling imbalanced
data sets without data thinning. In this study, we used two imbalanced
data sets: the Materials Project data set (openly shared because its
data are in the public domain) and an in-house metal-sulfide
photocatalyst data set (not openly shared because the experimental
data are confidential). This two-stage machine learning
model consists of the following two parts: the first regression model,
which predicts the target quantitatively, and the second classification
model, which determines the reliability of the values predicted by
the first regression model. We also propose a search scheme for variables
related to the experimental conditions based on the proposed two-stage
machine learning model. This scheme is designed for photocatalyst
exploration and takes experimental conditions into account, because the
optimal set of variables for these conditions is unknown. The proposed two-stage
machine learning model improves the prediction accuracy of the target
compared with that of the one-stage model.
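
The two-stage idea described above (a regressor for the target, then a classifier that judges whether each predicted value is reliable) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the description does not name the algorithms, so random forests from scikit-learn are used here as stand-ins. The tolerance `tol` that defines a "reliable" prediction, the function names `fit_two_stage` and `predict_two_stage`, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict


def fit_two_stage(X, y, tol, random_state=0):
    """Fit stage 1 (regression) and stage 2 (reliability classification).

    Assumption: a prediction counts as reliable when its out-of-fold
    absolute error is at most `tol`.
    """
    regressor = RandomForestRegressor(n_estimators=50, random_state=random_state)
    folds = KFold(n_splits=5, shuffle=True, random_state=random_state)
    # Out-of-fold predictions give an error estimate for every sample,
    # so all data points are kept (no thinning).
    y_oof = cross_val_predict(regressor, X, y, cv=folds)
    reliable = (np.abs(y_oof - y) <= tol).astype(int)  # 1 = within tolerance

    classifier = RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=random_state
    )
    classifier.fit(X, reliable)  # stage 2: learn where stage 1 can be trusted
    regressor.fit(X, y)          # stage 1: refit on the full data set
    return regressor, classifier


def predict_two_stage(regressor, classifier, X):
    """Return the predicted target and a boolean reliability flag per row."""
    return regressor.predict(X), classifier.predict(X).astype(bool)


# Synthetic imbalanced data: most points sit in a small region near the
# origin, and only a few cover the rest of the input space.
rng = np.random.default_rng(0)
X_dense = rng.normal(0.0, 0.3, size=(380, 2))
X_sparse = rng.uniform(-3.0, 3.0, size=(20, 2))
X = np.vstack([X_dense, X_sparse])
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=len(X))

regressor, classifier = fit_two_stage(X, y, tol=0.2)

# One query inside the dense region and one far from it.
queries = np.array([[0.0, 0.0], [2.8, -2.8]])
y_pred, is_reliable = predict_two_stage(regressor, classifier, queries)
for q, value, flag in zip(queries, y_pred, is_reliable):
    print(q, round(float(value), 3), "reliable" if flag else "unreliable")
```

In a sketch like this, the reliability flag can be used to filter or down-weight candidates whose predicted values fall in sparsely sampled regions, rather than discarding training data to rebalance the set.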
Created:
2025-04-10



