Multiobjective Feature Selection Approach to Quantitative Structure Property Relationship Models for Predicting the Octane Number of Compounds Found in Gasoline
Indexed from NIAID Data Ecosystem on 2026-03-10
Download link:
https://figshare.com/articles/dataset/Multiobjective_Feature_Selection_Approach_to_Quantitative_Structure_Property_Relationship_Models_for_Predicting_the_Octane_Number_of_Compounds_Found_in_Gasoline/4993751
Description:
Octane number is one of the most important factors for determining
the price of gasoline. The increasing popularity of molecular models
in petroleum refining has made predicting key properties for pure
components more important. In this paper, quantitative structure property
relationship (QSPR) models are developed to predict the research octane
number (RON) and motor octane number (MON) of pure components using
two databases. The databases include oxygenated and nitrogen-containing
compounds as well as hydrocarbons collected from published data. QSPR
models are widely utilized because they effectively characterize molecular
structures with a variety of descriptors, especially different isomeric
structures. Feature subset selection is an important step for improving
the performance and reducing the complexity of a QSPR model by
removing redundant and irrelevant descriptors. A two-step feature
selection method is developed to identify appropriate subsets of descriptors
from a multiobjective perspective: (1) a filter using the Boruta algorithm to remove noise features and (2) a multiobjective
wrapper to simultaneously minimize the number of features and maximize
the model accuracy. The multiobjective wrapper is designed to account
for both the complexity and generalizability of models to resist overfitting,
which commonly occurs when using a single-objective feature selection
method. In the proposed procedure, optimized subsets of descriptors
are used to build the final QSPR models to predict the RON and MON
of pure components via support vector machine regression. The proposed
models are competitive with other models found in the literature.
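
The multiobjective wrapper idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only to keep the example self-contained: the data are synthetic, the Boruta filter step is omitted, a small k-nearest-neighbour regressor stands in for support vector machine regression, and exhaustive enumeration of subsets stands in for whatever search strategy the paper uses. The names `loo_error` and `pareto_front` are hypothetical. Maximizing accuracy is expressed here as minimizing a leave-one-out error, so both objectives are minimized.

```python
import itertools
import random


def loo_error(X, y, subset, k=3):
    """Leave-one-out mean absolute error of a k-NN regressor that only
    sees the descriptor columns listed in `subset` (stand-in for SVR)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in subset), m)
            for m in range(n)
            if m != i
        )
        pred = sum(y[m] for _, m in dists[:k]) / k
        total += abs(pred - y[i])
    return total / n


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.
    Objectives: number of features "n" and error "err", both minimized."""
    front = []
    for c in candidates:
        dominated = any(
            o["n"] <= c["n"]
            and o["err"] <= c["err"]
            and (o["n"] < c["n"] or o["err"] < c["err"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c["n"])


# Synthetic data (assumption): 5 descriptors, only columns 0 and 2 matter.
random.seed(0)
n_samples, n_features = 40, 5
X = [[random.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[2] + random.gauss(0, 0.05) for row in X]

# Score every non-empty descriptor subset on both objectives.
candidates = []
for size in range(1, n_features + 1):
    for subset in itertools.combinations(range(n_features), size):
        candidates.append(
            {"subset": subset, "n": size, "err": loo_error(X, y, subset)}
        )

# The front is the set of trade-offs between model size and error.
front = pareto_front(candidates)
for c in front:
    print(c["n"], c["subset"], round(c["err"], 3))
```

Returning a front rather than a single best subset is what distinguishes this from a single-objective wrapper: the final subset is chosen from the trade-off curve, so a larger subset is kept only if it actually lowers the validation error. Exhaustive enumeration is feasible only for a handful of descriptors; a realistic descriptor pool would need a heuristic search.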
Created:
2017-05-10



