Data for: Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model

Source: Mendeley Data. Updated: 2024-06-25. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/snstf5rd5n
Resource description:
The "Dataset_HIR" folder contains the data needed to reproduce the results of the data mining approach proposed in the manuscript titled "Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model". Specifically, it contains the raw electronic structure calculation input data provided by the domain experts, together with the training and testing datasets with the extracted features.

The "Dataset_HIR" folder contains the following subfolders:

1. Electronic structure calculation input data: contains the electronic structure calculation input generated by the Gaussian program.
1.1. Training data: contains the raw data of all training species (each stored in a separate folder), used to extract the dataset for the training and validation phases.
1.2. Testing data: contains the raw data of all testing species (each stored in a separate folder), used to extract the data for the testing phase.
2. Dataset
2.1. Training dataset: used to produce the results in Tables 3 and 4 in the manuscript.
+ datasetTrain_raw.csv: contains the features for all vibrational modes, each associated with its labeled species, so that chemists can easily select the Hindered Internal Rotor from the list for the training and validation steps.
+ datasetTrain.csv: a refined version of datasetTrain_raw.csv in which the species names are removed, putting the dataset into an appropriate form for the modeling and validation steps.
2.2. Testing dataset: used to produce the results of the data mining approach in Table 5 in the manuscript.
+ datasetTest_raw.csv: contains the features for all vibrational modes of each labeled species, so that chemists can select the Hindered Internal Rotor from the list for the testing step.
+ datasetTest.csv: a refined version of datasetTest_raw.csv in which the species names are removed, putting the dataset into an appropriate form for the testing step.

Note on the Result feature in the dataset: 1 marks a mode that needs to be treated as a Hindered Internal Rotor, and 0 otherwise.
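
The relationship between the raw and refined CSV files, and the meaning of the Result feature, can be illustrated with a minimal sketch. It uses only the Python standard library and a small made-up sample in place of the real files. The column names "Species", "Frequency" and "ReducedMass" and the sample values are assumptions for illustration only; the description above names only the Result feature, so the actual headers in the dataset may differ.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a *_raw.csv file. The column names
# "Species", "Frequency" and "ReducedMass" are assumptions; only the
# "Result" feature is named in the dataset description.
RAW_SAMPLE = """Species,Frequency,ReducedMass,Result
speciesA,85.2,1.10,1
speciesA,1450.7,1.25,0
speciesB,120.4,1.05,1
speciesB,2980.1,1.09,0
speciesB,3050.6,1.08,0
"""


def strip_species(rows, name_column="Species"):
    """Drop the species-name column, mimicking the *_raw.csv to *.csv step."""
    return [{k: v for k, v in row.items() if k != name_column} for row in rows]


def label_counts(rows, label_column="Result"):
    """Count modes labeled 1 (Hindered Internal Rotor) and 0 (otherwise)."""
    return Counter(int(row[label_column]) for row in rows)


# Parse the sample, remove species names, and summarize the labels.
raw_rows = list(csv.DictReader(io.StringIO(RAW_SAMPLE)))
rows = strip_species(raw_rows)
counts = label_counts(rows)
print(counts[1], "hindered-rotor modes,", counts[0], "other modes")
```

To try this on the actual files, replace `io.StringIO(RAW_SAMPLE)` with an open file handle such as `open("datasetTest_raw.csv", newline="")` and set `name_column` to whatever header the file really uses for the species name.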

Created:
2024-01-23