
Classification and QSAR models of leukotriene A4 hydrolase (LTA4H) inhibitors by machine learning methods

Source: DataCite Commons. Updated: 2021-05-04. Indexed: 2024-07-28.
Download link:
https://tandf.figshare.com/articles/dataset/Classification_and_QSAR_models_of_leukotriene_A4_hydrolase_LTA4H_inhibitors_by_machine_learning_methods/14483310
Description:
Leukotriene A4 hydrolase (LTA4H) is an important anti-inflammatory target that converts leukotriene A4 (LTA4) into the pro-inflammatory substance leukotriene B4 (LTB4). In this paper, we built 18 classification models for 463 LTA4H inhibitors using support vector machine (SVM), random forest (RF) and K-Nearest Neighbour (KNN) methods. The best classification model (Model 2A) was built with RF and MACCS fingerprints; it achieved a prediction accuracy of 88.96% and a Matthews correlation coefficient (MCC) of 0.74 on the test set. We also divided the 463 LTA4H inhibitors into six subsets using K-Means clustering. We found that the highly active LTA4H inhibitors mostly contained diphenylmethane or diphenyl ether as the scaffold and pyridine or piperidine as the side chain. In addition, six quantitative structure–activity relationship (QSAR) models for 172 LTA4H inhibitors were built using multiple linear regression (MLR) and SVM. The best QSAR model (Model 6A) was built with SVM and CORINA Symphony descriptors; its coefficients of determination on the training set and the test set were 0.81 and 0.79, respectively. The classification and QSAR models can be used for subsequent virtual screening, and the fragments identified as important for highly active inhibitors should be helpful for designing new LTA4H inhibitors.
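The description reports three evaluation metrics: prediction accuracy, the Matthews correlation coefficient (MCC), and the coefficient of determination. As a minimal sketch of how these are commonly computed, the Python below uses the standard textbook definitions. The function names, the confusion-matrix counts, and the activity values are hypothetical illustrations, not data from the paper, and the description does not say which definition of the coefficient of determination the authors used; the sketch assumes 1 - SS_res / SS_tot.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified compounds."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical confusion matrix (NOT the paper's test-set counts).
acc_value = accuracy(tp=40, tn=45, fp=5, fn=10)
mcc_value = mcc(tp=40, tn=45, fp=5, fn=10)

# Hypothetical observed vs. predicted activity values (NOT the paper's data).
r2_value = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])

print(f"accuracy={acc_value:.4f}  MCC={mcc_value:.4f}  R^2={r2_value:.4f}")
```

MCC is often reported alongside accuracy because it uses all four cells of the confusion matrix, so it stays informative when the active and inactive classes are unbalanced.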

Provider:
Taylor & Francis
Created:
2021-04-26