Utilizing machine learning techniques to predict the blood-brain barrier permeability of compounds detected using LCQTOF-MS in Malaysian Kelulut honey
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-27.
Download link:
https://tandf.figshare.com/articles/dataset/Utilizing_machine_learning_techniques_to_predict_the_blood-brain_barrier_permeability_of_compounds_detected_using_LCQTOF-MS_in_Malaysian_Kelulut_honey/23635244/1
Description:
Current in silico modelling techniques, such as molecular dynamics, typically focus on the compounds with the highest concentration in chromatographic analyses when screening for bioactivity. This focus reduces the need for labour-intensive in vitro studies but limits the use of extensive chromatographic data and molecular diversity for compound classification. Compound permeability across the blood–brain barrier (BBB) is a key concern in central nervous system (CNS) drug development, and the limitation described above can be addressed by applying cheminformatics with codeless machine learning (ML). Of the four models developed in this study, the Random Forest (RF) algorithm showed the most robust performance in both internal and external validation and was selected for model construction, with an accuracy (ACC) of 87.5% and 86.9% and an area under the curve (AUC) of 0.907 and 0.726, respectively. The RF model was deployed to classify 285 compounds detected in Kelulut honey using liquid chromatography quadrupole time-of-flight mass spectrometry (LCQTOF-MS), of which 140 were screened with 94 descriptors. Seventeen compounds were predicted to permeate the BBB, indicating their potential as drugs for treating neurodegenerative diseases. Our results highlight the importance of employing ML pattern recognition to identify compounds with neuroprotective potential from the entire pool of chromatographic data.
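The description reports accuracy (ACC) and area under the curve (AUC) for internal and external validation. As a reading aid, the sketch below shows one common way these two metrics are computed for a binary classifier such as BBB-permeable versus non-permeable. It is not the study's workflow, which used codeless ML tooling. The function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and all labels and scores are hypothetical and do not come from the dataset.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive compound
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = predicted to permeate the BBB, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.2, 0.1, 0.05, 0.7, 0.15]

# Assumed decision threshold of 0.5 on the model's score.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"ACC = {acc:.3f}, AUC = {auc:.3f}")
```

ACC depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, so the two can move differently between an internal and an external validation set.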
Created:
2023-07-14



