Table_1_Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation.docx
Source: frontiersin.figshare.com | Updated: 2024-02-21 | Indexed: 2025-01-08
Download link:
https://frontiersin.figshare.com/articles/dataset/Table_1_Comparing_feature_selection_and_machine_learning_approaches_for_predicting_CYP2D6_methylation_from_genetic_variation_docx/25256062/1
Resource description:
Introduction: Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and consequently prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort.
Methods: Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 MB of the CYP2D6 gene and the impact of adding demographic data. The samples were split into training (75%) and test (25%) sets for validation. In the Elastic Net and XGBoost models, the optimal hyperparameter search was done using 10-fold cross-validation. Root Mean Square Error and R-squared values were obtained to investigate each model's performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified, several of which appeared to influence multiple CpG sites.
Results: Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites, and a number of top variables were identified for each model.
Discussion: The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.
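The evaluation scheme described in the Methods (a 75%/25% train/test split scored by Root Mean Square Error and R-squared) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and a single-SNP ordinary least squares fit stands in for the Linear Regression, Elastic Net and XGBoost models used in the study. It is not the authors' pipeline.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_linear(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): SNP dosage 0/1/2 and a methylation beta value
# that rises by 0.10 per allele, plus Gaussian noise.
data_rng = random.Random(42)
rows = []
for _ in range(200):
    dosage = data_rng.choice([0, 1, 2])
    beta = 0.40 + 0.10 * dosage + data_rng.gauss(0, 0.05)
    rows.append((dosage, beta))

train, test = train_test_split(rows, test_fraction=0.25)
intercept, slope = fit_simple_linear([d for d, _ in train], [b for _, b in train])

# Score the fitted model on the held-out 25% only.
observed = [b for _, b in test]
predicted = [intercept + slope * d for d, _ in test]
test_rmse = rmse(observed, predicted)
test_r2 = r_squared(observed, predicted)
print(f"slope={slope:.3f} test RMSE={test_rmse:.3f} test R^2={test_r2:.3f}")
```

Scoring only on the held-out rows keeps the metrics from rewarding overfitting. Hyperparameter tuning such as the 10-fold cross-validation mentioned in the Methods would be run inside the training portion alone.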
Provider:
Frontiers



