Gene shaving using a sensitivity analysis of kernel based machine learning approach, with applications to cancer data

Source: NIAID Data Ecosystem (indexed 2026-03-11)

Download link:

https://figshare.com/articles/dataset/Gene_shaving_using_a_sensitivity_analysis_of_kernel_based_machine_learning_approach_with_applications_to_cancer_data/8177183


Description:

Background: Gene shaving (GS) is an essential but challenging tool for biomedical researchers because of the large number of genes in the human genome and the complex nature of biological networks. Most GS methods are not applicable to non-linear or multi-view data sets. Although kernel based methods can overcome these problems, a well-founded positive definite kernel based GS method has yet to be proposed for biomedical data analysis.

Methods and findings: Because kernel based methods applied to genomic information can improve disease prediction, we propose a novel method, "kernel based gene shaving", which is based on the influence function of kernel canonical correlation analysis. To compare the proposed method with state-of-the-art gene shaving methods, we analyzed extensive simulated data and real microarray gene expression data sets. Performance metrics, including the true positive rate, true negative rate, false positive rate, false negative rate, misclassification error rate, false discovery rate, and area under the curve, were computed for each method. In the colon cancer data analysis, the proposed method identified a significant subset of 210 genes out of 2000 and showed performance suggestive of an advantage over the other methods. The proposed method can be applied to the study of other disease processes in which the analysis of two-view data is a common task.

Conclusions: We addressed the challenge of finding a unique kernel based GS method by using the influence function of kernel canonical correlation analysis. The proposed method has been shown to perform better than state-of-the-art gene shaving methods and has identified many more significant gene interactions, suggesting that genes function in concert in colon cancer. In similar biomedical data analyses, kernel based methods could be applied to select a potential subset of genes. Positive definite kernel based methods can overcome the non-linearity problem and improve the prediction process.
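As a rough illustration of the selection metrics named in the abstract, the sketch below computes them for a gene selection result when the truly relevant genes are known, as they would be in a simulation. The function name `selection_metrics`, its arguments, and the toy inputs are assumptions made for this example; they are not taken from the dataset or from the authors' code. The area under the curve is left out because it needs a per-gene score, not just a selected set.

```python
def selection_metrics(n_genes, true_genes, selected_genes):
    """Confusion-matrix rates for a gene selection result.

    n_genes        -- total number of candidate genes
    true_genes     -- indices of genes that are truly relevant (known in simulation)
    selected_genes -- indices of genes the method kept
    All names here are hypothetical and chosen for illustration only.
    """
    true_set = set(true_genes)
    selected_set = set(selected_genes)

    tp = len(true_set & selected_set)   # relevant and selected
    fp = len(selected_set - true_set)   # selected but not relevant
    fn = len(true_set - selected_set)   # relevant but missed
    tn = n_genes - tp - fp - fn         # neither relevant nor selected

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),       # true positive rate
        "TNR": ratio(tn, tn + fp),       # true negative rate
        "FPR": ratio(fp, fp + tn),       # false positive rate
        "FNR": ratio(fn, fn + tp),       # false negative rate
        "MER": ratio(fp + fn, n_genes),  # misclassification error rate
        "FDR": ratio(fp, fp + tp),       # false discovery rate
    }


# Toy input: 20 genes, genes 0-4 truly relevant, method selects genes 3-6.
metrics = selection_metrics(20, range(5), [3, 4, 5, 6])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The toy numbers are made up and do not reproduce any result reported for this dataset.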


Created:

2019-05-23
