Data for: Bioinformatics analysis of the genes involved in the extension of prostate cancer to adjacent lymph nodes by supervised and unsupervised machine learning methods: the role of SPAG1 and PLEKHF2
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/fdb8f5hjyd
Description:
The present study aimed to identify genes associated with the involvement of adjacent lymph nodes in patients with prostate cancer (PCa) and to provide information for identifying potential diagnostic biomarkers and pathological genes in PCa metastasis. The most important candidate genes were identified through several machine learning approaches, including K-means clustering, neural networks, naive Bayes classification, and principal component analysis (PCA) with or without downsampling.
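The description does not say how downsampling was carried out. As a minimal sketch, assuming it means random undersampling of the larger class so that both groups contribute equally before a method such as PCA is applied, the step could look like the following. The function name `downsample`, the seed, and the toy sample IDs are hypothetical and do not come from the dataset.

```python
import random
from collections import defaultdict


def downsample(samples, labels, seed=0):
    """Randomly undersample every class to the size of the smallest class.

    Assumption: this is one common reading of "downsampling"; the study's
    actual procedure is not described in the source.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is reduced to the size of the smallest one.
    n_min = min(len(group) for group in by_class.values())

    balanced_samples, balanced_labels = [], []
    for label, group in sorted(by_class.items()):
        for sample in rng.sample(group, n_min):
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Toy data (made up): 7 samples in one class, 3 in the other.
samples = [f"s{i}" for i in range(10)]
labels = ["N0"] * 7 + ["N1"] * 3
balanced_samples, balanced_labels = downsample(samples, labels)
```

Fixing the seed keeps the selected subset reproducible, which matters when results with and without downsampling are compared.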
In total, 21 genes positively associated with lymph node involvement were identified. Among them, nine have been reported in metastatic prostate cancer, six in other metastatic cancers, and four in other local cancers. Amplification of the candidate genes was evaluated in other PCa data sets. In addition, we identified a validated set of genes involved in PCa metastasis. Amplification of the SPAG1 and PLEKHF2 genes was associated with decreased survival in patients with PCa.
A TCGA dataset of Prostate Adenocarcinoma (TCGA, PanCancer Atlas) was retrieved from cBioPortal [7, 8]. RNA expression values had been standardized against each gene's expression distribution in a reference population and reported as log2 values. Copy number alteration (CNA) data had been reported as +2, +1, 0, -1, or -2. We first performed the analyses on the RNA data and then used the CNA data for further validation. Each sample was assigned to either the N1 or the N0 group (Figure 2). The N1 group comprised samples from patients with PCa with lymph node involvement, whereas the N0 group comprised samples from patients with PCa without any lymph node involvement. Samples with missing values (NA) were removed from the study.
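The grouping and filtering described above can be sketched as follows. This is an illustration only: the record layout, the sample IDs, and the function names are hypothetical, and treating a CNA value of +2 as amplification is an assumption based on the usual cBioPortal coding rather than something the description states.

```python
VALID_CNA = {-2, -1, 0, 1, 2}  # the five CNA codes listed in the description


def split_by_nodal_status(records):
    """Split (sample_id, n_stage) pairs into N0 and N1 groups.

    Samples whose N stage is missing (None, "NA", "", ...) are dropped.
    """
    groups = {"N0": [], "N1": []}
    for sample_id, n_stage in records:
        if n_stage in groups:
            groups[n_stage].append(sample_id)
        # Any other value is treated as missing and skipped.
    return groups


def is_amplified(cna_value):
    """Return True for a CNA code of +2 (assumed to mean amplification)."""
    if cna_value not in VALID_CNA:
        raise ValueError(f"unexpected CNA code: {cna_value!r}")
    return cna_value == 2


# Toy records (made up); S3 and S4 have no usable N stage.
records = [("S1", "N0"), ("S2", "N1"), ("S3", None), ("S4", "NA"), ("S5", "N1")]
groups = split_by_nodal_status(records)
```

Rejecting unexpected CNA codes early makes it harder for a mis-parsed column to pass silently into the validation step.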
Created:
2020-07-04



