Comparison with published methods.

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://figshare.com/articles/dataset/Comparison_with_published_methods_/28273372

Description:

Hypertension is a critical risk factor for, and cause of mortality in, cardiovascular disease, and it remains a global public health issue. Understanding its mechanisms is therefore essential for treatment and prevention. Gene expression data are an important source of hypertension biomarkers, but their small sample size and high feature dimensionality make biomarker identification challenging. We propose a novel deep graph clustering feature selection (DeepGCFS) algorithm to identify hypertension gene biomarkers with greater biological significance. The algorithm represents gene-gene interaction information as a graph, builds a graph neural network (GNN) model, and trains it with a loss function based on link prediction and self-supervised learning, so that each gene node obtains a feature vector that encodes global information. It then detects gene modules with hybrid clustering methods and, finally, determines the gene biomarkers with integrated feature selection methods. In the experiments, all ten identified hypertension biomarkers were significantly differentially expressed, and classification reached an AUC of 97.50%, better than other published methods. Six of the genes (PTGS2, TBXA2R, ZNF101, KCNJ2, MSRA, and CMTM5) have previously been reported to be associated with hypertension. On the validation dataset GSE113439, classification reached an AUC of 95.45%, and seven of the genes (LYSMD3, TBXA2R, KLC3, GPR171, PTGS2, MSRA, and CMTM5) were significantly differentially expressed. In addition, the algorithm clustered gene feature vectors better than the comparison methods. The proposed algorithm therefore has significant advantages in selecting potential hypertension biomarkers.
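The description names a link-prediction training loss and reports AUC, but gives neither in detail. The sketch below is a generic illustration of both ideas, not the DeepGCFS implementation. It assumes an inner-product edge decoder with binary cross-entropy over observed edges and sampled non-edges, and it computes AUC as the fraction of positive/negative pairs ranked correctly. The function names, the node names g1 to g3, and the toy embedding values are all hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def link_prediction_loss(emb, pos_edges, neg_edges):
    """Mean binary cross-entropy with an inner-product decoder (an assumption).

    pos_edges are observed gene-gene interactions (label 1);
    neg_edges are sampled node pairs with no interaction (label 0).
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for i, j in pos_edges:
        total -= math.log(sigmoid(dot(emb[i], emb[j])) + eps)
    for i, j in neg_edges:
        total -= math.log(1.0 - sigmoid(dot(emb[i], emb[j])) + eps)
    return total / (len(pos_edges) + len(neg_edges))


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy embeddings for three hypothetical gene nodes.
toy_emb = {"g1": [2.0, 0.0], "g2": [2.0, 0.0], "g3": [-2.0, 0.0]}

# g1 and g2 interact; g1 and g3 do not.
good_loss = link_prediction_loss(toy_emb, [("g1", "g2")], [("g1", "g3")])
# Swapping the labels should give a much larger loss.
bad_loss = link_prediction_loss(toy_emb, [("g1", "g3")], [("g1", "g2")])
```

In this toy setting the loss is small when interacting genes have aligned embeddings and non-interacting genes do not, which is the property a link-prediction objective rewards during training.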

Created:

2025-01-24
