HNSC filtered datasets from GSE139324 and GSE164690
Source: DataCite Commons | Updated: 2025-07-09 | Indexed: 2025-09-08
Download link:
https://figshare.com/articles/dataset/HNSC_filtered_datasets_from_GSE139324_and_GSE164690/29510081
Resource summary:
The filtered HNSC (head and neck squamous cell carcinoma) datasets, GSE139324 and GSE164690, comprise scRNA-seq samples from healthy donors as well as from HPV-negative and HPV-positive cases.
This project uses deep learning models to improve clustering accuracy in large-scale single-cell atlases. By integrating contrastive learning with transformer-based variational autoencoders (VAEs), these models learn biologically meaningful low-dimensional representations, enabling precise identification of cell types, including rare and transitional populations.
Key innovations include:
- Robust handling of batch effects and technical noise
- A scalable architecture for millions of cells
- Improved clustering metrics (adjusted Rand index, ARI; normalized mutual information, NMI) on benchmark datasets
- Enhanced interpretability by linking latent features to gene programs
These models offer a powerful tool for building accurate, scalable cell atlases and for advancing our understanding of cellular diversity.
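The clustering metrics named above (ARI and NMI) score a predicted clustering against reference labels, such as annotated cell types. As a minimal sketch, assuming both clusterings are given as equal-length label lists, each metric can be computed from the contingency table using only the standard library. The function names are hypothetical; in practice scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` do the same job, and the NMI here uses arithmetic-mean normalization, which is scikit-learn's default.

```python
from collections import Counter
from math import comb, log
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index computed from pair counts in the contingency table."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to compare
    contingency = Counter(zip(labels_true, labels_pred))
    pairs_joint = sum(comb(c, 2) for c in contingency.values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / total_pairs
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings put every cell in one cluster
    return (pairs_joint - expected) / (maximum - expected)


def normalized_mutual_info(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """NMI with arithmetic-mean normalization: I(U;V) / ((H(U) + H(V)) / 2)."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    contingency = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    # Mutual information from joint and marginal cluster frequencies.
    mutual_info = sum(
        (nij / n) * log(n * nij / (count_true[i] * count_pred[j]))
        for (i, j), nij in contingency.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * log(c / n) for c in count_pred.values())
    denom = (h_true + h_pred) / 2
    if denom == 0:
        return 1.0  # e.g. both clusterings are a single cluster
    return mutual_info / denom
```

Both scores ignore how clusters are numbered, so `[0, 0, 1, 1]` against `[1, 1, 0, 0]` scores 1.0 on each; a clustering that lumps every cell together scores 0 against any nontrivial reference.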
Provider:
figshare
Created:
2025-07-09
Dataset introduction

Background overview
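This dataset contains filtered HNSC data from GSE139324 and GSE164690, covering single-cell RNA sequencing samples from healthy donors, HPV-negative cases, and HPV-positive cases. The associated project applies deep learning models that combine contrastive learning with transformer-based variational autoencoders (VAEs) to improve clustering accuracy in large-scale single-cell atlases and to identify cell types, including rare and transitional populations.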
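The summary above names contrastive learning and VAEs but does not give the training objective. As a hedged sketch only (the project's actual loss, architecture, and transformer encoder are not described here), the following shows two standard terms such a model could combine: the closed-form KL divergence of a diagonal Gaussian VAE posterior from a standard normal prior, and a simplified, one-directional InfoNCE contrastive loss over two views of the same cells. All function names, and the weights `beta` and `lam`, are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kl(mu: Vector, logvar: Vector) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the regularizer in a Gaussian VAE's ELBO."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def reparameterize(mu: Vector, logvar: Vector, eps: Vector) -> list:
    """z = mu + sigma * eps, where eps is drawn from N(0, I) by the caller."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view_a: Sequence[Vector], view_b: Sequence[Vector], temperature: float = 0.1) -> float:
    """One-directional InfoNCE: view_a[i] and view_b[i] form a positive pair,
    and every other view_b[j] in the batch acts as a negative for view_a[i]."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the positive
    return total / n


def combined_loss(recon: float, kl: float, contrastive: float,
                  beta: float = 1.0, lam: float = 1.0) -> float:
    """Hypothetical weighted sum; beta and lam are illustrative, not taken from the source."""
    return recon + beta * kl + lam * contrastive
```

With `temperature=0.1`, two perfectly aligned views give a contrastive loss near zero, while swapping the positives pushes it to about 10, which is the gradient signal that pulls embeddings of the same cell together.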



