
Machine-learning model based on glycosyltransferase expression predicts multiple cancer types, subtypes and survival probability

NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE254461
Description:
Current cancer diagnostic tools focus on markers for a limited range of disease types; the PAM50 kit, for example, is known primarily for subtyping breast cancer. Our research reveals that 27 different cancers each have a unique glycosyltransferase (GT) fingerprint that can be detected using a single test, with 93.7% accuracy in external validation. Four models were built on the expression patterns of 71 GTs that are universally influential in inducing cancer cell formation and driving metastasis. The first differentiates cancer tissue from normal tissue, the second distinguishes cancer types, and the third and fourth differentiate between cancer subtypes, specifically in breast cancer and glioma. The models were derived from data in The Cancer Genome Atlas (TCGA) database and validated on independent publicly available databases and newly collected patient samples. Running the models against external databases produced results highly comparable to those achieved in internal testing. Furthermore, the breast cancer classifier differentiated between breast cancer subtypes with an average accuracy of 81% in external testing, far surpassing the 43% accuracy of the industry-standard PAM50. Meanwhile, a separate prognostic model designed to predict survival probability among glioma patients achieved high accuracy by focusing on four GT genes that are strongly related to prognosis. Taken together, the models demonstrate the revolutionary potential of focusing on GT genes to simplify cancer diagnostics.
To establish the reproducibility and verify the effectiveness of our models, we sought external validation using collected patient tissues. We isolated mRNA from 57 fresh and frozen patient tissue samples, with and without cancer, across six tissue types: breast (23), colon/rectum (10), kidney (8), brain (7), liver (6), and uterine (3). RNA-seq data from these patient samples were used to test the clinical value of our models.
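As a quick consistency check on the cohort described above, the per-tissue counts can be tallied against the stated total of 57 samples. The following is a minimal sketch, not part of the dataset's own tooling; the names SAMPLES_BY_TISSUE, REPORTED_TOTAL, and cohort_summary are hypothetical and introduced here only for illustration.

```python
# Tissue counts as reported in the dataset description above.
SAMPLES_BY_TISSUE = {
    "breast": 23,
    "colon/rectum": 10,
    "kidney": 8,
    "brain": 7,
    "liver": 6,
    "uterine": 3,
}

REPORTED_TOTAL = 57  # total stated in the description


def cohort_summary(counts):
    """Return the cohort total and each tissue's share in percent (1 decimal)."""
    total = sum(counts.values())
    shares = {tissue: round(100 * n / total, 1) for tissue, n in counts.items()}
    return total, shares


total, shares = cohort_summary(SAMPLES_BY_TISSUE)

# Fail loudly if the per-tissue counts do not add up to the stated total.
assert total == REPORTED_TOTAL, "per-tissue counts do not match the stated total"

for tissue, pct in shares.items():
    print(f"{tissue}: {SAMPLES_BY_TISSUE[tissue]} samples ({pct}%)")
```

The counts do sum to 57, with breast tissue making up the largest share of the validation cohort and uterine tissue the smallest.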
Created:
2024-02-03