GeneLLM: A Large cfRNA Language Model for Cancer Screening from Raw Reads

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP516128
Description:
Cancer remains a significant global health challenge, necessitating advancements in early detection methods crucial to improving patient outcomes. Plasma cell-free RNA (cfRNA) has recently emerged as a promising biomarker for non-invasive early cancer detection and treatment monitoring. Traditional cancer detection methods, such as imaging and protein biomarkers from biopsies, often fail to capture the molecular and genetic intricacies of cancer, resulting in limitations in accuracy and specificity. Here, we developed and pretrained GeneLLM, a novel large language model based on cfRNA sequences. GeneLLM significantly advances the classification of various cancer types by directly interpreting cfRNA read patterns without relying on genome annotation. Our study demonstrates that this method achieves higher accuracy than traditional biomarkers and effectively handles large datasets from different centres, even with low sequencing depth. By avoiding the use of bioinformatics tools to count known genes, GeneLLM also discovered cfRNAs from previously unknown genes, referred to as "dark matters" in the genome, as cancer detection "pseudo-biomarkers". These findings suggest that GeneLLM has the potential to revolutionise cancer detection, making it more accessible and affordable, thus improving patient outcomes.
Created:
2026-03-01