GeneLLM: A Large cfRNA Language Model for Cancer Screening from Raw Reads
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP516128
Description:
Cancer remains a significant global health challenge, necessitating advancements in early detection methods crucial to improving patient outcomes. Plasma cell-free RNA (cfRNA) has recently emerged as a promising biomarker for non-invasive early cancer detection and treatment monitoring. Traditional cancer detection methods, such as imaging and protein biomarkers from biopsies, often fail to capture the molecular and genetic intricacies of cancer, resulting in limitations in accuracy and specificity. Here, we developed and pretrained GeneLLM, a novel large language model based on cfRNA sequences. GeneLLM significantly advances the classification of various cancer types by directly interpreting cfRNA read patterns without relying on genome annotation. Our study demonstrates that this method achieves higher accuracy than traditional biomarkers and effectively handles large datasets from different centres, even with low sequencing depth. By avoiding the use of bioinformatics tools to count known genes, GeneLLM also discovered cfRNAs from previously unknown genes, referred to as "dark matters" in the genome, as cancer detection "pseudo-biomarkers". These findings suggest that GeneLLM has the potential to revolutionise cancer detection, making it more accessible and affordable, thus improving patient outcomes.
Created:
2026-03-01



