Construction of a Transferable NGS Algorithmic Model for Predicting EBV-Associated Nasopharyngeal Cancer and High-risk Mutation
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP517996
Resource description:
Abstract Background: Epstein-Barr virus (EBV) infection is closely associated with the occurrence of nasopharyngeal carcinoma (NPC). The latent membrane protein 1 (LMP1) gene, known for its high heterogeneity, plays a crucial role in determining the oncogenic potential of NPC. This study aimed to develop a universal, transferable algorithm for analyzing fragmented viral genome data from next-generation sequencing (NGS), construct EBV-associated NPC (EBVaNPC) prediction models, and investigate the functional significance of key mutations in LMP1. Method: Public EBV whole-genome sequencing data were collected and divided into training and test sets in a 2:1 ratio. The EBV LMP1 region (aa 1-118) from 26 clinical EBV-positive subjects was sequenced with amplicon-based sequencing. An improved algorithm was developed to extract features and construct the EBVaNPC machine learning (ML) prediction models. The biological implications of the predicted key mutation for tumor cell behavior were investigated through qRT-PCR, EdU and transwell invasion assays, RNA-seq, and gene ontology fingerprint (GOF) analysis. Result: For randomly disrupted NGS data, read length had minimal impact on model performance, with all six ML models achieving F1 scores above 0.8 on the training dataset. On the test dataset, random forest (RF) and naive Bayes demonstrated superior performance using mutation and entropy features, respectively. The model was further validated on clinical cohorts to assess its transferability and generalizability to amplicon-based NGS data. Using only differential features rather than all features reduced data dimensionality while improving model performance. Interestingly, the RF model identified the H101R mutation in LMP1 as the most significant feature, and its oncogenic implication was confirmed through proliferation and invasion experiments in HNE-1MUT-LMP1 cells.
By integrating EBVaNPC GOF and RNA-seq data, the differentially expressed genes linked to the H101R mutation were found to be involved primarily in immune regulation processes. Both approaches indicated a notable association between FOXP3-T cell anergy and WNT7A-stem cell population maintenance in HNE-1MUT-LMP1 cells. Conclusion: This study integrates algorithm design and experimental investigation to uncover the biological significance of EBV carcinogenesis, offering a clinically suitable technology for identifying high-risk NPC in EBV-infected subjects. Overall design: To elucidate the biological implications of the LMP1 H101R mutation, we employed an in vitro model using the NPC cell line HNE-1 transfected with the mutant LMP1. The HNE-1 cells were divided into two groups: HNE-1MUT-LMP1 and HNE-1WT-LMP1. To explore the impact of the EBV LMP1 H101R mutation on host gene expression and biological functions, we further performed RNA sequencing on HNE-1MUT-LMP1 and HNE-1WT-LMP1 cells.
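
The abstract mentions entropy features without defining them. As a rough illustration only, the sketch below computes per-position Shannon entropy over a set of aligned sequences, which is one common way to quantify variability at each site. This is an assumption about what "entropy features" could mean, not the study's actual algorithm; the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues seen at one alignment position."""
    counts = Counter(column)
    total = sum(counts.values())
    # abs() only normalises the -0.0 that arises for fully conserved columns.
    return abs(-sum((n / total) * math.log2(n / total) for n in counts.values()))


def entropy_features(aligned_seqs):
    """Return one entropy value per alignment column.

    Assumes the sequences are already aligned and of equal length.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy, made-up peptide fragments: only the third position varies (H vs R).
toy_alignment = ["MEHDL", "MEHDL", "MERDL", "MERDL"]
features = entropy_features(toy_alignment)
```

Conserved positions score zero and positions split evenly between two residues score one bit, so a vector like this could feed a classifier alongside mutation calls.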
Created:
2025-12-04



