
Data Sheet 1_Detection of viral contamination in cell lines using ViralCellDetector.pdf

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_Detection_of_viral_contamination_in_cell_lines_using_ViralCellDetector_pdf/29898914
Resource description:
Background and aims: Cell lines are widely used in biomedical research to investigate various biological processes, including gene expression, cancer progression, and drug responses. However, cross-contamination with bacteria, mycoplasma, and viruses remains a persistent challenge. While the detection of bacterial and mycoplasma contamination is relatively straightforward, identifying viral contamination is more difficult. To address this issue, we developed ViralCellDetector, a tool designed to detect viral contamination by mapping RNA-seq data to a comprehensive viral genome library.

Methods: ViralCellDetector processes RNA-seq data from any host species by first aligning reads to the host reference genome, followed by mapping the unmapped reads to the NCBI viral genome database. Viral presence is determined using stringent criteria based on the number of mapped reads and viral genome coverage. To further enable the detection of viral contamination from unknown sources, we identified host genes that are differentially expressed during viral infection and used these markers to train a machine learning model for classification.

Results: Using ViralCellDetector, we found that approximately 10% (110 samples) of RNA-seq datasets involving MCF7 cells were likely contaminated with viruses. The tool demonstrated high sensitivity in detecting viral sequences. Furthermore, the machine learning model effectively distinguished infected from non-infected samples based on human gene expression profiles, achieving an AUC of 0.91 and an accuracy of 0.93.

Conclusion: Our mapping-based approach enables robust detection of viral contamination in RNA-seq data from any host organism, while the marker-based approach accurately identifies viral infections specifically in human cell lines. This capability can help researchers detect and avoid the use of contaminated cell lines, thereby improving the reliability of experimental outcomes.
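The Methods part of the abstract says viral presence is called from two quantities: the number of reads mapped to a viral genome and the fraction of that genome covered. The abstract does not give the actual cutoffs or the tool's internals, so the following is only a minimal sketch of how such a two-criterion call could look. The names `ViralHit`, `covered_bases`, and `is_virus_present`, and both threshold values, are hypothetical and are not taken from ViralCellDetector.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds: the abstract does not state the real cutoffs.
MIN_MAPPED_READS = 100
MIN_GENOME_COVERAGE = 0.5  # fraction of the viral genome covered by >= 1 read


@dataclass
class ViralHit:
    """Alignments of host-unmapped reads against one viral genome."""
    virus_id: str
    genome_length: int
    intervals: List[Tuple[int, int]]  # one half-open (start, end) per mapped read


def covered_bases(intervals: List[Tuple[int, int]]) -> int:
    """Count bases covered by at least one interval, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before opening a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def is_virus_present(
    hit: ViralHit,
    min_reads: int = MIN_MAPPED_READS,
    min_coverage: float = MIN_GENOME_COVERAGE,
) -> bool:
    """Call a virus present only if BOTH criteria are satisfied."""
    coverage = covered_bases(hit.intervals) / hit.genome_length
    return len(hit.intervals) >= min_reads and coverage >= min_coverage
```

Requiring both criteria guards against a common false positive: many reads stacking on one short, low-complexity region of a viral genome would pass a read-count filter alone but fail the coverage filter.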
Created:
2025-08-13