Supporting data for "Vulture: Cloud-enabled scalable mining of microbial reads in public scRNA-seq data"
Source: DataCite Commons | Updated: 2025-05-26 | Indexed: 2024-07-13
Download link:
http://gigadb.org/dataset/102473
Description:
The rapidly growing collection of public single-cell sequencing data has become a valuable resource for molecular, cellular, and microbial discovery. Previous studies have mostly overlooked pathogen detection in human single-cell sequencing data. Moreover, existing bioinformatics tools lack the scalability to handle large public datasets. We introduce Vulture, a scalable cloud-based pipeline that performs microbial calling for single-cell RNA sequencing (scRNA-seq) data, enabling meta-analysis of host-microbial studies from the public domain. In our benchmarking experiments, Vulture is 66–88% faster than local tools (PathogenTrack and Venus) and 41% faster than the state-of-the-art cloud-based tool Cumulus, while achieving comparable microbial read identification. In terms of cloud computing cost, Vulture also shows an 83% cost reduction ($12 vs. $70). We applied Vulture to two COVID-19, three hepatocellular carcinoma (HCC), and two gastric cancer human patient cohorts with public sequencing reads from scRNA-seq experiments and discovered cell-type-specific enrichment of SARS-CoV-2-positive, hepatitis B virus (HBV)-positive, and Helicobacter pylori (H. pylori)-positive cells, respectively. In the HCC analysis, all cohorts showed hepatocyte-only enrichment of HBV, with cell-subtype-associated HBV enrichment based on inferred copy number variations. In summary, Vulture presents a scalable and economical framework for mining unknown host-microbial interactions from large-scale public scRNA-seq data. Vulture is available under an open-source license at https://github.com/holab-hku/Vulture.
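The 83% figure follows from the two reported costs: a relative reduction is (baseline - new) / baseline. The short sketch below reproduces that arithmetic; the helper name `percent_reduction` is illustrative only and is not part of Vulture.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent (illustrative helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


# Costs as reported in the description: $70 for the comparison run vs. $12 for Vulture
reduction = percent_reduction(70.0, 12.0)
print(f"{reduction:.1f}% cost reduction")  # prints "82.9% cost reduction"
```

Rounded to the nearest whole percent, this gives the 83% stated above.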
Provider:
GigaScience Database
Created:
2023-11-15



