
Comparison of Hi-C and VirMatcher predictions.

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Comparison_of_Hi-C_and_VirMatcher_predictions_/30669598
Description:
Microbiomes influence diverse ecosystems, and viruses increasingly appear to impose key constraints on them. While viromics has expanded genomic catalogs, host identification for these viruses remains challenging because cultivation-based approaches scale poorly and in silico predictions have uncertain reliability and relatively low resolution, particularly for understudied viral taxa. To address this, Hi-C proximity ligation sequences cross-linked virus and host genomic fragments to infer virus-host linkages, and it has now been applied in at least 10 studies. However, its accuracy remains unknown. Here we assess Hi-C performance in recovering virus-host interactions using synthetic communities (SynComs) composed of four marine bacterial strains and nine phages with known interactions, and then apply optimized bioinformatic protocols to natural soil samples. In SynComs, standard Hi-C sample preparations and analyses based on normalized contact scores performed poorly (26% specificity, 100% sensitivity, incorrect matches up to class level). Z-score filtering (Z ≥ 0.5) dramatically improved specificity to 99%, though at reduced sensitivity (62%, down from 100%). A detection limit was also established: reproducibility was poor below phage abundances of 10^5 PFU/mL. Applying the optimized bioinformatic protocols to natural soil samples, we compared virus-host linkages inferred from proximity-ligated Hi-C sequencing with predictions generated by in silico homology-based and machine learning-based bioinformatic approaches. Prior to Z-score thresholding, agreement was relatively high at the phylum to family levels (72%), but not at the genus (43%) or species (15%) levels. Z-score thresholding reduced sensitivity (only 34% of predictions were retained), with only modest improvements in congruence with bioinformatic methods (48% or 18% at genus or species levels, respectively). Regardless, this led to 79 genus-level-congruent virus-host linkages and 293 new ones revealed by Hi-C alone, providing many new virus-host interactions to explore in already well-studied, climate-critical soils. Overall, these findings provide empirical benchmarks and methodological guidelines to improve the accuracy and reliability of Hi-C for virus-host linkage studies in complex microbial communities.
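
The trade-off described above (Z-score filtering raises specificity at the cost of sensitivity) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the description does not say how Z-scores are computed, so the sketch assumes they are taken over all candidate virus-host contact scores, and every name and number in it (`zscore_filter`, `sensitivity_specificity`, the toy scores and known pairs) is hypothetical.

```python
from statistics import mean, pstdev


def zscore_filter(scores, threshold=0.5):
    """Keep links whose Z-score meets the threshold.

    Assumption: Z-scores are computed over all candidate links at once;
    the dataset description does not specify the grouping.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {}
    return {pair: s for pair, s in scores.items() if (s - mu) / sigma >= threshold}


def sensitivity_specificity(predicted, truth, candidates):
    """Score a set of predicted links against known virus-host pairs."""
    negatives = candidates - truth
    tp = len(predicted & truth)
    fn = len(truth - predicted)
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical normalized contact scores for (phage, host) candidate links.
scores = {
    ("phageA", "host1"): 9.0,
    ("phageA", "host2"): 1.0,
    ("phageB", "host1"): 1.5,
    ("phageB", "host2"): 8.0,
    ("phageC", "host1"): 2.0,
    ("phageC", "host2"): 2.5,
}
# Hypothetical known interactions, standing in for a SynCom ground truth.
truth = {("phageA", "host1"), ("phageB", "host2"), ("phageC", "host2")}
candidates = set(scores)

# Unfiltered: every link with a nonzero contact score counts as a prediction.
unfiltered = {pair for pair, s in scores.items() if s > 0}
raw_sens, raw_spec = sensitivity_specificity(unfiltered, truth, candidates)

# Filtered: only links with Z >= 0.5 are kept.
kept = set(zscore_filter(scores, threshold=0.5))
filt_sens, filt_spec = sensitivity_specificity(kept, truth, candidates)

print(f"unfiltered: sensitivity={raw_sens:.2f}, specificity={raw_spec:.2f}")
print(f"Z >= 0.5:   sensitivity={filt_sens:.2f}, specificity={filt_spec:.2f}")
```

In this toy case the filter discards all weak false links but also drops one true link with a low contact score, the same direction of trade-off the description reports for the SynComs.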

Created:
2025-11-20