Evaluation of recombination detection methods for viral sequencing

Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.d7wm37q6f
Description:
Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can bias evolutionary estimates or complicate their interpretation. Identifying signals of recombination in sequencing data is therefore a key prerequisite to further analyses. A repertoire of recombination detection methods has been developed over the past two decades; however, the prevalence of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed five recombination detection methods (PhiPack (Profile), 3SEQ, GENECONV, VSEARCH (UCHIME), and gmos) to determine whether any are suitable for the analysis of bulk sequencing data. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Further, we provide a practical example for the analysis and validation of empirical data. We find that recombination detection methods need to be scalable, to use an analytical approach and resolution suitable for the intended research application, and to be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for the validation of recombination detection results, the benefits and shortcomings of each assessed method, and future considerations for recombination detection methods applied to large-scale viral sequencing data.
Provider:
Dryad
Created:
2023-11-27