
A Nextflow-Based Automated Pipeline for Viral Assembly and Characterisation (EVEREST)

DataCite Commons · Updated 2025-03-07 · Indexed 2025-04-16
Download link:
https://scholardata.sun.ac.za/articles/dataset/A_Nextflow-Based_Automated_Pipeline_for_Viral_Assembly_and_Characterisation_EVEREST_/28553732/1
Summary:
EVEREST (pipEline for Viral assEmbly and chaRactEriSaTion) is an end-to-end pipeline for virus discovery and characterisation. Implemented in Nextflow, it processes Illumina single-end and paired-end reads through five phases: pre-processing, filtering, de novo assembly, refinement, and classification. To improve data quality, it trims reads, removes host sequences, eliminates duplicates, and applies digital normalisation. It then assembles viral genomes de novo, clusters similar contigs, captures viral genomes, and assesses their quality. Finally, EVEREST classifies viral contigs against the NCBI (nucleotide) and UniProt (amino acid) databases, providing a framework for identifying and characterising viruses from sequencing data.
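
The summary lists digital normalisation as a pre-processing step but does not say which tool or settings EVEREST uses for it. As a rough illustration of the general technique (median k-mer abundance filtering, in the style of khmer's normalize-by-median), the minimal Python sketch below keeps a read only while the coverage of its k-mers, among reads already kept, is still below a cutoff. The function names, the k-mer size and the cutoff are hypothetical choices, not EVEREST parameters, and the sketch counts k-mers exactly in memory and ignores reverse complements for simplicity.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def digital_normalize(reads, k=20, cutoff=20):
    """Hypothetical sketch of median-k-mer digital normalisation.

    A read is kept only if the median count of its k-mers, taken over the
    reads kept so far, is below `cutoff`. Kept reads then add their k-mers
    to the counts, so highly redundant (high-coverage) reads are dropped
    while low-coverage regions are retained.
    k and cutoff are illustrative defaults, not EVEREST settings.
    """
    counts = Counter()  # exact in-memory k-mer counts (simplification)
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k: nothing to judge, so discard
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

For example, with `k=4` and `cutoff=2`, five copies of `"ACGTACGTAC"` followed by `"TTTTGGGGCC"` reduce to one copy of each read: after the first copy is kept, the median count of its k-mers reaches the cutoff, so the remaining copies are discarded, while the novel read is kept. Dedicated tools follow the same idea but typically use memory-efficient probabilistic counters for large datasets.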

Provided by:
SUNScholarData
Created:
2025-03-07