A Nextflow-Based Automated Pipeline for Viral Assembly and Characterisation (EVEREST)

Name: A Nextflow-Based Automated Pipeline for Viral Assembly and Characterisation (EVEREST)
Creator: SUNScholarData
Published: 2025-03-07 11:17:27
License: No description available

DataCite Commons (updated 2025-03-07, indexed 2025-04-16)

Download link:

https://scholardata.sun.ac.za/articles/dataset/A_Nextflow-Based_Automated_Pipeline_for_Viral_Assembly_and_Characterisation_EVEREST_/28553732/1


Description:

EVEREST (pipEline for Viral assEmbly and chaRactEriSaTion) is an end-to-end pipeline for virus discovery and characterisation. Implemented in Nextflow, it processes Illumina single-end and paired-end reads through five phases: pre-processing, filtering, de novo assembly, refinement, and classification. To improve data quality before assembly, the pipeline trims reads, removes host sequences, eliminates duplicates, and applies digital normalization. It then assembles viral genomes de novo, clusters similar contigs, captures viral genomes, and assesses their quality. Finally, EVEREST classifies viral contigs against the NCBI (nucleotide) and UniProt (amino acid) databases, providing a framework for identifying and characterising viruses from sequencing data.
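
Of the steps listed above, digital normalization is perhaps the least self-explanatory. The following is a minimal Python sketch of the general idea (keep a read only while the median abundance of its k-mers, among reads already kept, is below a cutoff). It is an illustration only: the description does not say which tool or parameters EVEREST uses for this step, and the function names, `k`, and `cutoff` values here are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return every overlapping k-mer of a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads, k=5, cutoff=2):
    """Simplified digital normalization (illustrative; not EVEREST's actual code).

    A read is kept only if the median count of its k-mers, measured over
    the reads kept so far, is below `cutoff`. Kept reads update the counts.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # too short to evaluate; dropped in this sketch
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

With five copies of one read and a single copy of another, for example `digital_normalize(["ACGTTGCAAG"] * 5 + ["TTTTGGGGCC"])`, the redundant copies are capped while the rare read is retained, which is the effect that reduces assembly cost on highly covered regions. Production tools use memory-efficient approximate counting rather than an exact `Counter`.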


Provider:

SUNScholarData

Created:

2025-03-07
