Supporting data for "CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes"
Source: DataCite Commons; updated 2025-05-26; indexed 2025-04-15
Download link:
http://gigadb.org/dataset/100729
Description:
Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate high-quality chromosome-scale genomes from raw data remain scarce.

Chromosome Scale Assembler (CSA) is a novel, computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g. Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. Because CSA automates the assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes that were built laboriously using multiple separate assembly tools and manual curation. CSA increases contig length through scaffolding, local re-assembly and gap closing; on certain datasets, the initial contig N50 can increase up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute servers. Using diverged reference genomes for fish, birds and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads.

CSA can speed up and simplify chromosome-level assembly and significantly lower the costs of large-scale, family-level vertebrate genome projects.
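The description reports gains in contig N50. As a rough illustration of that metric (not code from CSA), the sketch below computes N50 using the standard definition: the length L such that contigs of length at least L together cover at least half of the total assembly length. The contig lengths and the "merge two contigs by gap closing" scenario are hypothetical values chosen only to show how joining contigs raises N50.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to >= 50% of the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths in bp (not taken from the CSA dataset).
    before = [100, 200, 300, 400, 500]
    # Hypothetical result of gap closing that joins the 400 bp and 500 bp contigs.
    after = [100, 200, 300, 900]
    print(n50(before))  # 400
    print(n50(after))   # 900
    print(f"fold increase: {n50(after) / n50(before):.2f}")  # fold increase: 2.25
```

Total assembly length is the same in both cases; only the contiguity changes, which is why N50 (rather than total size) is used to summarize the effect of scaffolding and gap closing.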
Provided by:
GigaScience Database
Created:
2020-03-23



