
Benchmarking Illumina-only and Illumina-PacBio Hybrid Assemblies with the Arabidopsis thaliana Reference Genome

Indexed by NIAID Data Ecosystem on 2026-03-13
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP004874
Summary:
Assembling short reads into high-quality genome sequences remains a challenge. Choosing an optimal combination of assembler and sequencing data is crucial, because assemblers and data types differ greatly in performance and final output. It also remains largely unclear how best to compare and assess the quality of genome assemblies. Here, we benchmark different strategies using the high-quality Columbia (Col-0) reference genome of the model plant Arabidopsis thaliana. In the reference, each of the ten chromosome arms is a single contig, together totaling about 120 Mb, and the arms are separated by non-assembled centromeres of about 30 Mb. We generated Illumina paired-end reads, medium-size (7 kb) and long (40 kb) jumping mate-pair reads, and 1.5 kb PacBio reads from the A. thaliana reference strain Col-0. We applied five different assemblers to produce scaffolds from different combinations of Illumina reads, and then attempted to close the remaining gaps with PacBio reads. Contiguity- and reference-based criteria were used to assess the quality of each assembly strategy. Despite their high per-base error rates, PacBio reads substantially improved assemblies by filling scaffold gaps. We also show how features such as GC content, repetitiveness, and sequencing depth of coverage affect local assembly quality. Using objective criteria, we provide insights into cost-efficient assembly strategies for medium-size genomes. ALLPATHS-LG benefited most from long-distance mate-pair reads and is recommended for assembling A. thaliana-sized genomes from short reads.
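The summary does not name the specific contiguity-based criteria that were used. As an assumption, the sketch below computes N50, a widely used contiguity statistic, plus a simple GC-content helper of the kind needed to relate local sequence composition to assembly quality. The function names (`n50`, `gc_content`, `windowed_gc`) are hypothetical and do not come from the study or from any assembler.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Assumed definition: the largest length L such that sequences of
    length >= L together cover at least half of the total length.
    """
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # not reached for non-empty input


def gc_content(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases.

    Gap characters such as N (common inside scaffolds) are ignored.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window):
    """GC content in consecutive non-overlapping windows (hypothetical helper)."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


if __name__ == "__main__":
    print(n50([100, 80, 50, 30, 20]))
    print(gc_content("GCATNN"))
    print(windowed_gc("GGGGAAAA", 4))
```

For example, `n50([100, 80, 50, 30, 20])` returns 80: the two longest sequences (100 + 80 = 180) already cover at least half of the 280 total. Reference-based criteria would additionally require aligning the assembly to the Col-0 reference, which this sketch does not attempt.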
Created:
2022-02-26