Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromatin capture data
Source: NIAID Data Ecosystem. Indexed: 2026-03-10
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP018601
Description:
Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; at present, however, additional scaffolding is needed to achieve chromosome-level assemblies. We generated PacBio long-read data for the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguity of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromatin capture data for genome scaffolding. To compare their performance, we developed a new assembly statistic called chromosome-N50 (CN50), which normalizes the commonly used N50 statistic for chromosome number. Despite their technical differences, optical mapping and chromatin capture performed similarly. Standard integration of either data type doubled CN50 values; however, after we introduced modifications to both integration methods, such as automated misassembly breakage, assembly contiguity was close to chromosome level and increased further once we combined the data from both technologies. In contrast to most other assembly projects, we rigorously assessed the correctness of the contiguity of the contigs and scaffolds of each assembly using Illumina mate-pair libraries and genetic map information. This revealed that PacBio assemblies have high sequence accuracy but can contain several misassemblies that join unlinked regions of the genome. Most, but not all, of these mis-joins were removed during the integration of the optical mapping and chromatin capture data. This result underlines the importance of exercising caution about the correctness of long-read-based contigs and shows how they can be improved using optical mapping and chromatin capture data.
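For readers unfamiliar with the N50 statistic that CN50 builds on, the following is a minimal Python sketch of the standard N50 calculation. The function name `n50` and the example lengths are illustrative assumptions, not taken from the study. The exact chromosome-number normalization that defines CN50 is not given in this description, so it is deliberately not implemented here.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that pushes coverage to >= 50%.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

Because N50 depends on how the genome is divided into chromosomes, raw N50 values are hard to compare across species with different karyotypes, which is the motivation the description gives for normalizing by chromosome number.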
Created:
2018-02-21



