PkA1HT reference genome assembly (version 1.0)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13845106
Resource description:
PkA1HT reference genome assembly
Long-read PacBio HiFi sequencing was performed on two distinct P. knowlesi piggyBac clones (PkBc38 and PkBc44) to investigate structural variation among mutants (including gene duplication or deletion) that short-read sequencing may have overlooked. The piggyBac insertion was cut out of the PkBc38 clone assembly to generate a reference genome for our parental PkA1-H.1 line, which we have named PkA1HT. The final PkA1HT reference genome was annotated with Companion (v2.2.8) using the P. knowlesi H strain as reference, specifying the assembly option with no contiguation. The new PkA1HT reference genome has 18 sequences (comprising 14 chromosomes, the mitochondrial genome, the apicoplast genome, and two unordered contigs). It has no sequencing gaps and is 25.29 Mb long, compared with 142 gaps and 24.32 Mb for PkA1-H.1. Sequences for ten chromosomes reach into the telomeric heptamer repeats on both ends, and sequences for the remaining four chromosomes reach into the telomeric repeats on just one end (the two unordered contigs contain the remaining two unassembled chromosome ends).
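The summary figures quoted above (number of sequences, total length, number of gaps) can be recomputed from any FASTA file. The following is a minimal sketch, not the authors' pipeline. It assumes that gaps are represented as runs of N, which is a common convention but is not stated in the record, and the function names read_fasta and assembly_stats are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence name: sequence}."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = header[0] if header else f"seq{len(parts) + 1}"
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def assembly_stats(records: Dict[str, str]) -> Dict[str, float]:
    """Count sequences, total bases, and gaps.

    Assumption: a gap is any run of one or more N characters.
    """
    total_bp = sum(len(seq) for seq in records.values())
    gaps = sum(len(re.findall(r"N+", seq)) for seq in records.values())
    return {
        "sequences": len(records),
        "total_bp": total_bp,
        "total_mb": round(total_bp / 1e6, 2),
        "gaps": gaps,
    }


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle
    # to read_fasta instead of this list.
    demo = [">chr1 demo", "ACGTNNNNACGT", "ACGT", ">chr2", "ACGTACGT"]
    print(assembly_stats(read_fasta(demo)))
```

Applied to the downloaded assembly, a result of 18 sequences, zero gaps, and roughly 25.29 Mb would be consistent with the description above.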
Additional methods
Long-read PacBio HiFi sequencing was performed on two distinct P. knowlesi piggyBac clones (PkBc38 and PkBc44) to investigate structural variation among mutants (including gene duplication or deletion) that short-read sequencing may have overlooked. High molecular weight (HMW) DNA was extracted from parasite-infected human RBCs using the MagAttract HMW DNA Kit (Qiagen #67563), following the manufacturer’s protocol for the manual purification of genomic DNA from whole blood. DNA quality and quantity were assessed using a Qubit fluorometer and agarose gel electrophoresis; samples were required to have a Qubit concentration >50 ng/µL and intact bands on the gel. Two µg of HMW genomic DNA was then sheared to an average size of ~15-20 kb, and SMRTbell libraries were prepared using the PacBio SMRTbell Prep Kit 3.0 (PacBio #102-141-700), which included DNA end-repair, adapter ligation, and nuclease treatment to remove incomplete molecules. An additional gel-based size selection step on the PippinHT was performed to remove fragments smaller than 10 kb. The final library was purified, quantified, and sequenced on the PacBio Revio platform (1x Revio Cell) to generate long-read data. Samples were generated at the University of South Florida and sequenced at the Wellcome Sanger Institute.
Sequencing data were processed for quality control using standard PacBio workflows, and a genome assembly was generated for each clone. We performed our long-read assemblies using the Canu assembler (v2.2) with default parameters, followed by polishing with ILRA (v1.5.1). The ILRA workflow included running ABACAS against the current PkA1-H.1 reference genome (v. 55), followed by two iterations of short-read correction using Pilon with Illumina reads of the parental line. We performed manual finishing with the Artemis Comparison Tool (ACT), informed by long reads mapped back against the draft assemblies. We obtained >500x coverage from our PacBio sequencing with a median read length of 15 kb, and we achieved telomere-to-telomere completion on several chromosomes.
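Telomere-to-telomere completion can be spot-checked by looking for telomeric repeats at both ends of each assembled sequence. The sketch below is illustrative only and is not the procedure used by the authors. It assumes the heptamer GGGTT(T/C)A, the repeat commonly reported for Plasmodium telomeres; the record does not state the motif, and the window size, repeat threshold, and function names are hypothetical choices.

```python
import re
from typing import Tuple

# Assumption: Plasmodium telomeric heptamer GGGTT(T/C)A and its
# reverse complement T(A/G)AACCC. The source does not give the motif.
FORWARD = "GGGTT[TC]A"
REVERSE = "T[AG]AACCC"
TELOMERE = re.compile(f"(?:{FORWARD}|{REVERSE})")


def count_repeats(segment: str) -> int:
    """Count non-overlapping telomeric heptamers on either strand."""
    return len(TELOMERE.findall(segment.upper()))


def telomere_ends(seq: str, window: int = 200, min_repeats: int = 3) -> Tuple[bool, bool]:
    """Report whether the first and last `window` bases contain repeats.

    `window` and `min_repeats` are arbitrary illustrative thresholds.
    """
    at_start = count_repeats(seq[:window]) >= min_repeats
    at_end = count_repeats(seq[-window:]) >= min_repeats
    return at_start, at_end


if __name__ == "__main__":
    # Synthetic sequences: repeats at both ends, then at one end only.
    both = "TAAACCC" * 5 + "ACGT" * 100 + "GGGTTCA" * 5
    one = "ACGT" * 100 + "GGGTTTA" * 4
    print(telomere_ends(both))
    print(telomere_ends(one))
```

A sequence that returns True for both ends would match the description of chromosomes reaching into telomeric repeats on both sides; a single True corresponds to a chromosome that reaches the repeats on one end only.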
Assemblies were then compared against each other using ACT. We found no significant structural variation between the two clonal lines, with near-complete co-linearity apart from the transposon insertion in each clone, indicating, as expected, that the piggyBac transposon insertion introduces no wider genomic changes. To compare our parental line with the PkA1-H.1 reference, we used ACT to identify possible regions of recombination, followed by a mapping approach and manual analysis in Artemis for validation. We found a duplication of 14 genes on chromosome 7 in both clones. We further found several synteny breaks between our de novo assemblies and the current PkA1-H.1 reference. Analyzing those “breakpoints” more closely in ACT, we found that they are actually misassemblies in the current reference. This finding of misassemblies motivated us to generate a more complete reference genome (see next section). We otherwise found no evidence of recombination or large indels vs. the reference for either clone. It should be noted that our PkA1HT assembly also supersedes both the PkH1 and PkA1 assemblies in terms of assembly statistics (details to be reported elsewhere).
Created:
2024-09-26



