Supplementary information for: NUMT PARSER: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera
Source: DataONE | Updated: 2022-12-06 | Indexed: 2025-08-02
Download link:
https://search.dataone.org/view/sha256:5cf251166b3f81c8e584d2fadb4f66b29de89c3d26cd14ae2763c5e1f27ef5b4
Resource description:
Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which identifies DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from two ancient Cape lions (Panthera leo) because mtDNA is often the marker of choice for ancient DNA studies, and the genus Panthera is known to have numt pseudogenes. Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numt...

Sequencing reads and alignments generated from ancient DNA of two Cape lion (Panthera leo melanochaitus) samples. Raw reads were aligned to the Panthera leo mitochondrial reference (NCBI Accession KP202262.1) to obtain mitochondrial-specific reads. These mitochondrial reads were then processed using different methods (BLAST, SAMtools, Numt Parser) to identify and filter numt-contaminant reads. See de Flamingh et al. (2022) for additional information on the specific bioinformatic pipeline used and a description of the Numt Parser software.

Files in BAM format (.bam) are binary and must be converted with SAMtools before they can be read as text. SAM (.sam) and FASTA (.fa) files are plain text and can be opened with any text editor, either on the command line or in a graphical application.
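Because the SAM files are plain text, they can also be inspected with a few lines of code. The following is a minimal Python sketch, not part of the dataset or of Numt Parser, that counts mapped reads per reference sequence in a SAM file. The function names and the two example records are hypothetical and made up for illustration; only the reference name KP202262.1 comes from the description above. A BAM file would first need converting, for example with `samtools view -h input.bam > output.sam`, where both file names are placeholders.

```python
import io
from collections import Counter


def iter_sam_records(handle):
    """Yield (read_name, flag, reference_name, position) per alignment line."""
    for line in handle:
        # Header lines start with '@'; skip them and any blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            raise ValueError("SAM alignment lines need 11 mandatory fields")
        yield fields[0], int(fields[1]), fields[2], int(fields[3])


def count_mapped_reads(handle):
    """Count mapped reads per reference name, skipping unmapped reads."""
    counts = Counter()
    for _name, flag, reference_name, _position in iter_sam_records(handle):
        if flag & 0x4:  # bit 0x4 of the FLAG field marks an unmapped read
            continue
        counts[reference_name] += 1
    return counts


# Hypothetical records for illustration only (not taken from the dataset).
EXAMPLE_SAM = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "read1\t0\tKP202262.1\t100\t37\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
)

if __name__ == "__main__":
    print(dict(count_mapped_reads(io.StringIO(EXAMPLE_SAM))))
```

To run the same count on a real file, open it in text mode and pass the handle to `count_mapped_reads`, for example `with open("sample.sam") as handle:`, where `sample.sam` is a placeholder name.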
Created:
2025-07-14



