Predicting plasmid contigs from assemblies using single copy marker genes, plasmid genes, kmers

Source: NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://zenodo.org/record/3968421
Description:
Introduction: Antimicrobial resistance (AMR) genes in bacteria are often carried on plasmids. Because plasmids can spread AMR genes between bacteria, it is important to know whether the genes are located on highly transferable plasmids or on the more stable chromosome. Whole genome sequence (WGS) analysis makes it easy to determine whether a strain contains a resistance gene. It is harder to determine whether the gene is located on the chromosome or on a plasmid, because genome sequence assembly generally results in 50-300 DNA fragments (contigs). Our newly developed prediction tool analyzes the composition of these contigs to predict their likely source, plasmid or chromosome. This information can be used to determine whether a resistance gene is chromosomally located or on a plasmid. The tool is optimized for 19 different bacterial species, including Campylobacter, E. coli, and Salmonella, and can also be used for metagenomic assemblies.
Methods: For each contig, the tool counts chromosomal marker genes, plasmid replication genes, and plasmid typing genes using CheckM and DIAMOND Blast, and determines the pentamer frequencies and the contig size. A prediction model was trained using Random Forest on an extensive set of plasmids and chromosomes from 19 different bacterial species and validated on separate test sets of known chromosomal and plasmid contigs from the different bacteria.
Results: Prediction of plasmid contigs was nearly perfect when calculated by the number of correctly predicted bases, with up to 99% specificity and 99% sensitivity. Prediction of small contigs remains difficult, since these contigs consist primarily of repeated sequences present in both plasmids and chromosomes, e.g. transposases.
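The base-weighted evaluation mentioned in the Results can be illustrated with a minimal sketch. It assumes that plasmid is treated as the positive class and that each contig is described by a hypothetical `(length, true_label, predicted_label)` tuple; neither detail is stated in the record, and the numbers in the example are invented purely for illustration.

```python
def base_weighted_metrics(contigs):
    """Return (sensitivity, specificity) weighted by contig length in bases.

    `contigs` is an iterable of (length, true_label, predicted_label) tuples,
    where labels are "plasmid" or "chromosome". Treating plasmid as the
    positive class is an assumption made for this sketch.
    """
    tp = fn = tn = fp = 0
    for length, truth, predicted in contigs:
        if truth == "plasmid":
            if predicted == "plasmid":
                tp += length  # plasmid bases correctly called plasmid
            else:
                fn += length  # plasmid bases missed
        else:
            if predicted == "chromosome":
                tn += length  # chromosomal bases correctly called chromosomal
            else:
                fp += length  # chromosomal bases wrongly called plasmid
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented toy assembly: two large contigs predicted correctly,
# two small contigs predicted incorrectly.
toy_contigs = [
    (100_000, "plasmid", "plasmid"),
    (1_000, "plasmid", "chromosome"),
    (4_000_000, "chromosome", "chromosome"),
    (2_000, "chromosome", "plasmid"),
]
sensitivity, specificity = base_weighted_metrics(toy_contigs)
print(f"sensitivity={sensitivity:.4f} specificity={specificity:.4f}")
```

In this toy case half of the contigs are misclassified, yet both metrics stay high because the misclassified contigs are short. This is why errors on small contigs have little effect on per-base figures.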
Conclusion: The newly developed tool can determine whether contigs are chromosomal or plasmid-derived with very high specificity and sensitivity (up to 99%), and can be very useful for analyzing WGS data of bacterial genomes and their antimicrobial resistance genes. Plasmid databases can be downloaded from: http://klif.uu.nl/download/plasmid_db/. Data used for training can be downloaded here: http://klif.uu.nl/download/plasmid_db/trainingsets2/
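As a rough illustration of two of the per-contig features named under Methods (pentamer frequencies and contig size), the sketch below computes a 1024-element frequency vector for a single contig. It is not the tool's own code: the function names are hypothetical, and the choices to skip windows containing ambiguous bases and to count only the forward strand are assumptions.

```python
from collections import Counter
from itertools import product

K = 5  # pentamers
ALPHABET = "ACGT"
# All 4^5 = 1024 possible pentamers in a fixed order, so every contig
# yields a feature vector with the same layout.
ALL_KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def pentamer_frequencies(seq):
    """Return relative pentamer frequencies of `seq`, ordered as ALL_KMERS.

    Windows containing characters other than A, C, G, T (e.g. N) are
    skipped; this handling is an assumption of the sketch.
    """
    seq = seq.upper()
    valid = set(ALPHABET)
    counts = Counter()
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if set(kmer) <= valid:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in ALL_KMERS]


def contig_features(seq):
    """Bundle contig size and pentamer frequencies for one contig."""
    return {"length": len(seq), "pentamers": pentamer_frequencies(seq)}


features = contig_features("ACGTACGTACGTTTGACA")
print(features["length"], len(features["pentamers"]))
```

A vector like this, combined with the marker-gene and plasmid-gene counts, could form one row of the feature table passed to a classifier.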
Created:
2020-07-31