Plasmids Identified in Air Metagenomes
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11124656
Description:
Metagenomic data were selected in Web of Science (Clarivate) in October 2022 using the keywords: txid655179[Organism:noexp] AND metagenome [Filter]; AIR Metagenome; Air microbiome; Troposphere; Aerosol; Atmosphere. The data were manually curated to remove sequencing data originating from metabarcoding studies (i.e., 16S). When available, the assemblies supplied by the MetaSUB consortium (Danko et al., 2021) were used for air metagenomes from built environments.
Plasmid contents were predicted from the assembled data. Metagenomes sequenced on Illumina platforms (paired-end Illumina reads) were assembled with MEGAHIT 1.2.9 using the meta-large preset (Li et al., 2015), after the reads had been cleaned with bbduk2 (qtrim=rl trimq=28 minlen=25 maq=20 ktrim=r k=25 mink=11, plus a list of adapters to remove) from the BBTools suite (https://jgi.doe.gov/data-and-tools/software-tools/bbtools/).
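The cleaning and assembly steps can be sketched as command builders. This is a minimal illustration, not the authors' script: it uses the standard `bbduk.sh` syntax (`in1=`/`in2=`, `ref=`) instead of bbduk2, which may take its adapter reference through different flags, and the file names, the adapter FASTA and the thread count are hypothetical placeholders. Only the trimming values and the meta-large preset come from the description above.

```python
import shlex

# Read-cleaning values reported in the description (bbduk-style key=value options).
BBDUK_OPTIONS = {
    "qtrim": "rl",   # quality-trim both the left and right read ends
    "trimq": "28",   # quality threshold used for trimming
    "minlen": "25",  # discard reads shorter than 25 bp after trimming
    "maq": "20",     # discard reads whose average quality is below 20
    "ktrim": "r",    # trim matched adapter k-mers and everything to their right
    "k": "25",       # k-mer length used for adapter matching
    "mink": "11",    # allow shorter k-mers (down to 11) at read tips
}


def bbduk_command(r1, r2, out1, out2, adapters):
    """Build a bbduk.sh call; `adapters` is a hypothetical adapter FASTA path."""
    cmd = ["bbduk.sh", f"in1={r1}", f"in2={r2}",
           f"out1={out1}", f"out2={out2}", f"ref={adapters}"]
    cmd += [f"{key}={value}" for key, value in BBDUK_OPTIONS.items()]
    return cmd


def megahit_command(r1, r2, outdir, threads=8):
    """Assemble cleaned paired-end reads with MEGAHIT's meta-large preset."""
    return ["megahit", "-1", r1, "-2", r2,
            "--presets", "meta-large",
            "-t", str(threads), "-o", outdir]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    clean = bbduk_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                          "clean_R1.fastq.gz", "clean_R2.fastq.gz",
                          "adapters.fa")
    assemble = megahit_command("clean_R1.fastq.gz", "clean_R2.fastq.gz",
                               "sample_megahit")
    print(shlex.join(clean))
    print(shlex.join(assemble))
```

Returning argument lists rather than shell strings keeps the commands safe to pass to `subprocess.run` without shell quoting issues.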
Plasmids were predicted for each assembly using the scripts described in depth in Hilpert et al. (Hilpert et al., 2021; Hennequin et al., 2022), which are available on GitHub (https://github.com/meb-team/PlasSuite/). Briefly, contigs were analyzed with both reference-based and reference-free approaches. The reference-based step used the NCBI chromosome (archaea and bacteria) and plasmid databases, together with the MOB-suite tool (Robertson and Nash, 2018), SILVA (Quast et al., 2013) and phylogenetic markers carried by chromosomes (Wu et al., 2013). In this first step, contigs affiliated with chromosomes were discarded and contigs affiliated with plasmids were retained. The remaining, unaffiliated contigs were then analyzed with two reference-free methods: PlasFlow (Krawczyk et al., 2018) and PlasClass (Pellow et al., 2020).
Viruses were removed with viralVerify (https://github.com/ablab/viralVerify) (Antipov et al., 2020), which also provides a plasmid/non-plasmid classification in parallel. The database built for this purpose is available at https://github.com/meb-team/PlasSuite/?tab=readme-ov-file#1-prepare-or-download-your-databases. Eukaryotic contaminants were removed by aligning the sequences against the NT database and the human chromosomes (GRCh38) with minimap2 using the -x asm5 option (Li, 2018); contigs aligning with 95% identity and 80% coverage were removed. The final plasmidome set was clustered with mmseqs (Mirdita, Steinegger and Söding, 2019) at 90% identity and 80% coverage (--min-seq-id 0.90 -c 0.8 --cov-mode 1 --cluster-mode 2 --alignment-mode 3 --kmer-per-seq-scale 0.2).
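The contaminant filter and the final clustering step can be sketched as follows. This is a hypothetical illustration rather than the PlasSuite code: it assumes minimap2 PAF output, computes identity as matching bases over alignment block length (PAF columns 10 and 11) and coverage as the aligned fraction of the contig, and treats the 95% and 80% thresholds as minimums that a single alignment must meet. The `mmseqs easy-cluster` workflow and all file names are also assumptions; the description gives only the clustering parameters.

```python
import shlex

MIN_IDENTITY = 0.95  # identity threshold from the description (assumed minimum)
MIN_COVERAGE = 0.80  # coverage threshold from the description (assumed minimum)


def minimap2_command(reference, contigs):
    """minimap2 call with the asm5 preset; PAF records are written to stdout."""
    return ["minimap2", "-x", "asm5", reference, contigs]


def contaminated_contigs(paf_lines, min_id=MIN_IDENTITY, min_cov=MIN_COVERAGE):
    """Return contig names with at least one alignment passing both thresholds.

    Assumptions: identity = residue matches / alignment block length
    (PAF columns 10 and 11); coverage = aligned query span / contig length.
    """
    flagged = set()
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        qname = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        matches, block_len = int(fields[9]), int(fields[10])
        identity = matches / block_len
        coverage = (qend - qstart) / qlen
        if identity >= min_id and coverage >= min_cov:
            flagged.add(qname)
    return flagged


def mmseqs_cluster_command(fasta, prefix, tmpdir):
    """Cluster the final plasmidome with the reported MMseqs2 parameters."""
    return ["mmseqs", "easy-cluster", fasta, prefix, tmpdir,
            "--min-seq-id", "0.90",
            "-c", "0.8",
            "--cov-mode", "1",            # coverage measured on the target sequence
            "--cluster-mode", "2",        # greedy clustering by length (CD-HIT-like)
            "--alignment-mode", "3",      # also compute coverage and sequence identity
            "--kmer-per-seq-scale", "0.2"]  # scale k-mers per sequence with length


if __name__ == "__main__":
    # Two made-up PAF records: contig_1 passes both thresholds,
    # contig_2 has high identity but covers only 20% of the contig.
    paf = [
        "contig_1\t10000\t0\t9000\t+\tchr1\t248956422\t100\t9100\t8800\t9000\t60",
        "contig_2\t5000\t0\t1000\t+\tchr2\t242193529\t50\t1050\t990\t1000\t60",
    ]
    print(contaminated_contigs(paf))  # prints {'contig_1'}
    print(shlex.join(minimap2_command("GRCh38.fa", "plasmid_contigs.fa")))
    print(shlex.join(mmseqs_cluster_command("plasmidome.fa", "clusters", "tmp")))
```

Flagging a contig on any single qualifying alignment is the simplest reading of the thresholds; summing coverage across several partial alignments would be a different, equally plausible choice.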
Created:
2024-07-17



