Plasmids Identified in Air Metagenomes

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/11124656
Resource description:
Metagenomic data were selected in Web of Science (Clarivate) in October 2022 using the keywords: txid655179[Organism:noexp] AND metagenome [Filter]; AIR Metagenome; Air microbiome; Troposphere; Aerosol; Atmosphere. The data were manually curated to remove sequences originating from metabarcoding (such as 16S). For air metagenomes from built environments, the assembled data supplied by the MetaSUB consortium (Danko et al., 2021) were used when available. Plasmid content was predicted from the assembled data.

Metagenomes sequenced with Illumina (paired-end reads) were assembled with megahit 1.2.9 using the meta-large option (Li et al., 2015), after cleaning the reads with bbduk2 (qtrim=rl trimq=28 minlen=25 maq=20 ktrim=r k=25 mink=11 and a list of adapters to remove) from the bbtools suite (https://jgi.doe.gov/data-and-tools/software-tools/bbtools/).

Plasmids were predicted for each assembly using scripts described in depth in Hilpert et al. (2021) and Hennequin et al. (2022) and available on GitHub (https://github.com/meb-team/PlasSuite/). Briefly, contigs were analyzed using both reference-based and reference-free approaches. The databases employed included NCBI chromosomes (archaea and bacteria) and plasmids, as well as the MOB-suite tool (Robertson and Nash, 2018), SILVA (Quast et al., 2013) and phylogenetic markers harbored by chromosomes (Wu et al., 2013). Contigs affiliated with chromosomes were discarded, and contigs affiliated with plasmids were retained at this first step. Two reference-free methods, PlasFlow (Krawczyk et al., 2018) and PlasClass (Pellow et al., 2020), were then applied to the contigs affiliated with neither. Viruses were removed using viralVerify (https://github.com/ablab/viralVerify) (Antipov et al., 2020), which also provides a plasmid/non-plasmid classification.
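The two-stage contig triage described above can be sketched roughly as follows. This is an illustration only, not the PlasSuite code: the names `Call` and `keep_contig` are hypothetical, and the description does not say how the PlasFlow and PlasClass predictions are combined, so the `require_both` switch is an assumption that exposes both possibilities.

```python
from enum import Enum


class Call(Enum):
    """Outcome of the reference-based step for one contig (hypothetical labels)."""
    CHROMOSOME = "chromosome"
    PLASMID = "plasmid"
    UNASSIGNED = "unassigned"


def keep_contig(reference_call, plasflow_plasmid, plasclass_plasmid, require_both=False):
    """Return True if the contig is retained as a putative plasmid.

    reference_call     -- Call from the reference-based step
    plasflow_plasmid   -- True if PlasFlow labels the contig as plasmid
    plasclass_plasmid  -- True if PlasClass labels the contig as plasmid
    require_both       -- ASSUMPTION: the combination rule is not given in
                          the description, so it is left configurable here
    """
    if reference_call is Call.CHROMOSOME:
        return False  # affiliated with a chromosome: discarded
    if reference_call is Call.PLASMID:
        return True   # affiliated with a plasmid: retained at the first step
    # Unassigned contigs fall through to the reference-free predictors.
    if require_both:
        return plasflow_plasmid and plasclass_plasmid
    return plasflow_plasmid or plasclass_plasmid
```

For example, `keep_contig(Call.UNASSIGNED, True, False)` keeps the contig under the permissive rule and drops it when `require_both=True`.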
The database built for plasmid prediction is available at https://github.com/meb-team/PlasSuite/?tab=readme-ov-file#1-prepare-or-download-your-databases

Eukaryotic contaminants were removed by aligning the sequences against the NT database and the human chromosomes (GRCh38) using minimap2 with the -x asm5 option (Li, 2018). Contigs mapping with at least 95% identity and 80% coverage were removed.

The final plasmidome set was clustered with mmseqs (Mirdita, Steinegger and Söding, 2019) at 80% coverage and 90% identity (--min-seq-id 0.90 -c 0.8 --cov-mode 1 --cluster-mode 2 --alignment-mode 3 --kmer-per-seq-scale 0.2).
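As a rough illustration of the contaminant filter, the sketch below flags contigs from minimap2 PAF output using the 95% identity and 80% coverage thresholds. Several points are assumptions, since the description does not define them: identity is taken as matching bases divided by alignment block length (PAF columns 10 and 11), coverage is taken as the aligned fraction of the query contig, and each alignment is evaluated on its own rather than summed per contig. The function name `contaminant_ids` is hypothetical.

```python
def contaminant_ids(paf_lines, min_identity=0.95, min_coverage=0.80):
    """Return the set of query contig names with a qualifying alignment.

    paf_lines -- iterable of tab-separated PAF records (minimap2 default output)
    """
    flagged = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        qname = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        matches, block_len = int(fields[9]), int(fields[10])
        if qlen == 0 or block_len == 0:
            continue
        identity = matches / block_len      # ASSUMPTION: PAF col 10 / col 11
        coverage = (qend - qstart) / qlen   # ASSUMPTION: query coverage
        if identity >= min_identity and coverage >= min_coverage:
            flagged.add(qname)
    return flagged


# Made-up records for illustration only (not taken from the dataset).
example = [
    "contig_1\t1000\t0\t900\t+\tchr1\t50000\t100\t1000\t880\t900\t60",
    "contig_2\t1000\t0\t500\t+\tchr1\t50000\t100\t600\t495\t500\t60",
    "contig_3\t1000\t0\t1000\t+\tchr2\t50000\t0\t1000\t900\t1000\t60",
]
print(contaminant_ids(example))  # {'contig_1'}
```

Only `contig_1` passes both thresholds: `contig_2` covers half of the contig and `contig_3` aligns at 90% identity.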
Created:
2024-07-17