Datasets for Lupo et al. (2022) An extended reservoir of class-D beta-lactamases in non-clinical bacterial strains

Name: Datasets for Lupo et al. (2022) An extended reservoir of class-D beta-lactamases in non-clinical bacterial strains
Creator: figshare
Published: 2022-02-15 13:58:03
License: No description available

DataCite Commons: updated 2022-02-15, indexed 2024-07-29

Download link:

https://figshare.com/articles/dataset/Datasets_for_Lupo_et_al_2022_An_extended_reservoir_of_class-D_beta-lactamases_in_non-clinical_bacterial_strains/18544955/1

Description:

Lupo et al. 2022: Archive content for v1

Overview

```
... 16 directories, 72 files
```

- `README.md`: this file.
- `command-line.sh`: examples of bash commands to use or generate the files stored in this archive.

biosample

This directory contains the input and output files used to assign a “clinical” score to a BioSample report. The list of BioSample accession numbers (`biosample_id.lis`) and the OBO (Open Biomedical Ontologies) file `BiotopeBac_Dico.obo` are both inputs to the `biosample2dico.pl` Perl script (see the scripts section). `biosample.dico` is the output file of `biosample2dico.pl`, and `word2score.tsv` contains the unique list of standardized words. The `word2score.tsv` file was then edited manually to associate a subjective score (-1, 0 or +1) with each standardized word.

bldb_oxa

FASTA (`.fasta`) file of the reference OXA-family sequences from the Beta-lactamase Database (BLDB), used for annotation with the `annotate.pl` Perl script from the Bio::MUST modules.

genetic_environment

This directory contains the list of bacterial assembly download links (`.csv` format) to provide to GeneSpy, and the list of contig accession numbers to download with the `efetch` command-line tool from the NCBI E-utilities.

local_refseq_db

List of the assembly accession numbers of the local RefSeq database built on 7 December 2017.

ncbi_pathogen

This directory contains consolidated FASTA (`.fasta`) and TSV (`.tab`) files downloaded from the NCBI Pathogen Detection server (ftp://ftp.ncbi.nlm.nih.gov/pathogen/):

- `all-prot-nr.fasta`
- `all_bla.tab`

It also contains files associated with class-D beta-lactamases:

- `class_d.fasta`: the class-D beta-lactamases extracted from `all-prot-nr.fasta` using the `get_accession_num.pl` Perl script (see the scripts section).
- `class_d-nr98.fasta`: the deduplicated file.
- `class_d98-align.fasta`: the alignment of the deduplicated file.
- `class_d98.hmm`: the HMM profile built from the alignment.
- `class_d98.hmms`: the raw results of `hmmsearch` on the local RefSeq database.

oxa_family

This directory contains the FASTA file `bla_d.fasta` with the 24,916 OXA-family proteins selected with the `ompa-pa.pl` script, its deduplicated file `clst95_bla_d.fasta`, and the coordinates file `class_d98.bb` and sequence accession identifier file `class_d98.idl` from `ompa-pa.pl`.

alignments

Three alignments of OXA-family proteins are available:

- `mafft` alignment: `align95_bla_d.fasta`
- `ed`-optimized alignment: `classd-final-edit.fasta`
- `net` reduced alignment: `classd-final-edit_188.fasta`

The coordinates for the `net` column selection are in the `classd-final-edit.bor` file.

tree

`mapper.idm` is a TSV file that contains the short and corresponding long sequence identifiers used to rename sequences for the booster and RAxML trees.

booster

This directory contains raw output files in NEWICK format obtained from the booster web server. `boosterweb_tbe_norm.nh` is the final tree file.

consense

Consensus tree computed with `consense` (PHYLIP package) from the 100 RAxML replicate trees in `RAxML_bootstrap.classd-final-edit_188-RAXML-PROTGAMMALGF-100xRAPIDBP`.

raxml

This directory contains raw RAxML output files in NEWICK format, computed from the reduced alignment `classd-final-edit_188.fasta`.

oxa_family_domains

The 3510 unique OXA-family sequences and their corresponding taxonomy are available in FASTA format (`3510_bla.fasta`) and TSV format (`3510_bla.tax`). When a unique sequence was found in several organisms, one organism was chosen randomly as the source. `gramN.fasta` and `gramP.fasta` correspond to `3510_bla.fasta` split according to the SignalP ‘gram-’ and ‘gram+’ organism groups. `no-signal-3510_bla.fasta` corresponds to the unique sequences without signal peptides. SignalP and TMHMM prediction files are in TSV format (`.signalp5` for SignalP and `.lis` for TMHMM).

phylogenetic_clustering

This directory contains a templatized R script (`mcl.script.R.tt`) used to compute phylogenetic clustering, the ladderized rooted OXA-family tree used by the R script, and its associated traits file.

scripts

This directory contains various Perl scripts:

- `annotate.pl`: assign an annotation to target sequences using BLAST similarity to reference sequences.
- `biosample2dico.pl`: download BioSample reports in XML format and standardize all their words using an OBO (Open Biomedical Ontologies) input file.
- `cut-signal-peptid.pl`: trim signal peptides from sequences in FASTA format based on the SignalP report.
- `get_accession_num.pl`: extract different classes of beta-lactamases from the NCBI Pathogen Detection server FASTA file.
- `get_score.pl`: compute a ‘clinical score’ for each BioSample according to its collection of standardized words.
- `parse_consense_out.pl`: check the monophyly of user-defined OTUs within a consense tree.
- `parse_consense-parser.pl`: compute statistics from a `parse_consense_out.pl` output file.

sql_db

This directory contains the SQL files for the results database: the database creation script in `.sql` format, the SQLite database in `.sdb` format, and the MySQL Workbench model (graphical interface) in `.mwb` format.

taxdump-20180208

Mirror of the NCBI Taxonomy used in this study (downloaded on 8 February 2018).
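The archive describes `get_score.pl` only at a high level: each BioSample receives a ‘clinical score’ from its standardized words, whose manual scores (-1, 0, +1) are stored in `word2score.tsv`. The Python sketch below illustrates one plausible reading and is not the authors' script. It assumes, without confirmation from the archive, that `word2score.tsv` has two tab-separated columns (word, score), that a sample's score is the plain sum over its distinct words, and that words missing from the table count as 0. The function names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable


def load_word_scores(tsv_text: str) -> Dict[str, int]:
    """Parse 'word<TAB>score' lines into a dict.

    ASSUMPTION: this two-column layout for word2score.tsv is hypothetical.
    Rows whose score is not an integer (e.g. a header row) are skipped.
    """
    scores: Dict[str, int] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # blank or malformed line
        try:
            scores[row[0].strip()] = int(row[1])  # int() accepts "+1" and "-1"
        except ValueError:
            continue  # header or non-numeric score
    return scores


def clinical_score(words: Iterable[str], scores: Dict[str, int]) -> int:
    """Sum the manual scores of a sample's distinct standardized words.

    ASSUMPTION: plain summation over distinct words is an illustrative
    aggregation rule; words absent from the table contribute 0.
    """
    return sum(scores.get(word, 0) for word in set(words))


if __name__ == "__main__":
    # Toy table and words, invented for illustration only.
    table = load_word_scores("word\tscore\nhospital\t+1\nsoil\t-1\nisolate\t0\n")
    print(clinical_score(["hospital", "isolate", "hospital"], table))  # 1
```

Counting each word once keeps a report that repeats "hospital" in several XML fields from outweighing one that mentions it once; if the real scoring counts occurrences, drop the `set()`.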
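`mapper.idm` pairs short identifiers with long ones so that sequences can carry short names through booster and RAxML. A minimal Python sketch for putting the long names back into a NEWICK tree follows. It assumes, hypothetically, that each line of the mapping file is `short<TAB>long` in that order and that leaf labels are unquoted; it is not the renaming tool used by the authors, and the function names are invented for illustration.

```python
import re
from typing import Dict

# A leaf label starts right after "(" or "," and runs up to the next
# NEWICK delimiter; internal-node labels (after ")") are never matched.
LEAF_LABEL = re.compile(r"(?<=[(,])([^(),:;\s]+)")


def load_id_map(idm_text: str) -> Dict[str, str]:
    """Read 'short<TAB>long' lines (ASSUMED column order) into a dict."""
    mapping: Dict[str, str] = {}
    for line in idm_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2 or line.startswith("#"):
            continue  # skip blank, comment or malformed lines
        mapping[parts[0].strip()] = parts[1].strip()
    return mapping


def restore_leaf_names(newick: str, mapping: Dict[str, str]) -> str:
    """Swap short leaf labels for long ones; unknown labels are kept as-is."""
    return LEAF_LABEL.sub(lambda m: mapping.get(m.group(1), m.group(1)), newick)
```

Branch lengths and support values are left untouched because they never follow "(" or ",". Long identifiers containing NEWICK-reserved characters such as `(`, `)`, `,`, `:` or `;` would need quoting, which this sketch does not do.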
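`cut-signal-peptid.pl` trims signal peptides from FASTA sequences based on the SignalP report. The Python sketch below shows the idea under stated assumptions; it is not the authors' script. It assumes the `.signalp5` report is tab-separated with the sequence ID in the first column and a cleavage-site field of the form `CS pos: X-Y.`, meaning cleavage between residues X and Y (1-based), so the first X residues are removed. It also assumes the report ID equals the first word of the FASTA header. Check both against the actual files; the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: cleavage sites appear as "CS pos: X-Y." in the report,
# meaning the signal peptide ends at residue X (1-based).
CS_POS = re.compile(r"CS pos:\s*(\d+)-(\d+)")


def parse_cleavage_sites(report: str) -> Dict[str, int]:
    """Map sequence ID (first tab-separated column) to signal peptide length."""
    sites: Dict[str, int] = {}
    for line in report.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        match = CS_POS.search(line)
        if match:  # sequences without a predicted signal peptide have no site
            sites[line.split("\t")[0].strip()] = int(match.group(1))
    return sites


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA text."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_signal_peptides(fasta: str, report: str) -> str:
    """Remove predicted signal peptides; other sequences are kept whole."""
    sites = parse_cleavage_sites(report)
    out: List[str] = []
    for header, seq in read_fasta(fasta):
        words = header.split()
        seq_id = words[0] if words else ""
        cut = sites.get(seq_id, 0)  # 0 means nothing to trim
        out.append(f">{header}\n{seq[cut:]}\n")
    return "".join(out)
```

Sequences absent from the report, or present without a cleavage site, pass through unchanged, so the output always has as many records as the input.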
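`parse_consense_out.pl` checks whether user-defined OTUs are monophyletic in a consense tree. The Python sketch below shows the underlying test; it is an illustration under stated assumptions, not the authors' script. It assumes unquoted NEWICK labels. Because consense typically treats its input trees as unrooted, the default check accepts an OTU set when it or its complement is the leaf set of some node (i.e. the set lies on one side of a branch); `unrooted=False` gives the strict rooted test. Function names are hypothetical.

```python
import re
from typing import FrozenSet, List, Set

TOKEN = re.compile(r"\s*([(),;:]|[^(),;:\s]+)")
DELIMS = {"(", ")", ",", ";", ":"}


def clade_leaf_sets(newick: str) -> List[FrozenSet[str]]:
    """Return the leaf set below every node of an unquoted NEWICK tree.

    Stack-based rather than recursive, so deep ladder-like trees do not
    hit Python's recursion limit.
    """
    tokens = TOKEN.findall(newick)
    open_clades: List[Set[str]] = []  # leaf sets of clades not yet closed
    clades: List[FrozenSet[str]] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            open_clades.append(set())
        elif tok == ")":
            leaves = open_clades.pop()
            clades.append(frozenset(leaves))
            if open_clades:
                open_clades[-1] |= leaves
            if i + 1 < len(tokens) and tokens[i + 1] not in DELIMS:
                i += 1  # skip an internal label such as a support value
        elif tok == ":":
            i += 1  # skip the branch length that follows
        elif tok not in DELIMS:  # a leaf name
            clades.append(frozenset([tok]))
            if open_clades:
                open_clades[-1].add(tok)
        i += 1
    return clades


def is_monophyletic(newick: str, otu: Set[str], unrooted: bool = True) -> bool:
    """True if the OTU set is a clade (or, when unrooted, one side of a split)."""
    clades = clade_leaf_sets(newick)
    all_leaves = max(clades, key=len)  # the root clade holds every leaf
    target = frozenset(otu)
    missing = target - all_leaves
    if missing:
        raise ValueError(f"taxa not in tree: {sorted(missing)}")
    complement = all_leaves - target
    return any(
        clade == target or (unrooted and clade == complement)
        for clade in clades
    )
```

For example, in `((A,B),(C,D),E);` the set {C, D, E} is not a rooted clade, but it is monophyletic in the unrooted sense because its complement {A, B} is.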

Provider:

figshare

Created:

2022-01-17