FMDB, QSFMB and QQFMB databases
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10806680
Description:
DOI: 10.5281/zenodo.10806681
A catalogue of 10.5M proteins from bacterial species present in food microbiomes was constructed and used to build refined, biome-specific databases of quorum sensing- and quorum quenching-related genes.
Methods
Identification of bacterial species present in food microbiomes
FoodMicrobionet collates data from metataxonomic studies of over 10,000 food and environmental samples (Parente, Zotta & Ricciardi, 2022) and was used to select 4,507 taxa identified to species level ("taxa_list.txt"). Taxonkit (version 0.5.0) was used to infer taxonomy IDs (NCBI:txid) for 4,264 taxa (no ID could be determined for 245 taxa, most frequently groups of uncertain taxonomic standing and Candidatus species; "taxa_taxon.txt"). All available reference genomes (n=3,478) were downloaded from NCBI (January 2023) via the ncbi-datasets CLI, with homotypic synonyms arising from taxonomic revision removed where more than one species name mapped to the same TaxID ("removed_synonyms.txt").
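The synonym filtering described above can be illustrated with a minimal sketch. It assumes the name-to-TaxID mapping is available as (species name, TaxID) pairs; the function name `split_synonyms`, the rule of keeping the first name seen, and the placeholder "Candidatus Example species" are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict

def split_synonyms(name_taxid_pairs):
    """Keep one name per TaxID and report the remaining names as synonyms.

    Assumption: the first name encountered for a TaxID is kept. The original
    workflow may have applied a different rule.
    """
    names_by_taxid = defaultdict(list)
    for name, taxid in name_taxid_pairs:
        if taxid is None:  # no taxonomy ID could be inferred for this taxon
            continue
        names_by_taxid[taxid].append(name)

    kept = {taxid: names[0] for taxid, names in names_by_taxid.items()}
    removed = [name for names in names_by_taxid.values() for name in names[1:]]
    return kept, removed

# Illustrative input: two names that share one TaxID after taxonomic revision,
# one unambiguous species, and one hypothetical taxon without a TaxID.
pairs = [
    ("Lactobacillus plantarum", 1590),
    ("Lactiplantibacillus plantarum", 1590),
    ("Escherichia coli", 562),
    ("Candidatus Example species", None),
]
kept, removed = split_synonyms(pairs)
```

Here `removed` plays the role of a synonym list such as "removed_synonyms.txt", and the keys of `kept` are the TaxIDs for which a reference genome would be requested.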
Generation of protein catalogue of Food Microbiomes (FMDB)
Prodigal (version 2.6.3) was used to predict the proteome of each reference genome, and the predicted proteomes were concatenated into a single pan-proteome. This pan-proteome was clustered at a sequence identity threshold of 0.9 using CD-HIT (version 4.7), yielding a total of 10,575,338 predicted proteins.
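As a rough sketch of the prediction and concatenation steps, the snippet below builds a Prodigal command line (`-i` for the genome, `-a` for the protein translations, `-q` for quiet mode) and concatenates per-genome protein FASTA files while counting records. The file names and helper names are hypothetical, and the exact options used by the authors are not stated in the description.

```python
import tempfile
from pathlib import Path

def prodigal_command(genome_fasta, protein_fasta):
    """Build a Prodigal call that writes protein translations via -a.

    Gene coordinates go to standard output unless -o is also given.
    """
    return ["prodigal", "-i", str(genome_fasta), "-a", str(protein_fasta), "-q"]

def concatenate_fasta(protein_files, out_path):
    """Concatenate FASTA files into one file and return the record count."""
    n_records = 0
    with open(out_path, "w") as out:
        for path in protein_files:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    # Guard against files that lack a trailing newline.
                    out.write(line if line.endswith("\n") else line + "\n")
    return n_records

# Small demonstration with two made-up proteome files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "genome_a.faa").write_text(">a_1\nMKT\n>a_2\nMLL\n")
    (tmp / "genome_b.faa").write_text(">b_1\nMGG")  # no trailing newline
    inputs = sorted(tmp.glob("genome_*.faa"))
    total = concatenate_fasta(inputs, tmp / "pan_proteome.faa")
    combined = (tmp / "pan_proteome.faa").read_text()
```

The command list returned by `prodigal_command` could be passed to `subprocess.run` once per genome before the concatenation step.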
Quorum Sensing in Food Microbiomes database (QSFM)
Table D1 from the Quorum Sensing of Human Gut Microbiomes (QSHGM; Wu et al., 2022) was used as a template of validated quorum sensing-related genes and expanded with recently elucidated genes to a total of 240 entries. Each entry records the gene name, UniProt ID, type of quorum sensing system (4 levels), and whether the gene encodes a synthase or a receptor. The amino acid sequence for each UniProt ID was retrieved using the Unipressed client, the sequences were concatenated, and headers were adjusted using the AdjustFastaHeadersForShortBRED.py script. Diamond was used to query the previously generated FMDB with the concatenated sequences ("qs_multi_unique.fa"). Clustering with CD-HIT at a 0.90 sequence identity threshold produced 2,492 sequences, while clustering at a 0.50 threshold produced a set of 932 proteins.
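The step between the Diamond search and clustering, collecting the FMDB proteins that were hit, might look like the sketch below. It assumes Diamond's default BLAST tabular output, in which column 2 is the subject ID and column 3 the percent identity. The `min_identity` parameter, the helper names, and the example IDs are hypothetical; the description does not report which hit filters were applied.

```python
def hit_ids(diamond_tsv_lines, min_identity=0.0):
    """Collect subject IDs from BLAST-tabular lines (col 2 = sseqid, col 3 = pident)."""
    ids = set()
    for line in diamond_tsv_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[2]) >= min_identity:
            ids.add(fields[1])
    return ids

def extract_records(fasta_lines, wanted_ids):
    """Yield (record_id, sequence) for FASTA records whose ID is in wanted_ids."""
    rec_id, seq = None, []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if rec_id in wanted_ids:
                yield rec_id, "".join(seq)
            parts = line[1:].split()
            rec_id = parts[0] if parts else ""  # ID is the first header token
            seq = []
        elif line:
            seq.append(line)
    if rec_id in wanted_ids:
        yield rec_id, "".join(seq)

# Made-up search results and a made-up three-protein catalogue.
hits = [
    "luxS_query\tprot_2\t95.0\t160\t8\t0\t1\t160\t1\t160\t1e-90\t300\n",
    "luxI_query\tprot_2\t40.0\t150\t90\t0\t1\t150\t1\t150\t1e-20\t90\n",
    "luxI_query\tprot_3\t88.5\t200\t23\t0\t1\t200\t1\t200\t1e-80\t280\n",
]
catalogue = [">prot_1 example", "MKT", ">prot_2", "MAA", "LLV", ">prot_3", "MGG"]

ids = hit_ids(hits)
records = list(extract_records(catalogue, ids))
```

Collecting IDs into a set means a catalogue protein hit by several queries is written only once before clustering.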
Quorum Quenching in Food Microbiomes database (QQFM)
Table S1 from a recent review by Sikdar & Elias (2020) was manually curated to retrieve, where available, UniProt IDs for validated quorum quenching proteins (n=82). Amino acid sequences were retrieved for each ID using the Unipressed client, concatenated, and headers were adjusted using the AdjustFastaHeadersForShortBRED.py script. Diamond was used to query the previously generated FMDB with the concatenated sequences ("qq_multi_unique.fa"). Clustering with CD-HIT at a 0.90 sequence identity threshold produced 1,268 sequences, while clustering at a 0.50 threshold produced a set of 344 proteins.
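Clustering at two thresholds has one practical wrinkle: the CD-HIT user guide recommends a smaller word size (`-n`) as the identity threshold (`-c`) drops. The sketch below builds the two command lines under that guideline. The input and output file names are hypothetical, and whether the authors set `-n` explicitly is not stated.

```python
def cdhit_word_size(identity):
    """Word size suggested by the CD-HIT user guide for a protein identity threshold."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("cd-hit accepts protein identity thresholds from 0.4 to 1.0")
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    return 2

def cdhit_command(in_fasta, out_fasta, identity):
    """Build a cd-hit call with matching -c (identity) and -n (word size)."""
    return [
        "cd-hit",
        "-i", in_fasta,
        "-o", out_fasta,
        "-c", f"{identity:.2f}",
        "-n", str(cdhit_word_size(identity)),
    ]

# Hypothetical file names for the two clustering runs.
cmd90 = cdhit_command("qq_hits.faa", "qq_c90.faa", 0.90)
cmd50 = cdhit_command("qq_hits.faa", "qq_c50.faa", 0.50)
```

Each list can be handed to `subprocess.run`; the same helper would apply to the quorum sensing hits.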
References
Parente E, Zotta T, Ricciardi A. FoodMicrobionet v4: A large, integrated, open and transparent database for food bacterial communities. Int J Food Microbiol. 2022 Jul 2;372:109696. doi: 10.1016/j.ijfoodmicro.2022.109696. Epub 2022 May 2. PMID: 35526357.
Sikdar R, Elias M. Quorum quenching enzymes and their effects on virulence, biofilm, and microbiomes: a review of recent advances. Expert Rev Anti Infect Ther. 2020 Dec;18(12):1221-1233. doi: 10.1080/14787210.2020.1794815. Epub 2020 Aug 4. PMID: 32749905; PMCID: PMC7705441.
Wu S, Feng J, Liu C, Wu H, Qiu Z, Ge J, Sun S, Hong X, Li Y, Wang X, Yang A, Guo F, Qiao J. Machine learning aided construction of the quorum sensing communication network for human gut microbiota. Nat Commun. 2022 Jun 2;13(1):3079. doi: 10.1038/s41467-022-30741-6. PMID: 35654892; PMCID: PMC9163137.
Created:
2024-03-23



