
A catalog of genes and species of the human skin microbiota

Source: Recherche Data Gouv France. Updated: 2023-01-01. Indexed: 2026-04-09.
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/NY9OL5
Description:
Dataset overview
This dataset provides:
- a non-redundant, high-quality catalog of 2.9 million genes;
- 392 Metagenomic Species Pangenomes (MSPs).
It can be used to analyze shotgun metagenomic sequencing data from the human skin microbiota.

How to use this dataset
1. Create a gene abundance table by aligning the reads of each sample against the catalog, for example with Meteor or NGLess.
2. Normalize the raw counts by gene length.
3. Taxonomic profiling: estimate the abundance of each species as the mean abundance of its first 100 core genes. To reduce the false-positive rate, consider a species present only if at least 10 of these 100 marker genes are detected.

Methods

Data sources
This dataset was built from the following data sources:
- 118 isolate-derived genomes from the HMRGD;
- 246 isolate-derived genomes from the Skin Microbial Genome Collection (SMGC);
- 1,407 skin metagenome assemblies from the SMGC.

Non-redundant gene catalog
After short contigs (<1,500 bp) were filtered out, genes were predicted with Prodigal on the genomes (mode: single) and on the metagenome assemblies (mode: meta). Complete genes (partial=00) were pooled and clustered with cd-hit-est (parameters -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0), choosing genes from the longest contigs as cluster representatives.

Functional annotation
KO assignments were obtained with KofamScan using the KEGG database (release 107).

MSP recovery
Reads from the 1,120 skin metagenomes available in BioProject PRJNA46333 were aligned against the non-redundant gene catalog with the Meteor software suite to produce a raw gene abundance table (2.9 million genes quantified in 1,120 samples). Co-abundant genes were then binned into 392 Metagenomic Species Pangenomes (MSPs, i.e. clusters of co-abundant genes that likely belong to the same microbial species) using MSPminer.
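The length normalization and taxonomic-profiling steps described under "How to use this dataset" can be sketched in plain Python as follows. This is a minimal illustration, not the Meteor or NGLess implementation: it assumes gene counts and lengths are already loaded into dictionaries, that the core genes of each MSP are available as an ordered list, and that a marker counts as "detected" when it has at least one read. The function names `normalize_by_length` and `msp_profile` are hypothetical.

```python
from statistics import mean


def normalize_by_length(raw_counts, gene_lengths):
    """Divide each gene's raw read count by its length in bp."""
    return {gene: count / gene_lengths[gene] for gene, count in raw_counts.items()}


def msp_profile(norm_abund, msp_markers, n_markers=100, min_detected=10):
    """Estimate one abundance per MSP from length-normalized gene abundances.

    norm_abund:  gene id -> normalized abundance (missing genes count as 0).
    msp_markers: MSP id -> ordered list of core (marker) gene ids.
    """
    profile = {}
    for msp, genes in msp_markers.items():
        markers = genes[:n_markers]  # first n_markers (default 100) core genes
        values = [norm_abund.get(g, 0.0) for g in markers]
        detected = sum(1 for v in values if v > 0)
        # Report the species as absent (0.0) when fewer than
        # min_detected markers have at least one read.
        profile[msp] = mean(values) if detected >= min_detected else 0.0
    return profile


if __name__ == "__main__":
    lengths = {f"g{i}": 1000 for i in range(20)}
    # Toy counts: msp_A has 12 markers (10 detected), msp_B has 8 markers (5 detected).
    counts = {f"g{i}": 50 for i in range(10)}
    counts.update({f"g{i}": 50 for i in range(12, 17)})
    norm = normalize_by_length(counts, lengths)
    markers = {
        "msp_A": [f"g{i}" for i in range(12)],      # g10 and g11 undetected
        "msp_B": [f"g{i}" for i in range(12, 20)],  # only g12..g16 detected
    }
    print(msp_profile(norm, markers))  # msp_B is reported as 0.0
```

The mean is taken over all selected markers, including undetected ones (abundance 0), which is one reading of "average abundance of its first 100 core genes"; pipelines that also rescale by sequencing depth would add that step after `normalize_by_length`.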
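The two filters described under "Non-redundant gene catalog" (dropping contigs shorter than 1,500 bp before gene prediction, then keeping only complete genes with partial=00) could look like the sketch below. It assumes Prodigal's usual gene FASTA header layout (`id # start # end # strand # ID=...;partial=00;...`); the helper names are hypothetical, and the actual pipeline used to build the catalog is not given in the source.

```python
MIN_CONTIG_LEN = 1500  # contigs shorter than this are discarded


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def long_contigs(records, min_len=MIN_CONTIG_LEN):
    """Keep contigs of at least min_len bp (applied before gene prediction)."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def is_complete(prodigal_header):
    """True if a Prodigal gene header has partial=00 (neither end truncated).

    Assumed layout: 'id # start # end # strand # ID=..;partial=00;...'
    """
    attrs = prodigal_header.split(" # ")[-1]
    fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
    return fields.get("partial") == "00"


def complete_genes(records):
    """Keep only complete genes from Prodigal nucleotide FASTA records."""
    return [(h, s) for h, s in records if is_complete(h)]
```

For example, `complete_genes(read_fasta(fh))`, with `fh` an open handle on the gene nucleotide file written by Prodigal's `-d` option, would return the complete genes ready for pooling and clustering with cd-hit-est.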

MSP taxonomic annotation
Taxonomic annotation was performed by aligning all core and accessory genes against the representative genomes of the GTDB database (release r214) with blastn (version 2.7.1, task = megablast, word_size = 16). A species-level assignment was given if more than 50% of the genes matched the representative genome of a given species, with a mean nucleotide identity ≥ 95% and a mean gene length coverage ≥ 90%. The remaining MSPs were assigned to a higher taxonomic level (genus to superkingdom) if more than 50% of their genes had the same annotation at that level.

Construction of the phylogenetic tree
39 universal phylogenetic marker genes were extracted from the MSPs (or from the corresponding genomes, when available) with fetchMGs. The markers were then aligned separately with MUSCLE. The 40 alignments were merged and trimmed with trimAl (parameters: -automated1). Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
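The decision rule described under "MSP taxonomic annotation" can be expressed as a small function. This is a sketch under several assumptions not stated in the source: each matched gene is summarized by its best blastn hit, given as a dictionary with a `lineage` (rank to taxon), an `identity` and a `coverage` in percent; the 50% thresholds are computed over all genes of the MSP; and no identity or coverage filter is applied at the higher ranks. The name `annotate_msp` is hypothetical. Note that GTDB itself calls the top rank "domain"; the text's "superkingdom" is kept here.

```python
from collections import Counter
from statistics import mean

# Ranks checked after species, from genus up to superkingdom (as in the text).
HIGHER_RANKS = ["genus", "family", "order", "class", "phylum", "superkingdom"]


def annotate_msp(hits, n_genes, min_frac=0.5, min_identity=95.0, min_coverage=90.0):
    """Return (rank, taxon) for one MSP, or (None, None) if unassigned.

    hits:    one best blastn hit per matched gene, as dicts with keys
             'lineage' (rank -> taxon), 'identity' (%) and 'coverage' (%).
    n_genes: total number of core + accessory genes in the MSP.
    """
    # Species level: > 50% of all genes hit the same species, with
    # mean identity >= 95% and mean length coverage >= 90% over those hits.
    per_species = {}
    for h in hits:
        species = h["lineage"].get("species")
        if species is not None:
            per_species.setdefault(species, []).append(h)
    for species, sp_hits in per_species.items():
        if (len(sp_hits) > min_frac * n_genes
                and mean(h["identity"] for h in sp_hits) >= min_identity
                and mean(h["coverage"] for h in sp_hits) >= min_coverage):
            return "species", species

    # Higher levels: the most specific rank at which > 50% of genes agree.
    for rank in HIGHER_RANKS:
        counts = Counter(h["lineage"][rank] for h in hits if rank in h["lineage"])
        if counts:
            taxon, k = counts.most_common(1)[0]
            if k > min_frac * n_genes:
                return rank, taxon
    return None, None
```

Because at most one species can match more than half of the genes, the order in which species are tested does not affect the result; the higher-rank loop stops at the first (most specific) rank that reaches the majority.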

Created: 2023-01-01