Updated Metagenomic Species Pan-genomes (MSPs) of the human gastrointestinal microbiota
Source: DataCite Commons. Updated: 2025-04-16. Indexed: 2025-04-16.
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.15454/FLANUP
Description:
Dataset overview
This dataset provides the updated Integrated Gene Catalog of the human gut microbiota (aka IGC2) and 1,989 Metagenomic Species Pan-genomes (MSPs). It can be used to analyze shotgun sequencing data of the human gut microbiota.

How to use this dataset
To perform taxonomic, functional, and strain-level profiling with this dataset, we suggest using Meteor.

Methods

Gene catalog construction
The methodology for creating the IGC2 catalog is described in the original papers: Li et al., 2014 and Wen et al., 2017.

MSP creation
Reads from publicly available human gut metagenomes were aligned against the IGC2 catalog with Meteor to produce a raw gene abundance table (10.4M genes quantified in >2000 samples). Then, co-abundant genes were binned into 1,989 Metagenomic Species Pan-genomes (MSPs, i.e. clusters of co-abundant genes that likely belong to the same microbial species) using MSPminer.

MSP taxonomic annotation
MSPs were annotated by aligning their core and accessory genes against the representative genomes of the Genome Taxonomy Database (GTDB r207) using blastn (task = megablast, word_size = 16). The 20 best hits for each gene were kept (--max-target-seq 20). Using an in-house pipeline, an MSP was assigned to a species if > 50% of its genes matched the representative genome of that species, with a mean identity ≥ 95% and a mean gene length coverage ≥ 90%. The remaining MSPs were assigned to a higher taxonomic level (genus to superkingdom) if more than 50% of their genes had the same annotation.

Construction of the phylogenetic tree
39 universal phylogenetic marker genes were extracted from the MSPs with fetchMGs. The markers were then aligned separately with MUSCLE. The alignments were merged and trimmed with trimAl (parameters: -automated1). Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
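The thresholds given for MSP taxonomic annotation can be illustrated with a short Python sketch. This is not the authors' in-house pipeline, which is not published in this record. It assumes that the 20 BLAST hits per gene have already been reduced to a single best hit per gene, and that each hit carries a seven-rank lineage. The names GeneHit, assign_msp and RANKS are hypothetical, and the intermediate rank names (phylum to family) are the usual ones rather than something stated in the description.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

# Rank names from the top level down to species (top level named as in the description).
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


@dataclass
class GeneHit:
    """Best hit of one MSP gene (hypothetical structure, for illustration only)."""
    gene_id: str
    lineage: Tuple[str, ...]  # one name per entry of RANKS
    identity: float           # percent identity of the alignment
    coverage: float           # percent of the gene length covered


def assign_msp(best_hits: List[GeneHit], n_genes: int) -> Optional[Tuple[str, str]]:
    """Return (rank, taxon) for an MSP with n_genes genes, or None if unassigned."""
    if not best_hits or n_genes <= 0:
        return None

    # Species level: > 50% of genes hit one species, with mean identity >= 95%
    # and mean gene length coverage >= 90% over those hits.
    by_species = {}
    for hit in best_hits:
        by_species.setdefault(hit.lineage[-1], []).append(hit)
    species, sp_hits = max(by_species.items(), key=lambda kv: len(kv[1]))
    if (len(sp_hits) / n_genes > 0.5
            and mean([h.identity for h in sp_hits]) >= 95.0
            and mean([h.coverage for h in sp_hits]) >= 90.0):
        return "species", species

    # Fallback: walk from genus up to superkingdom and keep the first rank
    # at which more than 50% of the genes share the same annotation.
    for depth in range(len(RANKS) - 2, -1, -1):
        counts = Counter(h.lineage[: depth + 1] for h in best_hits)
        prefix, n = counts.most_common(1)[0]
        if n / n_genes > 0.5:
            return RANKS[depth], prefix[-1]
    return None
```

In this sketch, genes without any hit still count in the denominator (n_genes), so an MSP whose genes mostly have no match stays unassigned. Whether the original pipeline counts them the same way is an assumption.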
Provider:
Recherche Data Gouv
Created:
2021-04-06
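Summary
This dataset contains the updated Integrated Gene Catalog of the human gut microbiota (IGC2) and 1,989 Metagenomic Species Pan-genomes (MSPs), and is suited to analyzing shotgun sequencing data of the human gut microbiota. It documents how the catalog and the MSPs were constructed and how to use them, and supports taxonomic, functional, and strain-level analysis.
The summary above was compiled and generated by 遇见数据集.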



