
A catalog of genes and species of the brown rat (Rattus norvegicus) gut microbiota

DataCite Commons | Updated 2025-05-15 | Indexed 2025-04-16
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/GVL2EE
Dataset overview
We built a catalog of 5.9M genes found in the brown rat gut microbiota. Co-abundant genes were binned into 1627 Metagenomic Species (MGS), for which we provide taxonomic labels. This dataset can be used to analyze shotgun sequencing data of the brown rat gut microbiota.

Materials and Methods

Data sources
1. Rat fecal (and milk) samples characterized by shotgun metagenomic sequencing during the Mamiprooffi project. Sequencing data will soon be submitted to the European Nucleotide Archive (BioProject PRJEB57230).
2. The gene catalog of the Sprague-Dawley rat gut metagenome published by Pan et al.

Metagenomic assembly
Metagenomic assembly was performed on the Mamiprooffi samples (Data Source 1) with SPAdes (parameters: --iontorrent --careful). Contigs shorter than 1500 bp, or successfully aligned to the rat genome (Rnor_6.0), were removed.

Non-redundant gene catalog
Genes were predicted on all contigs with Prodigal (parameters: -m -p meta). Genes missing a start codon or shorter than 99 bp were discarded. Partial and complete genes were then clustered separately with cd-hit-est (parameters: -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0). Finally, these two non-redundant gene sets were merged with the previously published catalog (Data Source 2) using cd-hit-est-2d, considering complete genes first (contact us for further details).

Functional annotation
KEGG Orthologs (KOs) were assigned to genes of the final catalog with KofamScan (version 1.3.0, KEGG 107 database).

Metagenomic Species
Using the Meteor software suite, reads from samples in BioProjects PRJEB57230 and PRJEB22973 were mapped against the final non-redundant catalog to build a raw gene abundance table (5.9 million genes quantified in 370 samples). This table was given as input to MSPminer and Canopy. A total of 1627 clusters of co-abundant genes, or Metagenomic Species (MGS), were discovered.
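The contig and gene filtering rules from the assembly and gene catalog steps can be written as simple predicates. The Python sketch below is illustrative only and is not the pipeline used to build this catalog. It makes three assumptions. First, it assumes Prodigal's usual nucleotide FASTA header layout (`>id # begin # end # strand # ID=...;partial=XY;start_type=...`). Second, it treats `start_type=Edge` as "missing start codon". Third, it treats any other non-`00` `partial` flag as a partial gene that lacks only its stop codon. The helper names (`keep_contig`, `parse_prodigal_header`, `classify`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Set

MIN_CONTIG_LENGTH = 1500  # bp, contigs shorter than this are removed
MIN_GENE_LENGTH = 99      # bp, genes shorter than this are discarded


def keep_contig(contig_id: str, length: int, host_aligned_ids: Set[str]) -> bool:
    """Keep a contig if it is long enough and did not align to the rat genome."""
    return length >= MIN_CONTIG_LENGTH and contig_id not in host_aligned_ids


@dataclass
class PredictedGene:
    """Hypothetical record for one Prodigal-predicted gene."""
    gene_id: str
    begin: int
    end: int
    strand: int
    partial: str     # assumed: "00" = complete, a "1" marks a truncated end
    start_type: str  # assumed: "Edge" when the gene runs off the contig at its start


def parse_prodigal_header(header: str) -> PredictedGene:
    """Parse a Prodigal FASTA header (layout assumed, see lead-in)."""
    fields = [f.strip() for f in header.lstrip(">").split("#")]
    gene_id, begin, end, strand = fields[0], int(fields[1]), int(fields[2]), int(fields[3])
    # Attribute string such as "ID=1_1;partial=00;start_type=ATG;..."
    attrs = dict(item.split("=", 1) for item in fields[4].split(";") if "=" in item)
    return PredictedGene(gene_id, begin, end, strand, attrs["partial"], attrs["start_type"])


def classify(gene: PredictedGene) -> str:
    """Return 'discard', 'complete' or 'partial' for a predicted gene."""
    length = gene.end - gene.begin + 1
    if length < MIN_GENE_LENGTH or gene.start_type == "Edge":
        return "discard"  # too short, or start codon missing
    # Remaining partial genes are assumed to lack only their stop codon.
    return "complete" if gene.partial == "00" else "partial"
```

Under these assumptions, the "complete" and "partial" sets would then each be clustered with cd-hit-est as described above.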
Quality control of each MGS was performed manually by visualizing heatmaps of the normalized gene abundance profiles.

Taxonomic annotation of Metagenomic Species
MGS taxonomic annotation was performed by aligning all core and accessory genes against the GTDB r214 representative genomes using blastn [4] (version 2.10.1, task = megablast, word_size = 16). The 20 best hits for each gene were kept. A species-level assignment was given if more than 50% of the genes matched a GTDB representative genome with a mean identity ≥ 95% and a mean gene length coverage ≥ 90%. The remaining MGS were assigned to a higher taxonomic level (genus to superkingdom) if more than 50% of their genes had the same annotation.
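The assignment rule above can be sketched as a short decision procedure. This Python sketch is a hypothetical reading of the rule, not the authors' code, and it makes four assumptions. Each gene contributes a single retained hit rather than all 20 best hits. Genes without any alignment still count in the 50% denominator. The identity and coverage means are computed over the genes hitting the candidate genome. The top rank is called "domain", as in GTDB, which corresponds to "superkingdom" in the description. `GeneHit` and `assign_mgs` are illustrative names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Ranks tried after species, from genus up to the top GTDB rank (domain).
HIGHER_RANKS = ["genus", "family", "order", "class", "phylum", "domain"]


@dataclass
class GeneHit:
    """Hypothetical record: the retained blastn hit of one MGS gene."""
    genome: str              # GTDB representative genome accession
    identity: float          # percent identity of the alignment
    coverage: float          # percent of the gene length covered
    lineage: Dict[str, str]  # rank -> GTDB taxon of that genome


def assign_mgs(hits: List[Optional[GeneHit]]) -> Optional[Tuple[str, str]]:
    """Return (rank, taxon) for one MGS, or None if no rank reaches a majority.

    `hits` holds one entry per core/accessory gene; None marks an unaligned
    gene, which still counts toward the total number of genes.
    """
    n_genes = len(hits)
    if n_genes == 0:
        return None
    aligned = [h for h in hits if h is not None]

    # Species level: > 50% of genes on one genome, with mean identity >= 95%
    # and mean gene length coverage >= 90% over those genes.
    by_genome: Dict[str, List[GeneHit]] = defaultdict(list)
    for h in aligned:
        by_genome[h.genome].append(h)
    for genome_hits in by_genome.values():
        if (len(genome_hits) > n_genes / 2
                and mean(h.identity for h in genome_hits) >= 95.0
                and mean(h.coverage for h in genome_hits) >= 90.0):
            return ("species", genome_hits[0].lineage["species"])

    # Higher ranks: the lowest rank at which > 50% of genes share one taxon.
    for rank in HIGHER_RANKS:
        counts = Counter(h.lineage[rank] for h in aligned if rank in h.lineage)
        if counts:
            taxon, count = counts.most_common(1)[0]
            if count > n_genes / 2:
                return (rank, taxon)
    return None
```

Because at most one genome can collect more than half of the genes, the species loop can return at most one candidate.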

Provider:
Recherche Data Gouv
Created:
2023-02-20