Metagenome-Assembled Genomes (MAGs) from cow rumen fluid samples

Source: DataCite Commons. Updated 2026-04-08; indexed 2026-05-04.
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/F9BMRL
Resource description:
Dataset description

This dataset is a collection of genomes from the cow rumen microbiome, constructed by integrating large-scale public metagenomic datasets with reference genomes from cultured isolates. Compared to the MGnify cow rumen catalogue, currently one of the most comprehensive public resources, this dataset expands the known diversity by identifying more than 2,100 additional species not present in MGnify, while sharing a core of ~1,700 species. Furthermore, for shared species, representative genomes from this catalogue exhibit higher quality in approximately 76% of cases, based on completeness, contamination, and assembly continuity metrics. These improvements enhance the reliability of downstream genome-resolved analyses.

Data sources

Metagenomic data:
- Stewart et al. 2019 – BioProject PRJEB31266 (240 samples)
- Stewart et al. 2018 – BioProject PRJEB21624 (43 samples)
- Ruminomics – BioProject PRJEB21508 (58 samples)
- Mu et al. 2021 – BioProject PRJNA639405 (24 samples)
- Sato et al. 2024 – BioProject PRJDB16747 (37 samples)
- 7 unpublished deeply sequenced rumen metagenomes
Total: 409 metagenomic samples

Genomic data:
- Hungate1000 – BioProject PRJNA471733 (381 genomes)

Metagenomic assembly

Metagenomic assemblies were generated using metaSPAdes. Contigs shorter than 1,500 bp were removed prior to downstream analyses.

Genomic assembly

Isolate genomes were assembled using SPAdes with parameters --isolate and --cov-cutoff auto. Contigs shorter than 1,500 bp were discarded.

MAG recovery

Metagenome-assembled genomes (MAGs) were reconstructed using COMEBin (multi-coverage mode). Genome quality was assessed using CheckM2. MAGs were retained based on the following criteria:
- Completeness ≥ 70%
- Contamination ≤ 5%
- N50 ≥ 5 kb

Genome dereplication

Pairwise Average Nucleotide Identity (ANI) was computed using skani. Genomes were dereplicated at the species level using a 95% ANI threshold.
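
To illustrate what dereplication at a 95% ANI threshold means in practice, here is a minimal Python sketch of one common approach, greedy representative selection. It is not the procedure used to build this dataset, which the description does not detail beyond the threshold. The function name, the per-genome quality score, and the in-memory ANI dictionary are all hypothetical; in a real workflow the ANI values would be parsed from skani output.

```python
def dereplicate(genomes, ani, threshold=95.0):
    """Greedy species-level dereplication (illustrative assumption only).

    genomes: list of (name, quality_score) tuples; the score is a hypothetical
             ranking value, higher meaning a better genome.
    ani:     dict mapping frozenset({name_a, name_b}) to ANI in percent.
             Missing pairs are treated as below the threshold.
    Returns a dict mapping each representative to the members of its cluster.
    """
    # Visit the best genomes first so each cluster is represented by its top genome.
    ranked = sorted(genomes, key=lambda g: g[1], reverse=True)
    clusters = {}
    for name, _score in ranked:
        for rep in clusters:
            if ani.get(frozenset((name, rep)), 0.0) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Made-up example values, not taken from the dataset.
example_genomes = [("mag_a", 90.0), ("mag_b", 85.0), ("mag_c", 80.0)]
example_ani = {
    frozenset(("mag_a", "mag_b")): 97.2,
    frozenset(("mag_a", "mag_c")): 82.0,
    frozenset(("mag_b", "mag_c")): 81.5,
}
example_clusters = dereplicate(example_genomes, example_ani)
```

In this toy input, mag_a and mag_b fall into one species-level cluster because their ANI exceeds 95%, while mag_c forms its own cluster.
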
Taxonomic annotation

Taxonomic classification of dereplicated genomes was performed using GTDB-Tk, based on GTDB release r220.
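
The numeric thresholds stated in the description (contigs of at least 1,500 bp; MAG completeness ≥ 70%, contamination ≤ 5%, N50 ≥ 5 kb) can be expressed as simple filters. The sketch below is only an illustration of those stated criteria: the `MagStats` class and the function names are hypothetical, and it assumes that 5 kb means 5,000 bp and that the quality values have already been extracted from a CheckM2 report.

```python
from dataclasses import dataclass

MIN_CONTIG_LEN = 1500      # bp, contigs shorter than this are removed
MIN_COMPLETENESS = 70.0    # percent
MAX_CONTAMINATION = 5.0    # percent
MIN_N50 = 5000             # bp, assuming 5 kb = 5,000 bp


@dataclass
class MagStats:
    """Hypothetical container for per-MAG quality metrics."""
    name: str
    completeness: float   # percent
    contamination: float  # percent
    n50: int              # bp


def keep_contig(length_bp: int) -> bool:
    """True if a contig meets the 1,500 bp minimum length."""
    return length_bp >= MIN_CONTIG_LEN


def keep_mag(mag: MagStats) -> bool:
    """True if a MAG meets all three retention criteria."""
    return (
        mag.completeness >= MIN_COMPLETENESS
        and mag.contamination <= MAX_CONTAMINATION
        and mag.n50 >= MIN_N50
    )


# Made-up example values, not taken from the dataset.
example_mags = [
    MagStats("bin_1", completeness=92.4, contamination=1.1, n50=18000),
    MagStats("bin_2", completeness=68.0, contamination=0.5, n50=25000),
    MagStats("bin_3", completeness=85.0, contamination=6.3, n50=12000),
    MagStats("bin_4", completeness=75.0, contamination=2.0, n50=4200),
]
retained = [m.name for m in example_mags if keep_mag(m)]
```

Here only bin_1 is retained: bin_2 fails on completeness, bin_3 on contamination, and bin_4 on N50.
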
Provider:
Recherche Data Gouv
Created:
2026-04-03