Reconstruction of 1,979 prokaryotic metagenome-assembled genomes from 37 global cave environments
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Cave_Metagenome-assembled_Genome/29554673
Description:
This record contains a non-redundant catalog of cave metagenome-assembled genomes (MAGs) reconstructed from 37 caves, together with all companion tables needed for reuse. Representative genomes are provided in two archives split by MIMAG quality tier; the full set of refined (pre-dereplication) MAGs is provided separately. Summary tables (Excel) cover sample metadata, genome quality, taxonomy, biosynthetic gene clusters (BGCs), antimicrobial-resistance genes (ARGs), functional annotations, and read recruitment. Two HTML reports document read quality before and after trimming.
Root level
metadata.xlsx: Sample metadata and sequencing statistics. Each row is a metagenome sample with accessions and context.
MAGs_HQ.tar and MAGs_MQ.tar: 1,979 species-level representative MAGs, split by quality.
magscot.tar: 4,234 refined MAGs retained from 25,276 candidate bins after bin refinement across three binners.
HQ (high quality): ≥90% completeness and ≤5% contamination.
MQ (medium quality): ≥50% completeness and ≤10% contamination.
All MAG FASTA files are named [RUN_ACCESSION]_cleanbin_[BIN_ID].fa.
MAGs present in MAGs_HQ.tar or MAGs_MQ.tar are species-level representatives; MAGs found only in magscot.tar are non-representative refined MAGs.
Unpacking .tar.zst archives: decompress, then untar. Linux/macOS: tar -I unzstd -xf (or zstd -d && tar -xf), supplying the archive file name to each command. Windows: open with 7-Zip (extract once to get the .tar, then extract the .tar to get the final folder).
Folders
QCreport/
Two interactive MultiQC HTML reports summarizing read quality:
Report_BeforeQC.html: before trimming
Report_AfterQC.html: after trimming
Open either report directly in a browser to view adapter content, base-quality profiles, etc.
Analysis/
The Excel workbooks provide all annotations needed to reuse the catalog. Every table is keyed by MAG_ID, which matches the FASTA filename stem in the archives ([RUN_ACCESSION]_cleanbin_[BIN_ID].fa; e.g., ERR10479404_cleanbin_000031), so columns using MAG_ID map directly to the corresponding genome files.
CheckM2.xlsx: Genome quality for all 4,234 draft MAGs (located in magscot.tar)
Classification.xlsx: GTDB-Tk taxonomic classification for the representative MAGs
BGC.xlsx: antiSMASH 8.0 biosynthetic gene cluster output for the representative MAGs
ARG.xlsx: DRAMMA ARG predictions for the representative MAGs
DRAM.xlsx: DRAM functional annotation for the representative MAGs
CoverM.xlsx: Read recruitment of each sample against each representative MAG
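Because MAG_ID doubles as the FASTA filename stem, going from a table row to a genome file is a string operation. The following is a minimal sketch of that mapping, not part of the record itself. The helper names `parse_mag_id` and `fasta_path` are hypothetical, and the sketch assumes that FASTA files sit directly inside the extracted directory; the record does not state the internal layout of the archives.

```python
from pathlib import Path
from typing import Tuple

SEPARATOR = "_cleanbin_"  # literal infix from the documented naming scheme


def parse_mag_id(name: str) -> Tuple[str, str]:
    """Split a MAG_ID or FASTA file name into (run accession, bin ID)."""
    stem = Path(name).name
    if stem.endswith(".fa"):
        stem = stem[: -len(".fa")]
    run_accession, sep, bin_id = stem.partition(SEPARATOR)
    if not (sep and run_accession and bin_id):
        raise ValueError(f"not a [RUN_ACCESSION]_cleanbin_[BIN_ID] name: {name!r}")
    return run_accession, bin_id


def fasta_path(mag_id: str, extracted_dir: str) -> Path:
    """Expected FASTA location, ASSUMING files sit directly in extracted_dir."""
    parse_mag_id(mag_id)  # validate the ID before building a path from it
    return Path(extracted_dir) / f"{mag_id}.fa"


# The MAG_ID below is the example given in the record; "MAGs_HQ" is an
# assumed name for the directory obtained by extracting MAGs_HQ.tar.
print(parse_mag_id("ERR10479404_cleanbin_000031"))
print(fasta_path("ERR10479404_cleanbin_000031", "MAGs_HQ"))
```

Splitting on the literal `_cleanbin_` infix rather than on every underscore keeps the parser tolerant of bin IDs or accessions that might themselves contain underscores.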
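Since the workbooks share the MAG_ID key, combining them is a plain key join, and the HQ/MQ tiers can be recomputed from completeness and contamination using the thresholds stated under Root level. The sketch below illustrates both steps under assumptions. It assumes the rows have already been loaded from the workbooks as dictionaries. The column names `Completeness`, `Contamination`, and `Classification` are hypothetical, since the record does not list the column headers, and every row value is an invented placeholder rather than data from the catalog.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def quality_tier(completeness: float, contamination: float) -> Optional[str]:
    """Return 'HQ', 'MQ', or None using the thresholds stated in the record."""
    if completeness >= 90 and contamination <= 5:
        return "HQ"
    if completeness >= 50 and contamination <= 10:
        return "MQ"
    return None


def join_on_mag_id(left: List[Row], right: List[Row]) -> List[Row]:
    """Inner-join two lists of row dicts on their shared MAG_ID key."""
    right_by_id = {row["MAG_ID"]: row for row in right}
    return [
        {**row, **right_by_id[row["MAG_ID"]]}
        for row in left
        if row["MAG_ID"] in right_by_id
    ]


# Invented placeholder rows; column names are assumptions, not the real headers.
checkm2_rows: List[Row] = [
    {"MAG_ID": "RUNA_cleanbin_000001", "Completeness": 95.0, "Contamination": 1.0},
    {"MAG_ID": "RUNB_cleanbin_000002", "Completeness": 60.0, "Contamination": 7.5},
    {"MAG_ID": "RUNB_cleanbin_000003", "Completeness": 55.0, "Contamination": 9.0},
]
classification_rows: List[Row] = [
    {"MAG_ID": "RUNA_cleanbin_000001", "Classification": "placeholder_taxon_1"},
    {"MAG_ID": "RUNB_cleanbin_000002", "Classification": "placeholder_taxon_2"},
]

joined = join_on_mag_id(classification_rows, checkm2_rows)
for row in joined:
    row["Tier"] = quality_tier(row["Completeness"], row["Contamination"])
    print(row["MAG_ID"], row["Tier"], row["Classification"])
```

An inner join is used because the quality table covers all refined MAGs while the taxonomy table covers only the representatives, so the joined result is restricted to MAGs present in both.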
Created:
2025-07-13



