Reconstruction of 1,979 prokaryotic metagenome-assembled genomes from 37 global cave environments
Source: Figshare. Updated: 2025-10-10. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Cave_Metagenome-assembled_Genome/29554673/4
Resource description:
This record contains a non-redundant catalog of cave metagenome-assembled genomes (MAGs) reconstructed from 37 caves, together with all companion tables needed for reuse. Representative genomes are provided in two archives split by MIMAG quality; the full set of refined (pre-dereplication) MAGs is provided separately. Summary tables (Excel) cover sample metadata, genome quality, taxonomy, biosynthetic gene clusters (BGCs), antimicrobial-resistance genes (ARGs), functional annotations, and read recruitment. Two HTML reports document read quality before and after trimming.

Root level
- metadata.xlsx: Sample metadata and sequencing statistics. Each row is a metagenome sample with accessions and context.
- MAGs_HQ.tar and MAGs_MQ.tar: 1,979 species-level representative MAGs, split by quality.
  HQ (high quality): ≥90% completeness and ≤5% contamination.
  MQ (medium quality): ≥50% completeness and ≤10% contamination.
- magscot.tar: 4,234 refined MAGs retained from 25,276 candidate bins after bin refinement across three binners.

All MAG FASTA files are named [RUN_ACCESSION]_cleanbin_[BIN_ID].fa. MAGs present in MAGs_HQ.tar or MAGs_MQ.tar are species-level representatives; MAGs found only in magscot.tar are non-representative refined MAGs.

Unpacking archives
.tar.zst: Decompress, then untar. Linux/macOS: tar -I unzstd -xf (or zstd -d && tar -xf). Windows: open with 7-Zip (extract once to get the .tar, then extract the .tar to get the final folder).

Folders

QCreport/
Two interactive MultiQC HTML reports summarizing read quality:
- Report_BeforeQC.html: before trimming
- Report_AfterQC.html: after trimming
Open either report directly in a browser to view adapter content, base-quality profiles, etc.

Analysis/
The Excel workbooks provide all annotations needed to reuse the catalog. Every table is keyed by MAG_ID, which matches the FASTA filename stem in the archives ([RUN_ACCESSION]_cleanbin_[BIN_ID].fa; e.g., ERR10479404_cleanbin_000031), so columns using MAG_ID map directly to the corresponding genome files.
- CheckM2.xlsx: Genome quality for all 4,234 draft MAGs (located in magscot.tar)
- Classification.xlsx: GTDB-Tk taxonomic classification of the representative MAGs
- BGC.xlsx: antiSMASH 8.0 biosynthetic gene cluster output for the representative MAGs
- ARG.xlsx: DRAMMA ARG predictions for the representative MAGs
- DRAM.xlsx: DRAM functional annotation of the representative MAGs
- CoverM.xlsx: Read recruitment of each sample against each representative MAG
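The naming rule, the HQ/MQ thresholds, and the representative versus non-representative distinction described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the record itself: the function names are hypothetical, the example folder prefix in the usage note is an assumption about how the archives unpack, and the pattern assumes that neither the run accession nor the bin ID contains an underscore.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set, Tuple

# Naming rule quoted in the record: [RUN_ACCESSION]_cleanbin_[BIN_ID].fa
# Assumption: neither part contains an underscore or a dot.
MAG_STEM = re.compile(r"^(?P<run>[^_.]+)_cleanbin_(?P<bin>[^_.]+)$")


def mag_id_from_fasta(path: str) -> str:
    """Return the MAG_ID (the filename stem) of a MAG FASTA path."""
    p = PurePosixPath(path)
    if p.suffix != ".fa" or not MAG_STEM.match(p.stem):
        raise ValueError(f"not a MAG FASTA name: {path!r}")
    return p.stem


def split_mag_id(mag_id: str) -> Tuple[str, str]:
    """Split a MAG_ID into (run accession, bin ID)."""
    m = MAG_STEM.match(mag_id)
    if m is None:
        raise ValueError(f"not a MAG_ID: {mag_id!r}")
    return m["run"], m["bin"]


def quality_tier(completeness: float, contamination: float) -> Optional[str]:
    """Apply the thresholds stated in the record (values in percent)."""
    if completeness >= 90 and contamination <= 5:
        return "HQ"
    if completeness >= 50 and contamination <= 10:
        return "MQ"
    return None  # below the medium-quality thresholds


def non_representative_ids(refined: Iterable[str],
                           representatives: Iterable[str]) -> Set[str]:
    """MAG_IDs found among the refined MAGs but not among the representatives."""
    reps = {mag_id_from_fasta(p) for p in representatives}
    return {mag_id_from_fasta(p) for p in refined} - reps
```

For example, mag_id_from_fasta("MAGs_HQ/ERR10479404_cleanbin_000031.fa") yields ERR10479404_cleanbin_000031, which is the key to look up in the Analysis/ workbooks, and quality_tier(95.0, 2.0) yields "HQ".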
Provider:
Li, Huihong
Created:
2025-10-10



