Auxillary Table S6
Source: DataCite Commons · Updated 2021-05-07 · Indexed 2024-07-28
Download link:
https://springernature.figshare.com/articles/dataset/Auxillary_Table_S6/14200175
Resource description:
A table containing statistics, accession numbers and other contextual data about the 12,682 moderate- and high-quality Metagenome Assembled Genomes (MAGs) submitted to the European Nucleotide Archive (ENA), i.e. MAGs with an estimated completeness higher than 40% and an estimated redundancy lower than 5%. A table covering all ~500,000 bins generated in this study is available on SciLifeLab figshare at doi:10.17044/scilifelab.13005311.
Contains the fields:
• bin_id: unique identifier of the MAG
• GC: GC content of the MAG
• coding_density: coding-density estimate, i.e. the summed length of all predicted amino-acid sequences times 3, divided by the genome length (in bases)
• completeness: estimated completeness, computed by CheckM
• contamination: estimated redundancy, computed by CheckM
• strain_heterogeneity: strain heterogeneity, computed by CheckM
• length: length of the MAG in bases
• nb_contigs: number of contigs in the MAG
• nb_proteins: number of predicted proteins encoded in the bin (matching the predicted amino-acid sequences in the easy-access repository)
• gtdbtk_classification: classification by GTDB-Tk r89
• sourmash_classification: classification by sourmash using two databases: GTDB (r89) and a database built from our GTDB-Tk classification
• taxonomy: conclusion drawn from the two previous classifications
• mOTU: membership in a metagenomic Operational Taxonomic Unit (mOTU), i.e. a species-level clustering of bins
• path: relative path to a tar-ball containing nucleotide, CDS and amino-acid sequences, GFF (General Feature Format), eggNOG-mapper and sourmash-signature files; the path is relative to https://export.uppmax.uu.se/uppstore2018116/stratfreshdb/
• accession: Sequence Read Archive (SRA)/ENA sample ID of the assembly
• genbank_accession: GenBank accession number
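The GC and coding_density fields can in principle be recomputed from the per-MAG sequence files in the tar-ball. The following is a minimal sketch, assuming the contig and predicted protein sequences have already been parsed into plain strings; the function names are hypothetical, and it is an assumption (not stated in the description) that GC is a fraction over unambiguous A/C/G/T bases rather than a percentage. The coding-density formula follows the definition given above.

```python
"""Hypothetical helpers to recompute GC content and coding density for one MAG."""

from typing import Sequence


def gc_content(contigs: Sequence[str]) -> float:
    """Return the G+C fraction over unambiguous bases (A/C/G/T) of all contigs.

    Assumption: non-ACGT symbols such as N are excluded from the denominator.
    """
    gc = acgt = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += s.count("A") + s.count("T") + s.count("G") + s.count("C")
    return gc / acgt if acgt else 0.0


def coding_density(contigs: Sequence[str], proteins: Sequence[str]) -> float:
    """Return (3 * total predicted amino-acid length) / genome length in bases."""
    genome_length = sum(len(seq) for seq in contigs)  # same quantity as the `length` field
    coding_bases = 3 * sum(len(p) for p in proteins)  # each residue spans one codon
    return coding_bases / genome_length if genome_length else 0.0


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from the dataset.
    contigs = ["ATGGCC", "GGTTAA"]
    proteins = ["MA", "G"]
    print(gc_content(contigs))                # 0.5
    print(coding_density(contigs, proteins))  # 0.75
```

If the protein FASTA files end each sequence with a stop symbol (`*`), this sketch counts it; whether the original estimate did so is not stated in the description.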
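Because the quality criteria are simple thresholds and path is relative to a fixed export directory, selecting bins and locating their archives takes only a few lines. This is a minimal sketch, assuming the table has been exported as tab-separated text with the column names listed above and that completeness and contamination are percentages as reported by CheckM; the delimiter, the toy rows and the helper names are assumptions, not part of the dataset.

```python
"""Hypothetical helpers: apply the MAG quality thresholds and resolve `path` to a URL."""

import csv
import io
from urllib.parse import urljoin

# Base directory stated in the dataset description for the `path` field.
BASE_URL = "https://export.uppmax.uu.se/uppstore2018116/stratfreshdb/"


def passes_quality(row: dict, min_completeness: float = 40.0,
                   max_contamination: float = 5.0) -> bool:
    """By default, True if completeness > 40 and contamination (redundancy) < 5."""
    return (float(row["completeness"]) > min_completeness
            and float(row["contamination"]) < max_contamination)


def archive_url(row: dict) -> str:
    """Join the relative `path` onto BASE_URL.

    A leading '/' is stripped so urljoin keeps the base directory
    instead of resolving the path from the server root.
    """
    return urljoin(BASE_URL, row["path"].lstrip("/"))


if __name__ == "__main__":
    # Made-up rows with hypothetical bin IDs and paths, for illustration only.
    table = io.StringIO(
        "bin_id\tcompleteness\tcontamination\tpath\n"
        "bin_A\t92.1\t1.3\tdir/bin_A.tar.gz\n"
        "bin_B\t35.0\t0.8\tdir/bin_B.tar.gz\n"
    )
    rows = list(csv.DictReader(table, delimiter="\t"))
    for row in rows:
        if passes_quality(row):
            print(row["bin_id"], archive_url(row))
```

Every MAG in this table should already satisfy the thresholds, so the filter is mainly useful on the full ~500,000-bin table; the sketch only builds URLs and does not download the archives.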
Provided by:
figshare
Created:
2021-05-07



