
Auxillary Table S6

DataCite Commons | Updated 2021-05-07 | Indexed 2024-07-28
Download link:
https://springernature.figshare.com/articles/dataset/Auxillary_Table_S6/14200175
Description:
A table containing statistics, accession numbers, and other contextual data about the 12,682 metagenome-assembled genomes (MAGs) of moderate and high quality submitted to ENA (i.e. estimated completeness higher than 40% and redundancy lower than 5%). A table with all ~500,000 bins generated in this study is available on scilifelab-figshare at doi:10.17044/scilifelab.13005311.

Contains the fields:
• bin_id: the unique identifier of a MAG
• GC: GC content of the MAG
• coding_density: coding-density estimate, i.e. the sum of the lengths of all predicted amino-acid sequences, times 3, divided by the length of the genome (in bases)
• completeness: estimated completeness computed by CheckM
• contamination: estimated redundancy computed by CheckM
• strain_heterogeneity: computed by CheckM
• length: length of the MAG in bases
• nb_contigs: number of contigs in the MAG
• nb_proteins: number of predicted proteins encoded in the bin (for the predicted amino-acid sequences in the easy-access repository)
• gtdbtk_classification: classification by GTDBtk r89
• sourmash_classification: classification by sourmash using two databases: the GTDB (r89) and a database based on our GTDBtk classification
• taxonomy: conclusion drawn from the two previous classifications
• mOTU: membership in a metagenomic Operational Taxonomic Unit (mOTU), i.e. species-level clustering of bins
• path: relative path to a tarball containing nucleotide, CDS, and amino-acid sequences, GFF, eggNOGmapper, and sourmash-signature files, relative to https://export.uppmax.uu.se/uppstore2018116/stratfreshdb/
• accession: SRA/ENA sample ID of the assembly
• genbank_accession: GenBank accession number
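The field definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example rows, and the example path values are hypothetical, and it assumes that completeness and contamination are stored as percentages (as CheckM reports them). Only the base URL, the column names, and the 40% / 5% thresholds come from the description.

```python
# Base directory quoted in the description of the `path` field.
BASE_URL = "https://export.uppmax.uu.se/uppstore2018116/stratfreshdb/"


def coding_density(protein_lengths_aa, genome_length_bp):
    """Sum of predicted protein lengths (in amino acids) times 3,
    divided by the genome length in bases."""
    return 3 * sum(protein_lengths_aa) / genome_length_bp


def passes_quality(row, min_completeness=40.0, max_contamination=5.0):
    """Moderate/high-quality filter: completeness > 40% and redundancy < 5%.
    Assumes both columns hold percentages."""
    return (row["completeness"] > min_completeness
            and row["contamination"] < max_contamination)


def tarball_url(row):
    """Join the relative `path` column onto the base directory."""
    return BASE_URL.rstrip("/") + "/" + row["path"].lstrip("/")


# Hypothetical rows, for illustration only (not taken from the table).
rows = [
    {"bin_id": "example_bin_1", "completeness": 92.5,
     "contamination": 1.2, "path": "example/example_bin_1.tar.gz"},
    {"bin_id": "example_bin_2", "completeness": 35.0,
     "contamination": 0.5, "path": "example/example_bin_2.tar.gz"},
]

kept = [r["bin_id"] for r in rows if passes_quality(r)]
```

With these made-up rows, only `example_bin_1` passes the filter, and `tarball_url(rows[0])` yields the base directory followed by `example/example_bin_1.tar.gz`.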

Provider:
figshare
Created:
2021-05-07