Simulated metagenomes with quality and abundance distributions derived from real samples

Indexed by NIAID Data Ecosystem on 2026-03-11

Download link:

https://zenodo.org/record/2539431

Description:

Species abundances and quality values were derived from the following samples: SAMEA2466896, SAMEA2466916, SAMEA2466952, SAMEA2466953, SAMEA2466965, SAMEA2466996, SAMEA2467015, SAMEA2467039, SAMEA2621010, SAMEA2621033, SAMEA2621107, SAMEA2621155, SAMEA2621229, SAMEA2621247, SAMEA2621300, SAMEA2622357.

Reference abundances (.abund files) were generated with the mOTUs profiler. Metagenomes were simulated with cMESSi, using the proGenomes representative contigs for each species together with these abundances. Where a ref_mOTU_v2 corresponded to more than one genome, its abundance was distributed equally over all of those genomes. GFF location files were produced from the read location information generated by cMESSi.

Two variants of truth values were obtained by intersecting the coordinates of the simulated reads with the coordinates of genes annotated with eggNOG orthologous groups (OGs at the NOG level), as predicted by eggNOG-mapper.

The .cog-simulated files contain the NOG distribution that was effectively simulated, i.e. the number of reads overlapping genes annotated with each NOG. A read overlapping multiple genes is counted once for each gene, and if a gene has multiple NOG annotations, each annotation is assigned the gene's full count of overlapping reads. All else being equal, longer genes are therefore expected to generate more reads.

The .cog-distribution file contains the expected distribution of every NOG across all samples: the number of genes annotated with each NOG, multiplied by the abundance of the corresponding species. Gene length is not taken into account.

If you use this dataset, please cite: NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language
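The two truth-value variants differ mainly in whether simulated reads (and therefore gene length) enter the count. The Python sketch below contrasts them on toy data, together with the equal split of a ref_mOTU's abundance over its genomes. It is not the code used to build this dataset: the function names, the in-memory data layout, the 0-based half-open (contig, start, end) intervals (GFF files themselves use 1-based inclusive coordinates) and the use of per-genome abundances in the expected distribution are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; they are not part of
# mOTUs, cMESSi or eggNOG-mapper. Intervals are assumed to be
# 0-based, half-open (contig, start, end) tuples.


def split_motu_abundance(motu_abundance, motu_to_genomes):
    """Spread each ref_mOTU's abundance equally over its genomes."""
    genome_abundance = defaultdict(float)
    for motu, abundance in motu_abundance.items():
        genomes = motu_to_genomes.get(motu, [])
        if not genomes:
            continue  # assumption: mOTUs without a genome are skipped
        share = abundance / len(genomes)
        for genome in genomes:
            genome_abundance[genome] += share
    return dict(genome_abundance)


def overlaps(read, gene):
    """True if a read interval and a gene interval share at least one base."""
    r_contig, r_start, r_end = read
    g_contig, g_start, g_end = gene
    return r_contig == g_contig and r_start < g_end and g_start < r_end


def cog_simulated(reads, genes, gene_nogs):
    """Reads overlapping genes annotated with each NOG (.cog-simulated style).

    A read overlapping several genes is counted once per gene, and a gene
    with several NOGs adds its full read count to each of them.
    """
    counts = defaultdict(int)
    for gene_id, gene in genes.items():
        n_reads = sum(1 for read in reads if overlaps(read, gene))
        for nog in gene_nogs.get(gene_id, []):
            counts[nog] += n_reads
    return dict(counts)


def cog_distribution(genome_genes, gene_nogs, genome_abundance):
    """Expected NOG weights (.cog-distribution style).

    Each gene annotated with a NOG contributes its genome's abundance;
    gene length plays no role.
    """
    expected = defaultdict(float)
    for genome, gene_ids in genome_genes.items():
        abundance = genome_abundance.get(genome, 0.0)
        for gene_id in gene_ids:
            for nog in gene_nogs.get(gene_id, []):
                expected[nog] += abundance
    return dict(expected)


if __name__ == "__main__":
    # Toy data: every identifier here is made up.
    abundance = split_motu_abundance(
        {"motuA": 0.6, "motuB": 0.4},
        {"motuA": ["g1", "g2"], "motuB": ["g3"]},
    )
    genes = {
        "gA": ("c1", 0, 100),
        "gB": ("c1", 80, 200),
        "gC": ("c2", 0, 50),
        "gD": ("c3", 0, 300),
    }
    gene_nogs = {"gA": ["NOG1"], "gB": ["NOG1", "NOG2"],
                 "gC": ["NOG2"], "gD": ["NOG1"]}
    genome_genes = {"g1": ["gA", "gB"], "g2": ["gD"], "g3": ["gC"]}
    reads = [("c1", 50, 90), ("c1", 150, 190), ("c2", 10, 40)]

    print(cog_simulated(reads, genes, gene_nogs))  # {'NOG1': 3, 'NOG2': 3}
    print(cog_distribution(genome_genes, gene_nogs, abundance))
```

On this toy data both NOGs receive 3 simulated reads (the first read overlaps gA and gB and is counted for both), whereas the expected distribution ranks NOG1 (three annotated genes, about 0.9) above NOG2 (two genes, about 0.7) without regard to how long each gene is or how many reads it actually drew.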

Created:

2020-01-24