Supplementary information for Salazar et al. (2020): mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes

Indexed in the NIAID Data Ecosystem on 2026-03-12

Download link:

https://zenodo.org/record/4352762

Resource description:

Supplementary data and code for Salazar et al. (2020), "mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes".

Code used to reproduce the analyses (./code):
- ./code/almeida_stats: R scripts reproducing the external benchmarking based on data released by Almeida et al. (2018). The scripts are numbered in the order in which they are run.
- ./code/internal_benchmarking: scripts computing the metrics used to evaluate classification and profiling performance in the internal benchmarking, also numbered in the order in which they are run.
- ./code/plots_pub.R: a single script producing the figures used in the publication.

Supplementary data (./data):
- ./data/processed: the metrics used to evaluate classification and profiling performance in both the internal and the external benchmarking, all as tab-delimited files:
  - Internal benchmarking: ./data/processed/internal_benchmarking/all_stats_long.tsv
  - External benchmarking, metrics based on Almeida et al. (2018) data: ./data/processed/almeida_stats/stats_almeida.tsv
  - External benchmarking, metrics based on mTAGs (computed from profiles): ./data/processed/almeida_stats/stats_mtags.tsv
  - External benchmarking, metrics based on mTAGs (computed from bins): ./data/processed/almeida_stats/stats_mtags_from bins.tsv
- ./data/DB_GEN_128 and ./data/raw/DB_GEN_138: the reference databases derived from the SILVA SSU database (versions 128 and 138). For each database, a sequence file (*.fasta), a file listing the cluster members (*.clstr) and a file containing the taxonomic annotation (*taxmap) are provided.
- ./data/raw/mtags_almeida: the mTAGs taxonomic profiles for the external benchmarking dataset, in the output format of the mTAGs tool (https://github.com/SushiLab/mTAGs).
- The mTAGs profiles are provided for both databases (*cons: using the degenerate consensus sequence; *repr: using the longest cluster member).
- ./data/raw/silva_138_simulate_analysis: the simulated dataset used for the internal benchmarking. Simulated reads of 100, 150 and 250 bp generated from the SILVA SSU database (version 138) are provided as pairs of FASTA files.
- ./data/raw/silva_138_simulate_analysis/annotation: the true annotation of the simulated reads and the annotation predicted by mTAGs, for both databases (cons: using the degenerate consensus sequence; repr: using the longest member).
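The *cons reference sequences are degenerate consensus sequences, so a quick sanity check when exploring the database files is to measure how many positions are ambiguous. The Python sketch below reads a FASTA stream and reports, for each record, the fraction of positions carrying an ambiguity code. It assumes that degenerate positions are written with the standard IUPAC nucleotide codes and that each header begins with a whitespace-free sequence ID; the function names are illustrative, and the sketch does not parse the *.clstr or *taxmap files, whose exact layout is not described here.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

# Standard IUPAC nucleotide ambiguity codes (anything other than A, C, G, T/U).
# Assumption: the degenerate consensus sequences use these codes.
IUPAC_AMBIGUOUS = frozenset("RYSWKMBDHVN")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA stream.

    The record ID is the first whitespace-delimited token of the header line;
    multi-line sequences are concatenated.
    """
    record_id: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            tokens = line[1:].split()
            record_id, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def degenerate_fraction(seq: str) -> float:
    """Return the fraction of positions in seq that carry an IUPAC ambiguity code."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(base in IUPAC_AMBIGUOUS for base in seq) / len(seq)


if __name__ == "__main__":
    # Toy input; for real data, pass an open handle to one of the *.fasta
    # files from the reference database folders instead.
    toy = io.StringIO(">seq1 example\nACGTRY\nACGT\n>seq2\nACGTACGTAC\n")
    for rid, seq in read_fasta(toy):
        print(rid, len(seq), round(degenerate_fraction(seq), 2))
```

Run as a script, this prints one line per toy record (ID, length, ambiguous fraction). Comparing the distribution of these fractions between a *cons and a *repr sequence file is one simple way to see how much degeneracy the consensus step introduces; a *repr file, holding real member sequences, would be expected to contain few or no ambiguity codes.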

Created:

2021-05-12