Metagenomics tools employed in microbiome research, 2018-2023, per peer-reviewed publications

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.31zcrjdw8

Resource description:

Bioinformatics tools for processing metagenomic data embed choices about how to correlate DNA sequences with the presence of microbial taxa. Because no single correct way to make these choices has been or can currently be established, tools may embed different choices, and thus different assumptions about what constitutes valid evidence of a microorganism. We set out to document how those assumptions varied across the range of microbiome bioinformatics tools in current use. However, we were unable to do so because bioinformatics methods are inconsistently and incompletely documented in the peer-reviewed literature. Those omissions matter to how methodological choices can be accounted for in interpreting results, and to the capacity for microbiome research to expand upon current understandings of how microorganisms exist. We advocate for more complete and transparent communication of bioinformatics choices in the published microbiome literature, for reasons concerning accessibility, education, data reusability, and standardization.

Methods

This dataset is a catalog of common tools in current use as of 2023. We searched PubMed and Google Scholar for “microbiome” + “tool name,” deriving an initial list of tool names from recent review articles and then adding tools mentioned in association with other tools until we reached saturation. We excluded tools that were not cited after 2020 (not in current use), that were not cited more than three times or by more than one research group (not common), and statistical tools that were not specific to microbiome sequencing analysis.

We then characterized each tool in our catalog, organizing them into groups with similar functions. For each tool, we searched for “microbiome + [tool name]” as keywords in PubMed and Google Scholar, choosing 3-5 articles published between 2012 and 2022 that encompassed the diversity of topics or subdisciplines represented in the search results.
We searched these publications for model system or application area; disciplinary orientation (generalizing from the publication, author affiliations, and system); research question; timeframe; and other tools used together with the tool that we were characterizing. We then sorted tools into categories on the basis of their data processing role. Because we observed category names being used inconsistently in the literature, we constructed and named categories that could be defined and distinguished on the basis of our observations (grounded coding, a common qualitative social science method). When categorizing a tool on the basis of our initially selected publications was difficult, we returned to our search results for additional detail, and we consulted any openly available documentation provided by the tool developer(s).

After the initial data collection step, in which tools and their uses in specific research articles were documented, we began to classify tools into the categories listed in Table 1. To understand the function of a tool, we searched for the tool within each article and recorded its input and output. If the data transformation performed by the tool was clear, we categorized the tool based on the definitions included in Table 1, matching the tool to the category it most closely aligned with. For example, MetaGeneMark takes metagenomic sequencing data as input and outputs predicted genes, so MetaGeneMark was classified as a protein prediction tool. While some tools shared a function across all documented research papers, others had more inconsistent uses. When the classification of a tool was not clear from its uses in the documented research papers, we consulted documentation provided by the original tool developers (often tool announcements) to determine the intended use of the tool. Many tools that appeared to fulfill the same purpose were described in different terms in different papers.
For example, “binning” is used to describe both supervised and unsupervised taxonomy algorithms, encompassing steps that might be characterized more specifically as “taxonomic assignment/classification” or “clustering,” respectively. Common biology terms such as “gene,” “genome,” and “microbiome” itself routinely and unproblematically carry multiple meanings as established knowledge shifts over time. In this case, however, context does not always resolve the terminological ambiguity, and that ambiguity affects authors’ ability to readily recognize how inferences about microbial identity are being made in any given study. Consequently, we constructed and named categories that could be defined and distinguished on the basis of our observations.
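
For readers who want to reapply the exclusion criteria stated under Methods to their own list of tools, the following is a minimal sketch, not the authors' code. The record fields and function names are hypothetical and do not correspond to columns in this dataset. It also assumes one reading of the criteria: "cited after 2020" is taken as a most recent citation year later than 2020, and a tool counts as "common" if it was cited more than three times or by more than one research group. A stricter reading would require both.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ToolRecord:
    # Hypothetical fields; the dataset's actual columns may differ.
    name: str
    last_cited_year: int       # most recent year in which the tool was cited
    citation_count: int        # number of citing articles found
    research_groups: int       # number of distinct research groups citing it
    generic_statistical: bool  # True for statistical tools not specific to
                               # microbiome sequencing analysis


def is_included(tool: ToolRecord) -> bool:
    """Apply the three exclusion criteria under the assumptions stated above."""
    in_current_use = tool.last_cited_year > 2020
    # Assumed reading: either condition is enough for a tool to be "common".
    common = tool.citation_count > 3 or tool.research_groups > 1
    return in_current_use and common and not tool.generic_statistical


def filter_catalog(tools: Iterable[ToolRecord]) -> List[ToolRecord]:
    """Return the tools that survive all exclusion criteria, in input order."""
    return [tool for tool in tools if is_included(tool)]


if __name__ == "__main__":
    # Placeholder records for illustration only; these are not real tools.
    candidates = [
        ToolRecord("tool_a", 2022, 10, 4, False),
        ToolRecord("tool_b", 2019, 50, 9, False),  # not cited after 2020
        ToolRecord("tool_c", 2023, 2, 1, False),   # not common
        ToolRecord("tool_d", 2023, 8, 3, True),    # generic statistical tool
    ]
    for tool in filter_catalog(candidates):
        print(tool.name)
```

Keeping each criterion as a separate named boolean makes it easy to switch to the stricter reading of "common" by changing `or` to `and`.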

Created:

2024-10-07