Extracted microbial terms from Wikipedia - Marine Microbiology related pages (unfiltered)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12571309
Description:
Terms related to marine microorganisms were extracted from a set of Wikipedia pages, retrieved with the Wikipedia API library, so that they could be used to scan the ODIS graph. The pages cover the following topics: "Marine microorganisms", "Marine microbiome", "Marine viruses", "Marine bacteria", "Bacterioplankton", "Bacterial motility", "Marine prokaryotes", "Marine archaea", "Marine protists", "Marine fungi", "Mycoplankton", "Marine microanimals", "Ichthyoplankton", "Marine primary production", "Algae", "Marine microplankton", "Marine microbenthos", "Sea ice microbial communities", "Hydrothermal vent microbial communities", "deep biosphere", and "microbial dark matter".
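The page-retrieval step might look roughly like the sketch below. It assumes that "the Wikipedia API library" refers to the `wikipediaapi` Python package, which the description does not confirm. The helper names `collect_text` and `fetch_page_text` and the user-agent string are hypothetical, and only three of the listed titles are shown.

```python
def collect_text(titles, fetch):
    """Join the text of every page that `fetch` can retrieve.

    `fetch` is any callable mapping a page title to its text,
    returning None (or an empty string) when the page is missing.
    """
    parts = []
    for title in titles:
        text = fetch(title)
        if text:  # skip missing or empty pages
            parts.append(text)
    return "\n".join(parts)


def fetch_page_text(title):
    """Fetch one page with the wikipediaapi package (assumed library)."""
    import wikipediaapi  # imported lazily; requires `pip install wikipedia-api`

    wiki = wikipediaapi.Wikipedia(
        user_agent="marine-terms-example (contact@example.org)",  # placeholder
        language="en",
    )
    page = wiki.page(title)
    return page.text if page.exists() else None


# A few of the topics named in the description.
TITLES = ["Marine microorganisms", "Marine viruses", "Bacterioplankton"]
```

Calling `collect_text(TITLES, fetch_page_text)` would return the combined text of those pages as one string; it needs network access.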
We used the English version of the spaCy library, a natural language processing (NLP) tool, to process the text extracted from Wikipedia and pull out meaningful terms. The combined text was passed to the spaCy model to produce a Doc object, which holds the words together with their linguistic attributes. The words were then read from the Doc object and their frequencies tallied in a dictionary, excluding words that are not purely alphabetic, common stop words, and words shorter than four characters. Finally, the remaining words were converted to lowercase and sorted by frequency.
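The filtering and counting steps described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, `spacy.blank("en")` stands in for whichever English model was actually used, and words are lowercased before counting (an assumption) so that case variants are merged.

```python
from collections import Counter


def count_terms(doc, min_length=4):
    """Return (term, frequency) pairs sorted by descending frequency.

    `doc` is a spaCy Doc, or any iterable of tokens exposing
    `.text`, `.is_alpha`, and `.is_stop`.
    """
    counts = Counter()
    for token in doc:
        # Keep purely alphabetic, non-stop-word tokens of sufficient length.
        if token.is_alpha and not token.is_stop and len(token.text) >= min_length:
            counts[token.text.lower()] += 1
    return counts.most_common()


def extract_terms(text, min_length=4):
    """Tokenize `text` with a blank English pipeline and count its terms."""
    import spacy  # imported lazily; requires `pip install spacy`

    nlp = spacy.blank("en")  # tokenizer and stop-word list only (assumption)
    # The combined Wikipedia text may exceed spaCy's default length limit.
    nlp.max_length = max(nlp.max_length, len(text) + 1)
    return count_terms(nlp(text), min_length)
```

`extract_terms(combined_text)` returns a list of `(term, frequency)` pairs with the most frequent term first.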
Created:
2024-07-04



