
Extracted microbial terms from Wikipedia - Marine Microbiology related pages (unfiltered)

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/12571309
Description:
Terms related to marine microorganisms were extracted from various Wikipedia pages, using the Wikipedia API library, for use in scanning the ODIS graph. The pages cover topics such as "Marine microorganisms", "Marine microbiome", "Marine viruses", "Marine bacteria", "Bacterioplankton", "Bacterial motility", "Marine prokaryotes", "Marine archaea", "Marine protists", "Marine fungi", "Mycoplankton", "Marine microanimals", "Ichthyoplankton", "Marine primary production", "Algae", "Marine microplankton", "Marine microbenthos", "Sea ice microbial communities", "Hydrothermal vent microbial communities", "deep biosphere", and "microbial dark matter".
We used the English model of the spaCy library, a natural language processing (NLP) tool, to process the text extracted from Wikipedia and pull out meaningful terms. The combined text was passed to the spaCy model to generate a Doc object, which contains the words and their associated linguistic information. The words were then read from the Doc object and their frequencies tracked in a dictionary, skipping any word that is not purely alphabetic, is a common stop word, or is shorter than four characters. Finally, the remaining words were converted to lowercase and sorted by frequency.
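The filtering and counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' script: the function names `rank_terms` and `extract_terms`, the `min_length` parameter, and the model name `en_core_web_sm` are assumptions (the description only says an English spaCy model was used), and fetching the page text from Wikipedia is left out. The sketch also lowercases each word at counting time so that case variants are merged, which may differ from the order of steps in the original pipeline.

```python
from collections import Counter


def rank_terms(tokens, min_length=4):
    """Count tokens that pass the filters; return (term, count) pairs, most frequent first.

    `tokens` is any iterable of objects exposing `text`, `is_alpha`, and
    `is_stop`, such as the tokens of a spaCy Doc.
    """
    counts = Counter()
    for token in tokens:
        # Keep purely alphabetic, non-stop-word tokens of at least min_length characters.
        if token.is_alpha and not token.is_stop and len(token.text) >= min_length:
            counts[token.text.lower()] += 1
    return counts.most_common()


def extract_terms(text, model_name="en_core_web_sm"):
    """Run spaCy over the combined page text and rank the surviving terms."""
    import spacy  # imported lazily so rank_terms works without spaCy installed

    nlp = spacy.load(model_name)  # assumed model name, not stated in the source
    # Combined Wikipedia text can exceed spaCy's default length limit.
    nlp.max_length = max(nlp.max_length, len(text) + 1)
    return rank_terms(nlp(text))
```

Calling `extract_terms(combined_text)` on the concatenated page text would return a list of `(term, frequency)` pairs, provided spaCy and the chosen model are installed.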
Created:
2024-07-04