Domain-specific lexicons for enhancing language models through selective masking for thematic and misinformation classification in a One Health context
Source: DataCite Commons. Updated: 2025-11-12. Indexed: 2026-03-29.
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/9UOOOF
Resource description:
This repository provides four domain-specific lexicons designed to improve the performance of language models on classification tasks through selective masking in a One Health context. The lexicons were compiled from specialized dictionaries and glossaries. Each addresses one domain within two application areas: (i) One Health, covering the biomedical, phytosanitary, and syndromic surveillance fields, and (ii) epidemic misinformation.

The repository includes the following resources:
Biomedical Lexicon: glossaries from Oxford Reference and RxList.
Plant Health Lexicon: glossaries from the British Society for Plant Pathology (BSPP) and the American Phytopathological Society (APS).
Misinformation Lexicon: vocabulary from Newcastle University, HelpfulProfessor, and Word Raiders.
Syndromic Surveillance Lexicon: glossaries from the University of Zurich and keywords from PADI-web developed in the Indian Ocean.

The Biomedical, Plant Health, and Misinformation Lexicons are available under restricted access; the Syndromic Surveillance Lexicon is publicly available for research purposes.
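As a rough illustration of lexicon-driven selective masking, the sketch below replaces every word of a text that matches a lexicon entry with a mask token. It is not taken from the repository: the function names, the BERT-style `[MASK]` token, the whitespace-based handling of multi-word terms, and the sample terms are all assumptions made for the example.

```python
import re

MASK_TOKEN = "[MASK]"  # assumption: a BERT-style mask token


def build_pattern(lexicon):
    """Compile one case-insensitive regex matching any lexicon term."""
    # Longest terms first, so multi-word entries win over their sub-terms.
    terms = sorted(
        {t.strip().lower() for t in lexicon if t.strip()},
        key=len,
        reverse=True,
    )
    if not terms:
        return None
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def selective_mask(text, lexicon):
    """Replace each word of every lexicon match with MASK_TOKEN."""
    pattern = build_pattern(lexicon)
    if pattern is None:
        return text

    def mask_match(match):
        # One mask token per whitespace-separated word in the matched term.
        return " ".join(MASK_TOKEN for _ in match.group(0).split())

    return pattern.sub(mask_match, text)


# Hypothetical lexicon entries, not drawn from the actual lexicons.
sample_lexicon = ["avian influenza", "outbreak"]
masked = selective_mask("Avian influenza outbreak reported in poultry", sample_lexicon)
print(masked)  # [MASK] [MASK] [MASK] reported in poultry
```

In a real masked-language-modeling setup, the masking would more likely operate on tokenizer output (subword IDs) than on raw strings; the string version here only shows how a lexicon can decide which positions to mask.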
Provider:
CIRAD Dataverse
Created:
2025-10-27



