
Drugs, Diseases, Genes and Proteins in the CORD-19 Corpus

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6532472
Description:
The BioNER+BioNEN system described in the paper "An Overview of Drugs, Diseases, Genes and Proteins in the CORD-19 Corpus" (Badenes-Olmedo, Carlos et al., 2022) was used to identify and normalize the drug, disease and genetics-related terms mentioned in the CORD-19 corpus (January 2022 edition). Entity recognition and normalization were performed on each paragraph of each scientific article. A first group of labels identifies the medical terms as they appear in the text (i.e. diseases_ss, chemicals_ss, genetics_ss) and in standardized form (i.e. disease_terms_ss, chemical_terms_ss, genetic_terms_ss). For diseases and genes/proteins, a predefined category is also assigned during normalization (i.e. disease_types_ss, genetic_types_ss). A second group of labels contains the codes for each of the classification systems described in Section 3 of the paper (i.e. mesh_codes_ss, atc_codes_ss, cid_codes_ss, doid_codes_ss, cui_codes_ss, icd10_codes_ss, icd9_codes_ss, gard_codes_ss, snomed_codes_ss, nci_codes_ss, ncbi_codes_ss, uniprot_codes_ss). The suffix _ss in all tags indicates that the value is a list of strings (i.e. a string sequence).
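As a rough illustration of how the `_ss` labels might be consumed, the sketch below counts how many paragraphs mention each normalized disease term. It assumes each paragraph is available as a dictionary whose `_ss` fields hold lists of strings; that record layout, the `count_terms` helper and the sample values are hypothetical and are not taken from the dataset itself. Only the field names come from the description above.

```python
from collections import Counter

# Hypothetical paragraph records. The field names follow the dataset
# description; the values are invented for illustration only.
paragraphs = [
    {
        "diseases_ss": ["COVID-19", "pneumonia"],
        "disease_terms_ss": ["covid-19", "pneumonia"],
        "chemicals_ss": ["remdesivir"],
        "chemical_terms_ss": ["remdesivir"],
    },
    {
        "diseases_ss": ["SARS-CoV-2 infection"],
        "disease_terms_ss": ["covid-19"],
    },
]


def count_terms(records, field):
    """Count how many paragraphs contain each value of a `_ss` field."""
    counts = Counter()
    for record in records:
        # A missing field is treated as an empty list; set() keeps a term
        # from being counted twice within the same paragraph.
        counts.update(set(record.get(field, [])))
    return counts


disease_counts = count_terms(paragraphs, "disease_terms_ss")
print(disease_counts.most_common(1))
```

The same helper could be pointed at any of the code fields (for example `mesh_codes_ss`), since they share the list-of-strings shape; a field that is absent from every record simply yields an empty counter.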
Created:
2022-05-12