MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation
Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-27.
Download link:
https://data.mendeley.com/datasets/b9x7xxb9sz
Description:
MedNorm is a corpus of 27,979 textual descriptions, each mapped to both MedDRA and SNOMED-CT, sourced from five publicly available datasets across the biomedical and social media domains. The accompanying cross-terminology medical concept embeddings are 64-dimensional vectors for UMLS, MedDRA and SNOMED-CT concepts that capture semantic similarities between concepts from different medical terminologies. For more details, see the paper "MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation".
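As a rough illustration of how 64-dimensional concept embeddings can be compared across terminologies, the sketch below ranks concepts by cosine similarity. It makes several assumptions: the listing does not describe the file format of the released embeddings, so the vectors here are toy placeholders rather than real MedNorm data, the concept identifiers are hypothetical, and cosine similarity is a common choice for comparing embeddings, not necessarily the measure used by the authors.

```python
import math
from typing import Dict, List, Sequence, Tuple

EMBEDDING_DIM = 64  # dimensionality stated in the dataset description


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str,
    embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k concepts closest to query_id, best match first."""
    query = embeddings[query_id]
    scored = [
        (concept_id, cosine_similarity(query, vector))
        for concept_id, vector in embeddings.items()
        if concept_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy placeholder vectors, NOT real MedNorm embeddings.
# The identifiers are hypothetical labels, not real terminology codes.
toy_embeddings: Dict[str, List[float]] = {
    "MEDDRA:headache": [1.0] * 32 + [0.0] * 32,
    "SNOMED:headache": [0.9] * 32 + [0.1] * 32,
    "SNOMED:fracture": [0.0] * 32 + [1.0] * 32,
}

ranking = most_similar("MEDDRA:headache", toy_embeddings, top_k=2)
for concept_id, score in ranking:
    print(f"{concept_id}\t{score:.3f}")
```

With real embeddings, `toy_embeddings` would be replaced by a mapping loaded from the downloaded files, whose exact layout should be checked against the dataset's own documentation.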
Created:
2024-01-23



