MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation

Mendeley Data. Updated 2024-03-27; indexed 2024-06-27.

Download link:

https://data.mendeley.com/datasets/b9x7xxb9sz

Description:

MedNorm is a corpus of 27,979 textual descriptions, each mapped to both MedDRA and SNOMED-CT, sourced from five publicly available datasets across the biomedical and social media domains. The cross-terminology medical concept embeddings are 64-dimensional vectors for UMLS, MedDRA and SNOMED-CT concepts that capture semantic similarities between concepts from different medical terminologies. For more details, see the paper "MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation".
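
As a rough illustration of how 64-dimensional concept vectors from different terminologies might be compared, the sketch below ranks concepts by cosine similarity. It is not taken from the dataset or the paper: the concept identifiers, the toy vectors, and the helper names `cosine_similarity` and `most_similar` are all hypothetical, and the dataset's actual file format is not described here, so loading is left out.

```python
import math

EMBEDDING_DIM = 64  # dimensionality stated in the dataset description


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query_id, embeddings, top_k=3):
    """Rank all other concepts by cosine similarity to the query concept."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder vectors and identifiers (assumptions for illustration only;
# these are NOT real MedDRA or SNOMED-CT concepts or real embedding values).
toy_embeddings = {
    "meddra:placeholder_a": [1.0] * EMBEDDING_DIM,
    "snomed:placeholder_b": [1.0] * 32 + [0.5] * 32,
    "snomed:placeholder_c": [1.0] * 32 + [-1.0] * 32,
}

ranked = most_similar("meddra:placeholder_a", toy_embeddings)
for concept_id, score in ranked:
    print(f"{concept_id}\t{score:.4f}")
```

With real embeddings, the same ranking could be used to look up the nearest SNOMED-CT concepts for a given MedDRA concept, assuming both sets of vectors live in one shared space as the description suggests.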

Created:

2024-01-23