Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets
DataCite Commons | Updated 2025-05-01 | Indexed 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.r4xgxd29w
Description:
Objective: Normalizing mentions of medical concepts to standardized
vocabularies is a fundamental component of clinical text analysis.
Ambiguity—words or phrases that may refer to different concepts—has been
extensively researched as part of information extraction from biomedical
literature, but less is known about the types and frequency of ambiguity
in clinical text. This study characterizes the distribution and distinct
types of ambiguity exhibited by benchmark clinical concept normalization
datasets, in order to identify directions for advancing medical concept
normalization research. Materials and Methods: We identified ambiguous
strings in datasets derived from the two available clinical corpora for
concept normalization, and categorized the distinct types of ambiguity
they exhibited. We then compared observed string ambiguity in the datasets
to potential ambiguity in the Unified Medical Language System (UMLS), to
assess how representative available datasets are of ambiguity in clinical
language. Results: We observed twelve distinct types of ambiguity,
distributed unequally across the available datasets. However, less than
15% of the strings were ambiguous within the datasets, while over 50% were
ambiguous in the UMLS, indicating only partial coverage of clinical
ambiguity. Discussion: Existing datasets are not sufficient to cover the
diversity of clinical concept ambiguity, limiting both training and
evaluation of normalization methods for clinical text. Additionally, the
UMLS offers important semantic information for building and evaluating
normalization methods. Conclusion: Our findings identify three
opportunities for concept normalization research, including a need for
ambiguity-specific clinical datasets and leveraging the rich semantics of
the UMLS in new methods and evaluation measures for normalization.
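
The comparison described under Materials and Methods (observed ambiguity within a dataset versus potential ambiguity in a vocabulary) can be illustrated with a minimal sketch. It is not the authors' code. It assumes annotations are available as (string, concept ID) pairs and that a string counts as ambiguous when it is linked to more than one distinct concept ID after lowercasing. The function names, the toy strings, and the placeholder IDs (X1, X2, ...) are all hypothetical and are not real UMLS identifiers.

```python
from collections import defaultdict


def ambiguous_fraction(pairs):
    """Return the fraction of distinct strings linked to more than one concept ID.

    `pairs` is an iterable of (string, concept_id) tuples. Strings are
    lowercased before grouping (an assumption made for this sketch).
    """
    concepts = defaultdict(set)
    for string, concept_id in pairs:
        concepts[string.lower()].add(concept_id)
    if not concepts:
        return 0.0
    ambiguous = sum(1 for ids in concepts.values() if len(ids) > 1)
    return ambiguous / len(concepts)


def restrict_to_strings(pairs, strings):
    """Keep only the vocabulary pairs whose string also occurs in `strings`."""
    keep = {s.lower() for s in strings}
    return [(s, c) for s, c in pairs if s.lower() in keep]


# Toy annotations standing in for a clinical dataset (placeholder IDs).
dataset_pairs = [
    ("cold", "X1"), ("cold", "X2"),
    ("fever", "X3"), ("Fever", "X3"),
    ("cough", "X4"),
]

# Toy vocabulary standing in for a UMLS-style string-to-concept table.
vocabulary_pairs = [
    ("cold", "X1"), ("cold", "X2"), ("cold", "X5"),
    ("fever", "X3"), ("fever", "X6"),
    ("cough", "X4"),
    ("rash", "X7"), ("rash", "X8"),
]

# Observed ambiguity: strings annotated with several concepts in the dataset.
observed = ambiguous_fraction(dataset_pairs)

# Potential ambiguity: the same strings, looked up in the vocabulary.
dataset_strings = [s for s, _ in dataset_pairs]
potential = ambiguous_fraction(restrict_to_strings(vocabulary_pairs, dataset_strings))

print(f"observed: {observed:.2f}, potential: {potential:.2f}")
```

A gap between the two fractions, as in the toy data, is the kind of partial coverage the Results describe. The matching rule (exact lowercase match) is a simplification chosen for the sketch.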
Provider:
Dryad
Created:
2020-11-28



