"Docmate Annotated Medical Conversations with keyword extraction (Medical NER dataset)"
DataCite Commons, updated 2026-02-03, indexed 2026-05-03
Download link:
https://ieee-dataport.org/documents/docmate-annotated-medical-conversations-keyword-extraction-medical-ner-dataset
Description:
"DocMate Medical NER Dataset: A Benchmark for Clinical Entity Recognition and Medical Conversation Analysis. This dataset comprises 1,954 annotated clinical conversations designed for Named Entity Recognition (NER) in healthcare settings. It contains approximately 500 conversations annotated with four primary entity types: SYMPTOM, DIAGNOSIS, MEDICATION, and FAMILY_HISTORY. The dataset captures real-world medical dialogue between healthcare professionals and patients, covering diverse medical conditions including anxiety disorders, migraine, diabetes, hypertension, and sleep disorders. Each conversation is annotated using Gemini 2.5 Flash with character-level entity boundaries and semantic labels, making it suitable for training and evaluating NLP models for clinical information extraction. The dataset is provided in both raw text (TXT) and JSON annotation formats, supporting machine learning frameworks such as spaCy and other state-of-the-art NER architectures. This benchmark dataset aims to advance medical natural language processing and support the development of clinical decision support systems."
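The description mentions character-level entity boundaries stored as JSON and use with spaCy, but it does not give the file schema. The sketch below therefore assumes a hypothetical record layout (a "text" string plus an "entities" list with "start", "end", and "label" keys) and a made-up sample sentence; neither is taken from the dataset itself. It checks that each span lies inside the text, uses one of the four listed labels, and does not overlap another span, then returns the (text, {"entities": [...]}) offset form that spaCy training utilities commonly accept. Field names will likely need adjusting to the actual files.

```python
import json

# The four entity types named in the dataset description.
LABELS = {"SYMPTOM", "DIAGNOSIS", "MEDICATION", "FAMILY_HISTORY"}


def to_offset_tuple(record):
    """Validate one record and return (text, {"entities": [(start, end, label)]}).

    The keys "text", "entities", "start", "end", and "label" are assumed
    names for illustration, not the dataset's documented schema.
    """
    text = record["text"]
    spans = []
    for ent in record["entities"]:
        start, end, label = ent["start"], ent["end"], ent["label"]
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        # Character offsets are treated as half-open: text[start:end].
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {start}-{end}")
        spans.append((start, end, label))

    # Sort by position and reject overlapping spans.
    spans.sort()
    for (_, prev_end, _), (next_start, _, _) in zip(spans, spans[1:]):
        if next_start < prev_end:
            raise ValueError("overlapping entity spans")
    return text, {"entities": spans}


# Synthetic record, written for this example only.
sample = json.loads("""
{
  "text": "Patient reports headache and takes ibuprofen.",
  "entities": [
    {"start": 35, "end": 44, "label": "MEDICATION"},
    {"start": 16, "end": 24, "label": "SYMPTOM"}
  ]
}
""")

text, annotations = to_offset_tuple(sample)
for start, end, label in annotations["entities"]:
    print(label, repr(text[start:end]))
```

Validating offsets before training is worthwhile here because the annotations are model-generated, and a span that is off by one character would otherwise be silently dropped or misaligned during tokenization.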
Provider:
IEEE DataPort
Created:
2026-02-03



