For named entity recognition (NER) on Chinese electronic medical records (EMRs). This is only a portion of the dataset.
Alibaba Cloud Tianchi | Updated: 2026-06-03 | Indexed: 2025-05-31
Download link:
https://tianchi.aliyun.com/dataset/205046
Resource description:
Electronic medical records (EMRs) are used mainly to document a range of important information about a patient's health, including past medical history, diagnosed diseases and presenting symptoms, physical examination findings, diagnostic and treatment recommendations, and treatment outcomes. As the healthcare industry has digitized, EMRs that record patients' clinical treatment have gradually matured. Intelligent diagnosis and treatment, patient profiling, and disease-course tracking based on EMRs have accordingly become active research topics in smart healthcare. To fully mine the latent features and disease-symptom associations in patients' diagnosis and treatment data, efficient and accurate named entity recognition (NER) is the key step in extracting information from EMR text. Although NER for EMRs already has a fairly rich body of research, comparatively few studies target Chinese EMRs. In particular, the complexity of the Chinese language gives Chinese EMR text several distinctive traits: many domain-specific terms, irregular sentence structure, heavily nested entities, and blurred word boundaries. As a result, traditional NER models struggle to achieve satisfactory classification performance on such text.
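The dataset page does not document the column layout or label set of its CSV file, so the following is only a minimal illustration under stated assumptions. It assumes character-level BIO tagging, a common way to sidestep explicit Chinese word segmentation, and uses the hypothetical entity labels DISEASE and BODY. The helper `bio_to_spans` is likewise hypothetical. The two taggings of the same phrase show why nesting is hard for flat tags: 左肺腺癌 (a disease) contains 左肺 (a body part), but a flat BIO sequence can encode only one of the two readings.

```python
from typing import List, Tuple

# (entity_type, start, end_exclusive, surface_text)
Span = Tuple[str, int, int, str]


def bio_to_spans(chars: List[str], tags: List[str]) -> List[Span]:
    """Decode character-level BIO tags into entity spans.

    Hypothetical helper for illustration; the label names and the lenient
    handling of a stray I- tag (it starts a new span) are assumptions,
    not taken from the dataset.
    """
    spans: List[Span] = []
    cur_type, start = None, None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and cur_type != tag[2:]
        )
        if starts_new or tag == "O":
            # Close any entity that is still open.
            if cur_type is not None:
                spans.append((cur_type, start, i, "".join(chars[start:i])))
            cur_type, start = (tag[2:], i) if starts_new else (None, None)
        # Otherwise: an I- tag continuing the current entity, nothing to do.
    if cur_type is not None:
        spans.append((cur_type, start, len(tags), "".join(chars[start:])))
    return spans


if __name__ == "__main__":
    chars = list("患者左肺腺癌术后")
    # Reading 1: tag the whole phrase as one disease entity.
    outer = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "I-DISEASE", "O", "O"]
    # Reading 2: tag the body part and the disease as adjacent flat entities.
    inner = ["O", "O", "B-BODY", "I-BODY", "B-DISEASE", "I-DISEASE", "O", "O"]
    print(bio_to_spans(chars, outer))  # [('DISEASE', 2, 6, '左肺腺癌')]
    print(bio_to_spans(chars, inner))  # [('BODY', 2, 4, '左肺'), ('DISEASE', 4, 6, '腺癌')]
```

Neither tag sequence captures both the outer disease mention and the nested body-part mention at once. Recovering both typically requires a span-based or layered tagging scheme rather than a single flat BIO sequence.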
Provider:
Alibaba Cloud Tianchi
Created:
2025-05-26
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
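This dataset focuses on named entity recognition in Chinese electronic medical records. It contains a single file, 'med-ner.csv' (12.28 KB), released on May 26, 2025. The dataset highlights the challenging characteristics of Chinese EMR text, such as abundant domain-specific vocabulary, irregular linguistic structure, and severe entity nesting, and aims to support research on intelligent diagnosis and treatment and on patient profiling in smart healthcare.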
The content above was collected and summarized by 遇见数据集.