Medical-Domain Named Entity Recognition Dataset

ModelScope community, updated 2026-05-16, indexed 2024-12-07
Download link:
https://modelscope.cn/datasets/ShelterW/chinese_medical_ner
Resource description:
Dataset file metadata and data files are available on the "Dataset Files" page. This dataset card uses the default template, and the contributors have not provided a more detailed introduction. The dataset can still be downloaded with the Git clone command or the ModelScope SDK.

#### Download Methods

:modelscope-code[]{type="sdk"}

:modelscope-code[]{type="git"}

### Medical NER Datasets

#### 1. [Yidu Cloud Structured 4K Dataset](https://tianchi.aliyun.com/dataset/144419) or [Named Entity Recognition Dataset for Chinese Electronic Medical Records](https://tianchi.aliyun.com/dataset/92085)

**Entity Types**:

- **Disease and Diagnosis**: Medically defined diseases, plus clinicians' judgments about etiology, pathophysiology, typing, staging, and similar aspects made in clinical practice.
- **Examination**: Imaging examinations (X-ray, CT, MR, PETCT, etc.), angiography, ultrasound, and electrocardiogram. To avoid conflict with surgical procedures, other diagnostic procedures such as gastroscopy and colonoscopy are excluded.
- **Laboratory Test**: Physical or chemical tests performed in a laboratory, specifically clinical laboratory assays. Broader laboratory examinations such as immunohistochemistry are excluded.
- **Surgery**: Treatments such as resection and suturing that a doctor performs on a local part of the patient's body; the main treatment method in surgical departments.
- **Drug**: Specific chemical substances used to treat disease.
- **Anatomical Site**: Anatomical parts of the human body where diseases, symptoms, and signs occur.

---

#### 2. [Traditional Chinese Medicine (TCM) Package Insert Entity Recognition](https://tianchi.aliyun.com/dataset/86819)

- **Dataset Scale**: The training set contains 1,000 documents annotated with 13 entity categories.

![13 Entity Categories](中药说明书.png)

---

#### 3. [Chinese Diabetes Research Literature Entity-Relation Dataset DiaKG](https://tianchi.aliyun.com/dataset/88836)

- **Data Source**: 41 Chinese expert consensus documents on diabetes, covering basic research, clinical research, drug use, clinical cases, diagnosis and treatment methods, and other aspects.
- **Entities and Relations**: 22,050 medical entities and 6,890 entity-relation pairs are annotated in total.

![18 Entity Categories](糖尿病.png)

---

#### 4. [Chinese Medical Information Processing Evaluation Benchmark CBLUE](https://tianchi.aliyun.com/dataset/95414)

- **Chinese Medical Named Entity Recognition (CMeEE)**
  - **Dataset Scale**: 15,000 / 5,000 / 3,000
  - **Entity Categories**: 9 types

  ![9 Entity Categories](中文医学.png)
- **Intelligent Dialogue Diagnosis and Treatment Dataset (IMCS)**
  - **Dataset Scale**: 2,472 / 833 / 811

---

#### 5. [Medical Domain Knowledge Graph](https://github.com/liuhuanyong/QASystemOnMedicalKG)

- **Data Characteristics**: A disease-centered medical knowledge graph with about 44,000 entities and about 300,000 entity relations.

| Entity Type    | Meaning                         | Entity Count | Examples |
| -------------- | ------------------------------- | ------------ | -------- |
| **Check**      | Diagnostic examination item     | 3,353        | Bronchography; Arthroscopy |
| **Department** | Medical department              | 54           | Plastic and Cosmetic Surgery; Burn Department |
| **Disease**    | Disease                         | 8,807        | Thromboangiitis Obliterans; Descending Thoracic Aortic Aneurysm |
| **Drug**       | Drug                            | 3,828        | Jingwanhong Hemorrhoid Ointment; Brinzolamide Ophthalmic Suspension |
| **Food**       | Food                            | 4,870        | Tomato and Mustard Beef Ball Soup; Braised Bamboo Shoots and Lamb |
| **Producer**   | Marketed pharmaceutical product | 17,201       | Tongyao Pharmaceutical Penicillin V Potassium Tablets; Qingyang Dexamethasone Acetate Tablets |
| **Symptom**    | Disease symptom                 | 5,998        | Breast Tissue Hypertrophy; Deep Cerebral Parenchymal Hemorrhage |
| **Total**      | Total                           | 44,111       | Approximately 44,000 entities |
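
The total in the knowledge-graph table above can be checked by summing the per-type counts. The short Python sketch below does that and also reports each type's share of the graph. The counts are copied from the table; the variable and function names are illustrative only and are not part of any dataset tooling.

```python
# Entity counts copied from the knowledge-graph table above.
entity_counts = {
    "Check": 3353,
    "Department": 54,
    "Disease": 8807,
    "Drug": 3828,
    "Food": 4870,
    "Producer": 17201,
    "Symptom": 5998,
}

# Sum of all per-type counts; the table reports 44,111.
total = sum(entity_counts.values())


def share(entity_type: str) -> float:
    """Return the fraction of all entities that belong to one type."""
    return entity_counts[entity_type] / total


print(f"total entities: {total}")
for name, count in sorted(entity_counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<11}{count:>7}  {share(name):6.1%}")
```

The sum matches the table's total row, and the listing makes it easy to see that the Producer type accounts for the largest share of entities.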
Provider:
maas
Created:
2024-12-06
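Collected summary
Dataset introduction
Background and challenges
Background overview
This is a comprehensive Chinese medical named entity recognition dataset that combines sub-datasets from several sources, including Yidu Cloud structured data, TCM package inserts, and diabetes research literature. It covers entity types such as diseases, examinations, and drugs, is relatively large, and is suited to entity recognition and relation extraction on medical text.
The above content was collected and summarized by 遇见数据集.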
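
For the entity-recognition use mentioned in the overview, span annotations are commonly converted to per-character BIO tags before training a sequence tagger. The Python sketch below shows one way to do this. It is only an illustration under stated assumptions: the short label names (`DISEASE`, `EXAM`, and so on) are hypothetical stand-ins for the six entity types of the electronic medical record dataset, the example sentence is made up, and the actual files may use a different annotation format and label strings.

```python
from typing import List, Tuple

# Hypothetical short labels for the six entity types described for dataset 1;
# the real datasets may use different label strings.
LABELS = {"DISEASE", "EXAM", "LAB", "SURGERY", "DRUG", "ANATOMY"}

# (start, end_exclusive, label), measured in character offsets.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Convert flat, non-overlapping character spans to one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Made-up example sentence: "Chest CT shows a mass in the right lung."
example_text = "胸部CT示右肺占位"
example_spans = [(0, 4, "EXAM"), (5, 7, "ANATOMY")]
example_tags = spans_to_bio(example_text, example_spans)

for char, tag in zip(example_text, example_tags):
    print(char, tag)
```

The function rejects overlapping spans on purpose: plain BIO tagging cannot represent nested entities, so nested annotations would need a different encoding.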