Elfsong/ClinicalDataset

Name: Elfsong/ClinicalDataset
Creator: Elfsong
Published: 2023-03-05 06:43:13
License: No description available

Hugging Face | Updated 2023-03-05 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/Elfsong/ClinicalDataset

Resource description:

---
task_categories:
- summarization
- conversational
language:
- en
pretty_name: MediQA
size_categories:
- 1K<n<10K
---

# MEDIQA-Chat 2023 Training/Validation Data

# Task A

The training set consists of 1,201 pairs of conversations and associated section headers and contents. The validation set consists of 100 pairs of conversations and their summaries.

The full list of normalized section headers:

1. fam/sochx [FAMILY HISTORY/SOCIAL HISTORY]
2. genhx [HISTORY of PRESENT ILLNESS]
3. pastmedicalhx [PAST MEDICAL HISTORY]
4. cc [CHIEF COMPLAINT]
5. pastsurgical [PAST SURGICAL HISTORY]
6. allergy
7. ros [REVIEW OF SYSTEMS]
8. medications
9. assessment
10. exam
11. diagnosis
12. disposition
13. plan
14. edcourse [EMERGENCY DEPARTMENT COURSE]
15. immunizations
16. imaging
17. gynhx [GYNECOLOGIC HISTORY]
18. procedures
19. other_history
20. labs

# Task B

The training set consists of 67 pairs of conversations and full notes. The validation set includes 20 pairs of conversations and clinical notes.

Full encounter notes are expected to have at least one of four overall section divisions, each demarcated by the first-occurring of its related section headers:

> | note_division | section_headers |
> | --- | --- |
> | subjective | chief complaint, history of present illness, hpi, subjective |
> | objective_exam | physical exam, exam |
> | objective_results | results, findings |
> | assessment_and_plan | assessment, plan |

Depending on the encounter, objective_exam and objective_results may not be relevant. We encourage you to review the sample data as well as the evaluation script to understand the best demarcation headers for your generated note.

# Task C

The training set consists of 67 pairs of full doctor-patient conversations and notes, and the validation set includes 20 pairs of full conversations and clinical notes (the same as the Task B datasets). The Task A training and validation sets (1,301 pairs) can be used as additional training data.
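The Task B rule (each division starts at the first-occurring of its related headers and, by extension, runs until the next division begins) can be sketched in Python. This is a minimal illustration under our own assumptions: headers are matched case-insensitively and only at the start of a line. `find_division_starts` and `split_note` are hypothetical helpers, not part of the dataset or its official evaluation script, which may apply different matching rules.

```python
from __future__ import annotations

import re

# Division -> header keywords, copied from the Task B table in the dataset card.
DIVISION_HEADERS = {
    "subjective": ["chief complaint", "history of present illness", "hpi", "subjective"],
    "objective_exam": ["physical exam", "exam"],
    "objective_results": ["results", "findings"],
    "assessment_and_plan": ["assessment", "plan"],
}


def find_division_starts(note: str) -> dict[str, int]:
    """Map each division found in `note` to the offset of its first header.

    Assumption (ours, not the card's): a header counts only when it begins a
    line, matched case-insensitively.
    """
    starts: dict[str, int] = {}
    for division, headers in DIVISION_HEADERS.items():
        offsets = []
        for header in headers:
            pattern = rf"^[ \t]*{re.escape(header)}\b"
            match = re.search(pattern, note, flags=re.IGNORECASE | re.MULTILINE)
            if match:
                offsets.append(match.start())
        if offsets:
            # A division is demarcated by whichever of its headers occurs first.
            starts[division] = min(offsets)
    return starts


def split_note(note: str) -> dict[str, str]:
    """Cut `note` into divisions; each one runs until the next division starts."""
    ordered = sorted(find_division_starts(note).items(), key=lambda item: item[1])
    sections: dict[str, str] = {}
    for i, (division, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else len(note)
        sections[division] = note[start:end].strip()
    return sections


# A made-up note for illustration; it is not taken from the dataset.
example_note = (
    "CHIEF COMPLAINT\nKnee pain.\n\n"
    "HISTORY OF PRESENT ILLNESS\nPain for two weeks.\n\n"
    "PHYSICAL EXAM\nMild swelling.\n\n"
    "ASSESSMENT AND PLAN\nKnee strain. Rest and ice.\n"
)
sections = split_note(example_note)
print(list(sections))  # ['subjective', 'objective_exam', 'assessment_and_plan']
```

Because the example note has no results or findings header, `objective_results` is simply absent, which matches the card's remark that this division may not be relevant for every encounter.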

Provider:

Elfsong

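Summary of original information

Dataset overview

Dataset name

Name: MediQA
Alias: MEDIQA-Chat 2023 Training/Validation Data

Dataset contents

Task categories:
- Summarization
- Conversational
Language: English
Size: 1K<n<10K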

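Dataset details

Task A
- Training set: 1,201 conversations, each paired with a section header and the corresponding section content.
- Validation set: 100 conversations paired with their summaries.
- Section headers: 20 normalized section headers in total; for example, "fam/sochx" stands for "FAMILY HISTORY/SOCIAL HISTORY".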
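When validating or displaying predicted Task A header codes, the 20 normalized codes from the dataset card can be kept in a lookup table. The following is a minimal sketch: `HEADER_CODES`, `EXPANDED` and `describe_header` are hypothetical names, and the upper-case display text for codes that have no bracketed expansion in the card (for example `allergy` or `other_history`) is our own convention, not part of the dataset.

```python
# All 20 normalized Task A header codes, in the order the dataset card lists them.
HEADER_CODES = [
    "fam/sochx", "genhx", "pastmedicalhx", "cc", "pastsurgical",
    "allergy", "ros", "medications", "assessment", "exam",
    "diagnosis", "disposition", "plan", "edcourse", "immunizations",
    "imaging", "gynhx", "procedures", "other_history", "labs",
]

# Expansions given in brackets in the dataset card.
EXPANDED = {
    "fam/sochx": "FAMILY HISTORY/SOCIAL HISTORY",
    "genhx": "HISTORY of PRESENT ILLNESS",
    "pastmedicalhx": "PAST MEDICAL HISTORY",
    "cc": "CHIEF COMPLAINT",
    "pastsurgical": "PAST SURGICAL HISTORY",
    "ros": "REVIEW OF SYSTEMS",
    "edcourse": "EMERGENCY DEPARTMENT COURSE",
    "gynhx": "GYNECOLOGIC HISTORY",
}


def describe_header(code: str) -> str:
    """Return a readable name for a Task A header code, rejecting unknown codes."""
    normalized = code.strip().lower()
    if normalized not in HEADER_CODES:
        raise ValueError(f"unknown Task A section header: {code!r}")
    # Codes without a bracketed expansion fall back to upper-case text
    # (a display convention assumed here, not defined by the dataset).
    return EXPANDED.get(normalized, normalized.replace("_", " ").upper())


print(describe_header("CC"))             # CHIEF COMPLAINT
print(describe_header("other_history"))  # OTHER HISTORY
```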
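Task B
- Training set: 67 pairs of conversations and full notes.
- Validation set: 20 pairs of conversations and clinical notes.
- Note structure: each note contains at least one of four main section divisions: "subjective", "objective_exam", "objective_results", and "assessment_and_plan".
Task C
- Training set: 67 pairs of full doctor-patient conversations and notes.
- Validation set: 20 pairs of full conversations and clinical notes (the same as Task B).
- Additional training data: the Task A training and validation sets (1,301 pairs in total) can be used as additional training data.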
