Elfsong/ClinicalDataset
Source: Hugging Face. Updated 2023-03-05; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Elfsong/ClinicalDataset
Resource description:
---
task_categories:
- summarization
- conversational
language:
- en
pretty_name: MediQA
size_categories:
- 1K<n<10K
---
# MEDIQA-Chat 2023 Training/Validation Data
# Task A
The training set consists of 1,201 pairs of conversations and associated section headers and contents.
The validation set consists of 100 pairs of conversations and their summaries.
The full list of normalized section headers:
1. fam/sochx [FAMILY HISTORY/SOCIAL HISTORY]
2. genhx [HISTORY OF PRESENT ILLNESS]
3. pastmedicalhx [PAST MEDICAL HISTORY]
4. cc [CHIEF COMPLAINT]
5. pastsurgical [PAST SURGICAL HISTORY]
6. allergy
7. ros [REVIEW OF SYSTEMS]
8. medications
9. assessment
10. exam
11. diagnosis
12. disposition
13. plan
14. edcourse [EMERGENCY DEPARTMENT COURSE]
15. immunizations
16. imaging
17. gynhx [GYNECOLOGIC HISTORY]
18. procedures
19. other_history
20. labs
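
As a minimal sketch of how the list above might be used in code, the snippet below stores the 20 normalized headers and the bracketed expansions given on this card, and checks that a predicted header is one of them. The names `SECTION_HEADERS`, `EXPANSIONS`, `normalize_header`, and `describe_header` are illustrative assumptions, not part of the official MEDIQA-Chat tooling.

```python
# Normalized Task A section headers, in the order listed on this card.
SECTION_HEADERS = [
    "fam/sochx", "genhx", "pastmedicalhx", "cc", "pastsurgical",
    "allergy", "ros", "medications", "assessment", "exam",
    "diagnosis", "disposition", "plan", "edcourse", "immunizations",
    "imaging", "gynhx", "procedures", "other_history", "labs",
]

# Expansions that the card spells out in brackets. The remaining headers
# are already plain words, so no expansion is invented for them.
EXPANSIONS = {
    "fam/sochx": "FAMILY HISTORY/SOCIAL HISTORY",
    "genhx": "HISTORY OF PRESENT ILLNESS",
    "pastmedicalhx": "PAST MEDICAL HISTORY",
    "cc": "CHIEF COMPLAINT",
    "pastsurgical": "PAST SURGICAL HISTORY",
    "ros": "REVIEW OF SYSTEMS",
    "edcourse": "EMERGENCY DEPARTMENT COURSE",
    "gynhx": "GYNECOLOGIC HISTORY",
}


def normalize_header(raw: str) -> str:
    """Strip and lower-case a header; raise ValueError if it is not one of the 20."""
    header = raw.strip().lower()
    if header not in SECTION_HEADERS:
        raise ValueError(f"unknown section header: {raw!r}")
    return header


def describe_header(raw: str) -> str:
    """Return the bracketed expansion if the card gives one, else the header itself."""
    header = normalize_header(raw)
    return EXPANSIONS.get(header, header)
```

Rejecting unknown headers early is a design choice for this sketch; a classifier that emits a header outside the list would otherwise fail silently at evaluation time.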
# Task B
The training set consists of 67 pairs of conversations and full notes. The validation set includes 20 pairs of conversations and clinical notes.
Full encounter notes are expected to have at least one of four overall section divisions, each demarcated by the first occurrence of one of its related section headers:

| note_division | section_headers |
| --- | --- |
| subjective | chief complaint, history of present illness, hpi, subjective |
| objective_exam | physical exam, exam |
| objective_results | results, findings |
| assessment_and_plan | assessment, plan |

Depending on the encounter, objective_exam and objective_results may not be relevant.
We encourage you to review the sample data as well as the evaluation script to understand the best demarcation headers for your generated note.
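
The first-occurrence rule described above can be sketched roughly as follows. This is not the official evaluation script: it assumes a header only counts when it starts a line and that matching is case-insensitive, and the names `DIVISION_HEADERS`, `find_divisions`, and `split_note` are hypothetical. The actual script may detect headers differently, so treat this only as a way to sanity-check generated notes.

```python
import re

# Division -> related section headers, as listed on this card.
DIVISION_HEADERS = {
    "subjective": ["chief complaint", "history of present illness", "hpi", "subjective"],
    "objective_exam": ["physical exam", "exam"],
    "objective_results": ["results", "findings"],
    "assessment_and_plan": ["assessment", "plan"],
}


def find_divisions(note: str) -> dict:
    """Map each division found in the note to the offset of its first header.

    Assumption: a header counts only at the start of a line, ignoring case.
    """
    starts = {}
    for division, headers in DIVISION_HEADERS.items():
        alternatives = "|".join(re.escape(h) for h in headers)
        pattern = re.compile(
            rf"^[ \t]*(?:{alternatives})\b", re.IGNORECASE | re.MULTILINE
        )
        # search() returns the leftmost match, i.e. the first-occurring header.
        match = pattern.search(note)
        if match:
            starts[division] = match.start()
    return starts


def split_note(note: str) -> dict:
    """Cut the note into divisions, each running up to the next division's start."""
    ordered = sorted(find_divisions(note).items(), key=lambda item: item[1])
    sections = {}
    for i, (division, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else len(note)
        sections[division] = note[start:end].strip()
    return sections
```

Calling `split_note` on a generated note and checking that the returned dictionary is non-empty is one quick way to confirm that at least one division would be recognized under these assumptions. A body line that happens to begin with a header word (for example "Plan to follow up") would also match, which is a known limitation of this sketch.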
# Task C
The training set consists of 67 pairs of full doctor-patient conversations and notes, and the validation set includes 20 pairs of full conversations and clinical notes (the same data as Task B). The Task A training and validation sets (1,301 pairs in total) can be used as additional training data.
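
The pair counts quoted above can be checked with a few lines of arithmetic. The dictionary and function names are assumptions made for this sketch; the numbers are the ones stated on this card. How the Task A pairs should be adapted for Task C is not specified here.

```python
# Split sizes as stated on this card (number of pairs per split).
TASK_A = {"train": 1201, "validation": 100}
TASK_C = {"train": 67, "validation": 20}


def pooled_size(splits: dict) -> int:
    """Total number of pairs across all splits of a task."""
    return sum(splits.values())


# Task A pairs available as additional training data for Task C.
extra_pairs = pooled_size(TASK_A)

# Task C training pairs plus the pooled Task A pairs.
total_training_pairs = TASK_C["train"] + extra_pairs
```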
Provider:
Elfsong
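Summary of original information
Dataset overview
Dataset name
- Name: MediQA
- Alias: MEDIQA-Chat 2023 Training/Validation Data
Dataset content
- Task categories:
  - Summarization
  - Conversational
- Language: English
- Data size: 1K<n<10K
Dataset details
- Task A
  - Training set: 1,201 pairs of conversations and their associated section headers and contents.
  - Validation set: 100 pairs of conversations and their summaries.
  - Section header list: 20 normalized section headers in total; for example, "fam/sochx" stands for "FAMILY HISTORY/SOCIAL HISTORY".
- Task B
  - Training set: 67 pairs of conversations and full notes.
  - Validation set: 20 pairs of conversations and clinical notes.
  - Note structure: at least one of the four overall section divisions "subjective", "objective_exam", "objective_results", and "assessment_and_plan".
- Task C
  - Training set: 67 pairs of full doctor-patient conversations and notes.
  - Validation set: 20 pairs of full conversations and clinical notes (same as Task B).
  - Additional training data: the Task A training and validation sets (1,301 pairs in total) can be used as additional training data.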



