'Where does it hurt?'-Dataset

Name: 'Where does it hurt?'-Dataset
Creator: Data Science and Text-based Information Systems Group, Berlin University of Applied Sciences
Published: 2025-08-26 22:38:17
License: Not specified

Source: arXiv. Updated 2025-08-26; indexed 2025-08-28.

Download link:

https://huggingface.co/DATEXIS

Overview:

The "Where does it hurt?" dataset was developed by researchers in the Data Science and Text-based Information Systems Group at Berlin University of Applied Sciences. It is built on the Aci-bench dialogue summarization dataset and contains 207 pairs of doctor-patient dialogues and clinical notes. Working with medical professionals, the team designed a fine-grained taxonomy of physician intents based on the SOAP framework (Subjective, Objective, Assessment, Plan), and recruited medical experts to annotate more than 5,000 physician-patient dialogue turns with it. The dataset serves as a benchmark for state-of-the-art generative and encoder models on medical intent classification. The accompanying work is also the first to report common trajectories in the structure of medical dialogues, offering insights for the design of differential diagnosis systems.

Provided by:

Data Science and Text-based Information Systems Group, Berlin University of Applied Sciences

Created:

2025-08-26