MedQuAD (Medical Question Answering Dataset)
Source: OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/MedQuAD
Resource description:
MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g., cancer.gov, niddk.nih.gov, GARD, and MedlinePlus Health Topics). The collection covers 37 question types (e.g., "treatment", "diagnosis", "side effects") associated with diseases, drugs, and other medical entities such as tests.
We have added further annotations to the XML files that can be used for various Information Retrieval (IR) and Natural Language Processing (NLP) tasks: the question type, the question focus, the focus's synonyms, its Unified Medical Language System (UMLS) Concept Unique Identifier (CUI), and its semantic type.
We have added the category of the question focus ("disease", "drug", or "other") to the four MedlinePlus collections. All other collections are about diseases.
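The XML annotations described above can be read with Python's standard library. What follows is a minimal, hypothetical sketch: the element and attribute names (`Document`, `Focus`, `CUI`, `SemanticType`, `Synonym`, `QAPair`, `Question` with a `qtype` attribute, `Answer`) are assumptions about the file layout, and the embedded document is a made-up placeholder, not a record from the dataset. Check the names against the actual files before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical toy document. The tag and attribute names are ASSUMPTIONS about
# the MedQuAD XML layout; every value is a placeholder, not real dataset content.
SAMPLE_XML = """<Document id="0000001" source="ExampleSource" url="https://example.org/condition">
  <Focus>Example condition</Focus>
  <FocusAnnotations>
    <UMLS>
      <CUIs><CUI>C0000000</CUI></CUIs>
      <SemanticTypes><SemanticType>T047</SemanticType></SemanticTypes>
    </UMLS>
    <Synonyms><Synonym>Example syndrome</Synonym></Synonyms>
  </FocusAnnotations>
  <QAPairs>
    <QAPair pid="1">
      <Question qid="0000001-1" qtype="information">What is (are) Example condition ?</Question>
      <Answer>Placeholder answer text.</Answer>
    </QAPair>
    <QAPair pid="2">
      <Question qid="0000001-2" qtype="treatment">What are the treatments for Example condition ?</Question>
      <Answer>Placeholder answer text.</Answer>
    </QAPair>
  </QAPairs>
</Document>"""


@dataclass
class QARecord:
    """One question-answer pair plus the annotations of its document's focus."""
    doc_id: str
    focus: str
    qtype: str
    question: str
    answer: str
    cuis: list[str] = field(default_factory=list)
    semantic_types: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)


def _texts(root: ET.Element, tag: str) -> list[str]:
    """Collect the stripped, non-empty text of every element with this tag."""
    return [e.text.strip() for e in root.iter(tag) if e.text and e.text.strip()]


def parse_document(xml_text: str) -> list[QARecord]:
    """Flatten one MedQuAD-style XML document into one QARecord per QA pair."""
    root = ET.fromstring(xml_text)
    # Focus annotations sit at document level and apply to every QA pair.
    focus = (root.findtext("Focus") or "").strip()
    cuis = _texts(root, "CUI")
    sem_types = _texts(root, "SemanticType")
    synonyms = _texts(root, "Synonym")

    records = []
    for pair in root.iter("QAPair"):
        question = pair.find("Question")
        if question is None:  # skip malformed pairs that have no question
            continue
        records.append(
            QARecord(
                doc_id=root.get("id", ""),
                focus=focus,
                qtype=question.get("qtype", ""),
                question=(question.text or "").strip(),
                # A missing or empty <Answer> becomes "" instead of raising.
                answer=(pair.findtext("Answer") or "").strip(),
                cuis=list(cuis),
                semantic_types=list(sem_types),
                synonyms=list(synonyms),
            )
        )
    return records


if __name__ == "__main__":
    records = parse_document(SAMPLE_XML)
    for rec in records:
        print(rec.qtype, "|", rec.question, "|", rec.cuis)
    # Tally question types across the parsed records.
    print(Counter(r.qtype for r in records))
```

Flattening to one record per QA pair, with the document-level focus annotations copied onto each record, keeps filtering by question type or grouping by UMLS CUI to a one-line comprehension. Looping `parse_document` over every XML file in a collection would give a single flat table.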
Provider:
OpenDataLab
Created:
2022-08-16
Compiled summary
Dataset introduction

Background and challenges
Background overview
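MedQuAD is a medical question-answering dataset of 47,457 question-answer pairs collected from 12 NIH websites, covering 37 question types about diseases, drugs, and other medical entities. The dataset provides XML annotations that support information retrieval and natural language processing tasks.
The content above was collected and summarized by 遇见数据集.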



