
lavita/MedQuAD

Source: Hugging Face. Updated: 2023-12-22. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lavita/MedQuAD
Resource overview:
The MedQuAD dataset is converted from the original MedQuAD resources and is primarily used for question answering in the medical domain. Its features are document ID, document source, document URL, category, UMLS CUI, UMLS semantic types, UMLS semantic group, synonyms, question ID, question focus, question type, question, and answer. The dataset has a single training split containing 47,441 samples. To respect the MedlinePlus copyright, answers from some sources have been removed. The README also lists the discrepancies between the question types in the dataset and those in the corresponding paper, and provides citation information.
Provider:
lavita
Original information summary

Dataset overview

Dataset information

Features

  • document_id: string
  • document_source: string
  • document_url: string
  • category: string
  • umls_cui: string
  • umls_semantic_types: string
  • umls_semantic_group: string
  • synonyms: string
  • question_id: string
  • question_focus: string
  • question_type: string
  • question: string
  • answer: string
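
As a minimal sketch of how the feature list above could be checked against a loaded copy of the data, the following Python defines the expected column names and a small helper that reports which ones are missing. The helper names `missing_columns` and `load_medquad_train` are hypothetical; the loader assumes the Hugging Face `datasets` package is installed and that the dataset is reachable under the ID `lavita/MedQuAD`.

```python
# Column names as given in the feature list; all are string-typed.
EXPECTED_COLUMNS = [
    "document_id", "document_source", "document_url", "category",
    "umls_cui", "umls_semantic_types", "umls_semantic_group", "synonyms",
    "question_id", "question_focus", "question_type", "question", "answer",
]


def missing_columns(column_names):
    """Return the expected columns that are absent from column_names."""
    present = set(column_names)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_medquad_train():
    """Load the train split (assumes `datasets` is installed and network access)."""
    from datasets import load_dataset  # lazy import: optional dependency

    return load_dataset("lavita/MedQuAD", split="train")
```

Calling `missing_columns(load_medquad_train().column_names)` would then return an empty list if the loaded split matches the listing.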

Splits

  • train: 47,441 samples, 34,989,308 bytes

Download and dataset size

  • download_size: 10,718,159 bytes
  • dataset_size: 34,989,308 bytes
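
To put the byte counts above in more familiar units, the short Python sketch below converts them to MiB and derives the download-to-dataset ratio and the average size per row. The interpretation in the comments (download_size as the size of the downloaded files, dataset_size as the size of the prepared data) is an assumption about how these fields are usually reported on Hugging Face, not something this listing states. The figures work out to roughly 33 MiB for the dataset and roughly 10 MiB for the download.

```python
NUM_ROWS = 47_441                  # train split size from the listing
DATASET_SIZE_BYTES = 34_989_308    # dataset_size (assumed: prepared data)
DOWNLOAD_SIZE_BYTES = 10_718_159   # download_size (assumed: downloaded files)


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 ** 2)


dataset_mib = to_mib(DATASET_SIZE_BYTES)
download_mib = to_mib(DOWNLOAD_SIZE_BYTES)
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES
bytes_per_row = DATASET_SIZE_BYTES / NUM_ROWS

print(f"dataset_size:   {dataset_mib:.2f} MiB")
print(f"download_size:  {download_mib:.2f} MiB")
print(f"download/dataset ratio: {download_ratio:.2f}")
print(f"average bytes per row:  {bytes_per_row:.1f}")
```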

Task categories

  • Question answering

Languages

  • English

Tags

  • Medical

Size category

  • 10K<n<100K

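Dataset notes

  • Multiple values in the umls_cui, umls_semantic_types, and synonyms columns are separated by the | character.
  • Answers from the sources [GARD, MPlusHerbsSupplements, ADAM, MPlusDrugs] (31,034 records) have been removed from the original dataset to respect the MedlinePlus copyright.
  • UMLS: Unified Medical Language System
  • CUI: Concept Unique Identifier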
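
The notes above suggest two small preprocessing steps: splitting the |-separated columns into lists, and skipping rows whose answer was removed. The Python sketch below illustrates both. It assumes a removed answer appears as `None` or an empty string, which this listing does not specify, and the sample rows are invented for illustration only. The helper names `split_multi` and `has_answer` are hypothetical.

```python
TOTAL_ROWS = 47_441
ROWS_WITH_REMOVED_ANSWERS = 31_034
# Rows that should still carry an answer, by subtraction.
ROWS_WITH_ANSWERS = TOTAL_ROWS - ROWS_WITH_REMOVED_ANSWERS


def split_multi(value):
    """Split a '|'-separated field into a list of stripped, non-empty values."""
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


def has_answer(row):
    """True if the row has a non-blank answer (assumes None or '' when removed)."""
    answer = row.get("answer")
    return bool(answer and answer.strip())


# Invented rows for illustration; these are not real MedQuAD records.
rows = [
    {"umls_cui": "C0000001|C0000002", "synonyms": "term a|term b", "answer": "Example answer."},
    {"umls_cui": "C0000003", "synonyms": "", "answer": None},
]

answered = [row for row in rows if has_answer(row)]
cuis = [split_multi(row["umls_cui"]) for row in rows]
print(ROWS_WITH_ANSWERS, len(answered), cuis)
```

By the counts in the notes, 47,441 minus 31,034 leaves 16,407 records that still have an answer.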

Question type discrepancies

  • The question types in the dataset differ in a few cases from those mentioned in the paper, as follows:

Dataset question type                      Paper question type
how can i learn more                       learn more
brand names of combination products        brand names
other information                          information
outlook                                    prognosis
exams and tests                            diagnosis (exams and tests)
stages                                     ?
precautions                                ?
interactions with herbs and supplements    interaction with herbs and supplements
when to contact a medical professional     contact a medical professional
research                                   research (or clinical trial)
interactions with medications              interaction with medications
interactions with foods                    interaction with food
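
The table above can be expressed as a lookup for aligning dataset labels with the paper's labels. This is a minimal Python sketch under two assumptions: question types that do not appear in the table are taken to be identical in both places, and the two entries marked "?" are mapped to `None` because the table gives no paper equivalent. The function name `to_paper_type` is hypothetical.

```python
# Dataset question type -> paper question type, copied from the table.
DATASET_TO_PAPER = {
    "how can i learn more": "learn more",
    "brand names of combination products": "brand names",
    "other information": "information",
    "outlook": "prognosis",
    "exams and tests": "diagnosis (exams and tests)",
    "stages": None,        # listed as "?" in the table
    "precautions": None,   # listed as "?" in the table
    "interactions with herbs and supplements": "interaction with herbs and supplements",
    "when to contact a medical professional": "contact a medical professional",
    "research": "research (or clinical trial)",
    "interactions with medications": "interaction with medications",
    "interactions with foods": "interaction with food",
}


def to_paper_type(question_type):
    """Map a dataset question type to the paper's label.

    Returns None when the table gives no paper equivalent, and returns the
    normalized input unchanged when the type is not in the table (assumed
    to be the same in the dataset and the paper).
    """
    key = question_type.strip().lower()
    if key in DATASET_TO_PAPER:
        return DATASET_TO_PAPER[key]
    return key


print(to_paper_type("outlook"))
print(to_paper_type("stages"))
```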
Collected summary
Dataset introduction
Background and challenges
Background overview
MedQuAD is a medical question-answering dataset with 47,441 entries, featuring questions and answers annotated with UMLS identifiers and semantic types. It is formatted in Parquet and intended for developing AI models in healthcare, with specific notes on copyright and question type variations.
The content above was collected and summarized by 遇见数据集.