
hyesunyun/liveqa_medical_trec2017

Source: Hugging Face; updated 2023-06-20; indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/hyesunyun/liveqa_medical_trec2017
Resource description:
---
task_categories:
- question-answering
language:
- en
tags:
- medical
pretty_name: LiveQAMedical
size_categories:
- n<1K
---

# Dataset Card for LiveQA Medical from TREC 2017

The LiveQA'17 medical task focuses on consumer health question answering. Consumer health questions were received by the U.S. National Library of Medicine (NLM).

The dataset consists of constructed medical question-answer pairs for training and testing, with additional annotations that can be used to develop question analysis and question answering systems.

Please refer to our overview paper for more information about the constructed datasets and the LiveQA Track:

Asma Ben Abacha, Eugene Agichtein, Yuval Pinter & Dina Demner-Fushman. Overview of the Medical Question Answering Task at TREC 2017 LiveQA. TREC, Gaithersburg, MD, 2017 (https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf).

**Homepage:** [https://github.com/abachaa/LiveQA_MedicalTask_TREC2017](https://github.com/abachaa/LiveQA_MedicalTask_TREC2017)

## Medical Training Data

The dataset provides 634 question-answer pairs for training:

1) TREC-2017-LiveQA-Medical-Train-1.xml => 388 question-answer pairs corresponding to 200 NLM questions. Each question is divided into one or more subquestion(s). Each subquestion has one or more answer(s). These question-answer pairs were constructed automatically and validated manually.
2) TREC-2017-LiveQA-Medical-Train-2.xml => 246 question-answer pairs corresponding to 246 NLM questions. Answers were retrieved manually by librarians.

**You can access them as jsonl**

The datasets are not exhaustive with regard to subquestions, i.e., some subquestions might not be annotated. Additional annotations are provided for both (i) the Focus and (ii) the Question Type used to define each subquestion. 23 question types were considered (e.g. Treatment, Cause, Diagnosis, Indication, Susceptibility, Dosage), related to four focus categories: Disease, Drug, Treatment and Exam.

## Medical Test Data

The test split can be downloaded via Hugging Face.

Test questions cover 26 question types associated with five focus categories. Each question includes one or more subquestion(s) and at least one focus and one question type. Reference answers were selected from trusted resources and validated by medical experts. At least one reference answer is provided for each test question, along with its URL and relevant comments. Question paraphrases were created by assessors and used with the reference answers to judge the participants' answers.

If you use these datasets, please cite the paper:

```
@inproceedings{LiveMedQA2017,
  author    = {Asma {Ben Abacha} and Eugene Agichtein and Yuval Pinter and Dina Demner{-}Fushman},
  title     = {Overview of the Medical Question Answering Task at TREC 2017 LiveQA},
  booktitle = {TREC 2017},
  year      = {2017}
}
```
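The card says the test split can be downloaded via Hugging Face and that the training files are available as JSONL, but it gives no loading code. The following is a minimal sketch under stated assumptions: the split name `"test"` is a guess based on the card's wording, the helper names `load_test_split` and `read_jsonl` are hypothetical, and the record fields are not documented here, so the reader returns plain dictionaries without assuming any keys.

```python
import json

# Repository id as it appears in the page title and download link.
REPO_ID = "hyesunyun/liveqa_medical_trec2017"


def load_test_split():
    """Fetch the test split from the Hugging Face Hub.

    Assumptions: the `datasets` package is installed, network access is
    available, and the split is actually named "test".
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset(REPO_ID, split="test")


def read_jsonl(path):
    """Read a local JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

`read_jsonl` works on any locally downloaded `.jsonl` file; inspecting the keys of the first record is a reasonable way to discover the real schema before writing further processing code.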
Provided by:
hyesunyun
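Summary of original information

Dataset card: LiveQA Medical from TREC 2017

Overview

The LiveQA'17 medical task focuses on consumer health question answering. The dataset is built from consumer health questions received by the U.S. National Library of Medicine (NLM). It contains medical question-answer pairs for training and testing, plus additional annotations that can be used to develop question analysis and question answering systems.

Dataset details

Medical training data

  • The dataset contains 634 question-answer pairs for training:
    • TREC-2017-LiveQA-Medical-Train-1.xml: 388 question-answer pairs corresponding to 200 NLM questions. Each question is divided into one or more subquestions, and each subquestion has one or more answers. These pairs were constructed automatically and validated manually.
    • TREC-2017-LiveQA-Medical-Train-2.xml: 246 question-answer pairs corresponding to 246 NLM questions. Answers were retrieved manually by librarians.
  • The dataset is not exhaustive with regard to subquestions, i.e., some subquestions may not be annotated.
  • Additional annotations are provided for (i) the focus and (ii) the question type used to define each subquestion. 23 question types were considered (e.g. Treatment, Cause, Diagnosis, Indication, Susceptibility, Dosage), related to four focus categories: Disease, Drug, Treatment and Exam.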
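The question, subquestion and answer hierarchy described above explains why 200 questions in the first training file yield 388 pairs. The sketch below models that hierarchy and counts pairs, assuming a pair means one subquestion combined with one of its answers; that definition is an interpretation, not something the card states. The class names, field names and toy records are hypothetical placeholders, not real dataset content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubQuestion:
    focus: str            # e.g. a Disease or Drug focus
    question_type: str    # e.g. "Treatment", "Cause", "Dosage"
    answers: List[str] = field(default_factory=list)


@dataclass
class Question:
    text: str
    subquestions: List[SubQuestion] = field(default_factory=list)


def count_qa_pairs(questions):
    """Count (subquestion, answer) combinations across all questions."""
    return sum(len(sq.answers) for q in questions for sq in q.subquestions)


# Toy placeholder data: two questions, three subquestions, four answers.
toy = [
    Question("placeholder question 1", [
        SubQuestion("placeholder focus", "Treatment", ["answer A", "answer B"]),
        SubQuestion("placeholder focus", "Cause", ["answer C"]),
    ]),
    Question("placeholder question 2", [
        SubQuestion("placeholder focus", "Dosage", ["answer D"]),
    ]),
]

print(count_qa_pairs(toy))  # 4

# Pair counts reported for the two training files add up to the stated total.
TRAIN_1_PAIRS, TRAIN_2_PAIRS = 388, 246
assert TRAIN_1_PAIRS + TRAIN_2_PAIRS == 634
```

Under this reading, the second training file has exactly one pair per question (246 pairs for 246 questions), while the first averages just under two pairs per question.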

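Medical test data

  • Test questions cover 26 question types associated with five focus categories.
  • Each question includes one or more subquestions and at least one focus and one question type.
  • Reference answers were selected from trusted resources and validated by medical experts. Each test question has at least one reference answer, with its URL and relevant comments.
  • Question paraphrases were created by assessors and used together with the reference answers to judge participants' answers.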
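The structural guarantees listed for test questions can be turned into a simple sanity check. This is only a sketch: the dictionary keys (`subquestions`, `foci`, `question_types`, `reference_answers`, `url`) are hypothetical names chosen for illustration and may not match the actual files, and the example record is a placeholder rather than real data.

```python
def validate_test_question(record):
    """Return a list of violated constraints; an empty list means none found."""
    problems = []
    required = (
        ("subquestions", "subquestion"),
        ("foci", "focus"),
        ("question_types", "question type"),
        ("reference_answers", "reference answer"),
    )
    # Each test question should carry at least one of each item.
    for key, label in required:
        if not record.get(key):
            problems.append(f"missing {label}")
    # Each reference answer is described as coming with its URL.
    for i, ref in enumerate(record.get("reference_answers") or []):
        if not ref.get("url"):
            problems.append(f"reference answer {i} has no URL")
    return problems


example = {
    "subquestions": ["placeholder subquestion"],
    "foci": ["placeholder focus"],
    "question_types": ["Treatment"],
    "reference_answers": [
        {"text": "placeholder answer", "url": "https://example.org/placeholder"}
    ],
}

print(validate_test_question(example))  # []
```

Returning a list of problems instead of raising on the first one makes it easier to audit a whole file in a single pass.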

Citation

If you use these datasets, please cite the following paper:

@inproceedings{LiveMedQA2017, author = {Asma {Ben Abacha} and Eugene Agichtein and Yuval Pinter and Dina Demner{-}Fushman}, title = {Overview of the Medical Question Answering Task at TREC 2017 LiveQA}, booktitle = {TREC 2017}, year = {2017} }
