katielink/liveqa_trec2017
Source: Hugging Face. Updated 2023-08-04; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/katielink/liveqa_trec2017
---
task_categories:
- question-answering
language:
- en
tags:
- medical
pretty_name: LiveQAMedical
size_categories:
- n<1K
---
# Dataset Card for LiveQA Medical from TREC 2017
The LiveQA'17 medical task focuses on consumer health question answering. Consumer health questions were received by the U.S. National Library of Medicine (NLM).
The dataset consists of constructed medical question-answer pairs for training and testing, with additional annotations that can be used to develop question analysis and question answering systems.
Please refer to our overview paper for more information about the constructed datasets and the LiveQA Track:
Asma Ben Abacha, Eugene Agichtein, Yuval Pinter & Dina Demner-Fushman. Overview of the Medical Question Answering Task at TREC 2017 LiveQA. TREC, Gaithersburg, MD, 2017 (https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf).
**Homepage:** [https://github.com/abachaa/LiveQA_MedicalTask_TREC2017](https://github.com/abachaa/LiveQA_MedicalTask_TREC2017)
## Medical Training Data
The dataset provides 634 question-answer pairs for training:
1) TREC-2017-LiveQA-Medical-Train-1.xml => 388 question-answer pairs corresponding to 200 NLM questions.
Each question is divided into one or more subquestion(s). Each subquestion has one or more answer(s).
These question-answer pairs were constructed automatically and validated manually.
2) TREC-2017-LiveQA-Medical-Train-2.xml => 246 question-answer pairs corresponding to 246 NLM questions.
Answers were retrieved manually by librarians.
**Both training files can also be accessed as JSONL.**
The datasets are not exhaustive with regard to subquestions, i.e., some subquestions might not be annotated.
Additional annotations are provided for both (i) the Focus and (ii) the Question Type used to define each subquestion.
23 question types were considered (e.g. Treatment, Cause, Diagnosis, Indication, Susceptibility, Dosage) related to four focus categories: Disease, Drug, Treatment and Exam.
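The question, subquestion, and answer hierarchy described above explains why 200 NLM questions can yield 388 question-answer pairs: each answer attached to a subquestion counts as one pair. The following is a minimal sketch of that counting over JSONL input. The field names (`question`, `subquestions`, `focus`, `type`, `answers`) and the sample records are hypothetical placeholders chosen for illustration; they are not the dataset's documented schema, so adjust them to the actual files.

```python
import io
import json
from collections import Counter

# Hypothetical records: field names and values are illustrative assumptions,
# not the dataset's documented schema.
SAMPLE_JSONL = """\
{"question": "Q1", "subquestions": [{"focus": "X", "type": "Treatment", "answers": ["a1", "a2"]}, {"focus": "X", "type": "Cause", "answers": ["a3"]}]}
{"question": "Q2", "subquestions": [{"focus": "Y", "type": "Dosage", "answers": ["a4"]}]}
"""


def read_jsonl(stream):
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_pairs(records):
    """Count question-answer pairs overall and per question type.

    Every answer attached to a subquestion counts as one pair.
    """
    per_type = Counter()
    for record in records:
        for sub in record["subquestions"]:
            per_type[sub["type"]] += len(sub["answers"])
    return sum(per_type.values()), per_type


total, per_type = count_pairs(read_jsonl(io.StringIO(SAMPLE_JSONL)))
print(total)           # 4
print(dict(per_type))  # {'Treatment': 2, 'Cause': 1, 'Dosage': 1}
```

To run the same tally on a real file, pass an open file handle instead of the in-memory sample, after mapping the hypothetical keys to the ones the file actually uses.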
## Medical Test Data
The test split can be downloaded directly from Hugging Face.
Test questions cover 26 question types associated with five focus categories.
Each question includes one or more subquestion(s) and at least one focus and one question type.
Reference answers were selected from trusted resources and validated by medical experts.
Each test question has at least one reference answer, along with its URL and relevant comments.
Question paraphrases were created by assessors and used with the reference answers to judge the participants' answers.
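As a rough illustration of downloading the test split, the sketch below wraps `load_dataset` from the Hugging Face `datasets` library. It assumes that the repository ID matches this page's title, that the repository loads with its default configuration, and that the split is named `test`; none of these are confirmed by the card. The helper name `load_liveqa_split` is hypothetical.

```python
REPO_ID = "katielink/liveqa_trec2017"  # taken from this page's title


def load_liveqa_split(split="test"):
    """Download one split from the Hugging Face Hub (needs network access).

    The split name is an assumption; check the repository's file listing
    if loading fails.
    """
    # Imported lazily so this module can be imported even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

Calling `load_liveqa_split("test")` should return a `Dataset` object if those assumptions hold. Print it to inspect the actual column names before relying on any particular field.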
If you use these datasets, please cite the paper:
```
@inproceedings{LiveMedQA2017,
author = {Asma {Ben Abacha} and Eugene Agichtein and Yuval Pinter and Dina Demner{-}Fushman},
title = {Overview of the Medical Question Answering Task at TREC 2017 LiveQA},
booktitle = {TREC 2017},
year = {2017}
}
```
Provider:
katielink
Summary of original information
Dataset overview
Basic information
- Task category: question answering (question-answering)
- Language: English (en)
- Tags: medical
- Dataset name: LiveQAMedical
- Dataset size: fewer than 1,000 records (n<1K)
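Dataset description
- Purpose: question answering focused on consumer health questions.
- Source: consumer health questions received by the U.S. National Library of Medicine (NLM).
- Content: constructed medical question-answer pairs for training and testing, plus additional annotations for developing question answering systems.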
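Training data
- Size: 634 question-answer pairs
  - TREC-2017-LiveQA-Medical-Train-1.xml: 388 question-answer pairs corresponding to 200 NLM questions.
  - TREC-2017-LiveQA-Medical-Train-2.xml: 246 question-answer pairs corresponding to 246 NLM questions.
- Characteristics:
  - Some question-answer pairs were constructed automatically and validated manually.
  - Some answers were retrieved manually by librarians.
- Additional annotations:
  - Question focus (Focus)
  - Question type (Question Type): 23 types, such as Treatment, Cause, and Diagnosis.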
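Test data
- Characteristics:
  - Covers 26 question types associated with five focus categories.
  - Each question includes one or more subquestions and at least one focus and one question type.
  - Reference answers come from trusted resources and were validated by medical experts.
  - Each test question has at least one reference answer, with its URL and relevant comments.
  - Question paraphrases were created by assessors and used together with the reference answers to judge participants' answers.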



