disi-unibo-nlp/medqa-5-opt-MedGENIE

Name: disi-unibo-nlp/medqa-5-opt-MedGENIE
Creator: disi-unibo-nlp
Published: 2024-05-17 07:40:38
License: MIT

Source: Hugging Face. Updated: 2024-05-17. Indexed: 2024-06-22.

Download link:

https://hf-mirror.com/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE

Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: question
    dtype: string
  - name: target
    dtype: string
  - name: answers
    sequence: string
  - name: ctxs
    list:
    - name: text
      dtype: string
  splits:
  - name: train
    num_bytes: 77044736
    num_examples: 10178
  - name: validation
    num_bytes: 9662825
    num_examples: 1272
  - name: test
    num_bytes: 9719509
    num_examples: 1273
  download_size: 5761417
  dataset_size: 96427070
license: mit
task_categories:
- question-answering
language:
- en
tags:
- medical
---

# Dataset Card for "medqa-5-opt-MedGENIE"

## Dataset Description

This dataset is part of the MedGENIE collection of medical datasets augmented with artificial contexts generated by [PMC-LLaMA-13B](https://huggingface.co/axiong/PMC_LLaMA_13B). Specifically, up to 5 artificial contexts were generated for each question in [MedQA-USMLE](https://github.com/jind11/MedQA) (5 options), employing a multi-view approach to encompass various perspectives associated with the given question.

The dataset has been used to train [MedGENIE-fid-flan-t5-base-medqa](https://huggingface.co/disi-unibo-nlp/MedGENIE-fid-flan-t5-base-medqa), allowing it to reach a new state of the art on the MedQA-USMLE test set.

For more information, refer to our paper ["**To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering**"](https://arxiv.org/abs/2403.01924).

## Dataset Structure

The dataset has three splits, suitable for:

* Training *question-answering* models, including *fusion-in-decoder* architectures.
* Augmenting your LLMs during inference with generated contexts rather than retrieved chunks.
* Augmenting your knowledge base of factual documents with generated contexts for a standard RAG pipeline.
The number of examples per split is:

- **train:** 10178 samples
- **validation:** 1272 samples
- **test:** 1273 samples

The dataset is stored in Parquet format, with each entry using the following schema:

```
{
  "id": 0,
  "question": "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7\u00b0F (36.5\u00b0C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?\nA. Ampicillin\nB. Ceftriaxone\nC. Ciprofloxacin\nD. Doxycycline\nE. Nitrofurantoin",
  "target": "E",
  "answers": [
    "E"
  ],
  "ctxs": [
    {
      "text": "This is a case of uncomplicated cystitis, which is frequently seen in pregnancy. Symptoms include urinary frequency,..."
    },
    {
      "text": "The burning upon urination in a pregnant female is often due to asymptomatic bacteriuria that results in a urinary tract..."
    },
    {
      "text": "The patient's symptoms are consistent with a lower urinary tract infection. An accurate history and physical exam exclude the..."
    },
    {
      "text": "Asymptomatic bacteriuria is a frequent finding in pregnancy. Treatment is not recommended unless there are signs of an upper urinary..."
    },
    {
      "text": "Asymptomatic bacteriuria is present if a patient has persistent (>2 weeks) bacteria in the urine as documented by a positive urine..."
    }
  ]
}
```

## Augmenting LLMs during inference

Augmenting *state-of-the-art* LLMs with generated contexts from **medqa-5-opt-MedGENIE** yielded a substantial accuracy gain. For a given question, all relevant contexts are concatenated and passed within the context window of the LLM.

| Model | Learning | medqa-5-opt-MedGENIE | Accuracy |
|------|------|-----|-----|
| LLaMA-2-chat (7B) | 2-shot | NO | 29.2 |
| LLaMA-2-chat (7B) | 2-shot | YES | 47.1 **(+17.9)** |
| Zephyr-β (7B) | 2-shot | NO | 43.1 |
| Zephyr-β (7B) | 2-shot | YES | 54.9 **(+11.8)** |

## Evaluation for RAG

To assess the effectiveness of using our generated contexts in a RAG pipeline, we augment the [MedWiki](https://huggingface.co/datasets/VOD-LM/medwiki) dataset with a smaller portion of artificially generated chunks derived from the train and test sets of **medqa-5-opt-MedGENIE** and [medmcqa-MedGENIE](https://huggingface.co/datasets/disi-unibo-nlp/medmcqa-MedGENIE).

| MedWiki chunks | Artificial chunks | Rerank | LLaMA-2-chat (7B) | mistral-instruct (7B) | Zephyr-β (7B) |
|------|-----|----------------|-------------------|-----------------------|---------------------|
| 4.5M | - | NO | 32.2 | 36.8 | 44.7 |
| 4.5M | 96K (only test) | NO | 35.8 **(+3.5)** | 37.9 **(+1.1)** | 47.5 **(+2.8)** |
| 4.5M | 2M (train + test) | NO | 36.3 **(+4.1)** | 37.9 **(+1.1)** | 47.8 **(+3.1)** |
| 4.5M | - | YES | 32.8 | 35.1 | 44.0 |
| 4.5M | 96K (only test) | YES | 36.5 **(+3.7)** | 37.6 **(+2.5)** | 47.8 **(+2.8)** |
| 4.5M | 2M (train + test) | YES | 33.5 **(+0.8)** | 37.2 **(+2.1)** | 47.9 **(+3.9)** |

## Citation

If you find this dataset useful in your work, please cite it with:

```
@misc{frisoni2024generate,
      title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering},
      author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
      year={2024},
      eprint={2403.01924},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

Provider:

disi-unibo-nlp

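Summary of original information

Dataset card "medqa-5-opt-MedGENIE"

Dataset description

This dataset is part of the MedGENIE collection of medical datasets, augmented with synthetic contexts generated by PMC-LLaMA-13B. Specifically, up to 5 synthetic contexts were generated for each question in MedQA-USMLE (5 options), using a multi-view approach to cover the various perspectives associated with the given question.

The dataset has been used to train MedGENIE-fid-flan-t5-base-medqa, allowing it to reach a new state of the art on the MedQA-USMLE test set.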

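Dataset structure

The dataset contains three splits, suitable for:

* Training question-answering models, including fusion-in-decoder architectures.
* Augmenting large language models (LLMs) during inference with generated contexts rather than retrieved chunks.
* Augmenting a knowledge base of factual documents with generated contexts for a standard RAG pipeline.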
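
For the fusion-in-decoder use case listed above, the question is typically paired with each generated context separately, so that the encoder processes one passage at a time and the decoder attends over all of them. The following is a minimal sketch of that input preparation, assuming the record layout shown in this card (`question`, `ctxs[].text`). The `question: ... context: ...` template and the helper name `build_fid_inputs` are illustrative assumptions, not taken from the MedGENIE code.

```python
from typing import Dict, List


def build_fid_inputs(record: Dict, max_contexts: int = 5) -> List[str]:
    """Return one 'question + context' string per generated context.

    The template below is an assumed format for illustration only.
    """
    question = record["question"].strip()
    contexts = [c["text"].strip() for c in record.get("ctxs", [])][:max_contexts]
    if not contexts:
        # Fall back to the bare question when a record has no contexts.
        return [f"question: {question}"]
    return [f"question: {question} context: {ctx}" for ctx in contexts]


# Shortened, hypothetical record in the layout described by the card.
example = {
    "question": "Which of the following is the best treatment?\nA. Ampicillin\nE. Nitrofurantoin",
    "target": "E",
    "ctxs": [
        {"text": "This is a case of uncomplicated cystitis..."},
        {"text": "The patient's symptoms are consistent with a lower urinary tract infection..."},
    ],
}

fid_inputs = build_fid_inputs(example)
print(len(fid_inputs))
```

Each returned string would be tokenized and encoded independently. Keeping `max_contexts` at 5 matches the card's statement that up to 5 contexts exist per question.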

The number of samples per split is:

Train: 10178 samples
Validation: 1272 samples
Test: 1273 samples

The dataset is stored in Parquet format, and each entry uses the following schema:

    {
      "id": 0,
      "question": "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7\u00b0F (36.5\u00b0C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?\nA. Ampicillin\nB. Ceftriaxone\nC. Ciprofloxacin\nD. Doxycycline\nE. Nitrofurantoin",
      "target": "E",
      "answers": [ "E" ],
      "ctxs": [
        { "text": "This is a case of uncomplicated cystitis, which is frequently seen in pregnancy. Symptoms include urinary frequency,..." },
        { "text": "The burning upon urination in a pregnant female is often due to asymptomatic bacteriuria that results in a urinary tract..." },
        { "text": "The patient's symptoms are consistent with a lower urinary tract infection. An accurate history and physical exam exclude the..." },
        { "text": "Asymptomatic bacteriuria is a frequent finding in pregnancy. Treatment is not recommended unless there are signs of an upper urinary..." },
        { "text": "Asymptomatic bacteriuria is present if a patient has persistent (>2 weeks) bacteria in the urine as documented by a positive urine..." }
      ]
    }
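
The schema above can be read into a small typed structure. The sketch below parses one JSON record and splits the `question` field into its stem and lettered options. It assumes every question embeds its options as newline-separated `A.` to `E.` lines, as in the single example shown here. The names `MedGenieRecord`, `parse_record`, and `split_question` are hypothetical helpers, not part of any published API.

```python
import json
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

OPTION_LINE = re.compile(r"^([A-E])\.\s+(.*)$")


@dataclass
class MedGenieRecord:
    id: int
    question: str
    target: str
    answers: List[str]
    contexts: List[str]


def parse_record(raw: str) -> MedGenieRecord:
    """Parse one JSON-encoded entry into a typed record."""
    obj = json.loads(raw)
    contexts = [c["text"] for c in obj["ctxs"]]
    if len(contexts) > 5:
        # The card states that up to 5 contexts are generated per question.
        raise ValueError("expected at most 5 contexts per question")
    return MedGenieRecord(obj["id"], obj["question"], obj["target"],
                          list(obj["answers"]), contexts)


def split_question(question: str) -> Tuple[str, Dict[str, str]]:
    """Separate the stem from the options (assumed 'X. text' lines)."""
    stem_lines, options = [], {}
    for line in question.split("\n"):
        match = OPTION_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
        else:
            stem_lines.append(line)
    return "\n".join(stem_lines).strip(), options


# Shortened version of the example entry shown in the card.
sample = json.dumps({
    "id": 0,
    "question": "Which of the following is the best treatment for this patient?\n"
                "A. Ampicillin\nB. Ceftriaxone\nC. Ciprofloxacin\nD. Doxycycline\nE. Nitrofurantoin",
    "target": "E",
    "answers": ["E"],
    "ctxs": [{"text": "This is a case of uncomplicated cystitis..."}],
})

record = parse_record(sample)
stem, options = split_question(record.question)
print(options[record.target])  # prints "Nitrofurantoin"
```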

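Augmenting LLMs during inference

Augmenting state-of-the-art LLMs with generated contexts from medqa-5-opt-MedGENIE yielded a substantial accuracy gain. For a given question, all relevant contexts are concatenated and passed within the context window of the LLM.

| Model | Learning | medqa-5-opt-MedGENIE | Accuracy |
|------|------|-----|-----|
| LLaMA-2-chat (7B) | 2-shot | No | 29.2 |
| LLaMA-2-chat (7B) | 2-shot | Yes | 47.1 (+17.9) |
| Zephyr-β (7B) | 2-shot | No | 43.1 |
| Zephyr-β (7B) | 2-shot | Yes | 54.9 (+11.8) |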
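
The inference-time augmentation described above, in which all contexts for a question are concatenated and placed in the model's context window, can be sketched as follows. The prompt layout (`Context:`, `Question:`, `Answer:`) and the function names are assumptions for illustration; the exact prompt and the 2-shot demonstrations used for the reported numbers are not given in this card.

```python
from typing import Dict, List


def build_prompt(record: Dict, few_shot: str = "") -> str:
    """Concatenate all generated contexts and prepend them to the question."""
    context_block = "\n".join(c["text"].strip() for c in record["ctxs"])
    parts = []
    if few_shot:
        # Optional demonstrations, e.g. two worked examples for 2-shot prompting.
        parts.append(few_shot.strip())
    parts.append(f"Context:\n{context_block}")
    parts.append(f"Question:\n{record['question'].strip()}")
    parts.append("Answer:")
    return "\n\n".join(parts)


def accuracy(predictions: List[str], targets: List[str]) -> float:
    """Percentage of predicted option letters that match the target letter."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p.strip().upper() == t.strip().upper()
                  for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


# Shortened, hypothetical record in the layout described by the card.
record = {
    "question": "Which of the following is the best treatment?\nA. Ampicillin\nE. Nitrofurantoin",
    "target": "E",
    "ctxs": [
        {"text": "This is a case of uncomplicated cystitis..."},
        {"text": "Asymptomatic bacteriuria is a frequent finding in pregnancy..."},
    ],
}

prompt = build_prompt(record)
print(prompt)
print(accuracy(["E", "A"], ["E", "B"]))  # prints 50.0
```

The model's generated answer would then be reduced to a single option letter before being passed to `accuracy`; how that extraction is done is left out here because the card does not describe it.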

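Evaluation for RAG

To assess the effectiveness of using our generated contexts in a RAG pipeline, we augment the MedWiki dataset with a smaller portion of artificially generated chunks derived from the train and test sets of medqa-5-opt-MedGENIE and medmcqa-MedGENIE.

| MedWiki chunks | Artificial chunks | Rerank | LLaMA-2-chat (7B) | mistral-instruct (7B) | Zephyr-β (7B) |
|------|-----|----------------|-------------------|-----------------------|---------------------|
| 4.5M | - | No | 32.2 | 36.8 | 44.7 |
| 4.5M | 96K (only test) | No | 35.8 (+3.5) | 37.9 (+1.1) | 47.5 (+2.8) |
| 4.5M | 2M (train + test) | No | 36.3 (+4.1) | 37.9 (+1.1) | 47.8 (+3.1) |
| 4.5M | - | Yes | 32.8 | 35.1 | 44.0 |
| 4.5M | 96K (only test) | Yes | 36.5 (+3.7) | 37.6 (+2.5) | 47.8 (+2.8) |
| 4.5M | 2M (train + test) | Yes | 33.5 (+0.8) | 37.2 (+2.1) | 47.9 (+3.9) |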
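
The knowledge-base augmentation evaluated above amounts to adding generated contexts to an existing corpus of retrieved chunks and querying the combined pool. The toy sketch below illustrates the idea only: it tags each chunk with its origin and ranks chunks by simple word overlap. The word-overlap scorer is a stand-in chosen for self-containment; the retriever and reranker behind the reported numbers are not specified in this card, and all names here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str  # "medwiki" or "artificial"


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def augment_knowledge_base(base_chunks: Iterable[str],
                           records: Iterable[Dict]) -> List[Chunk]:
    """Append generated contexts to the base corpus, skipping exact duplicates."""
    kb = [Chunk(text, "medwiki") for text in base_chunks]
    seen = {chunk.text for chunk in kb}
    for record in records:
        for ctx in record["ctxs"]:
            text = ctx["text"].strip()
            if text and text not in seen:
                kb.append(Chunk(text, "artificial"))
                seen.add(text)
    return kb


def retrieve(query: str, kb: List[Chunk], k: int = 2) -> List[Chunk]:
    """Rank chunks by shared-word count (a placeholder for a real retriever)."""
    query_tokens = tokenize(query)
    ranked = sorted(kb, key=lambda c: len(query_tokens & tokenize(c.text)),
                    reverse=True)
    return ranked[:k]


# Tiny invented corpus, for illustration only.
base = [
    "Nitrofurantoin is an antibiotic used for urinary tract infections.",
    "The femur is the longest bone in the human body.",
]
records = [{"ctxs": [{"text": "Burning upon urination in pregnancy suggests "
                              "cystitis, a lower urinary tract infection."}]}]

kb = augment_knowledge_base(base, records)
top = retrieve("pregnant woman with burning upon urination", kb, k=1)
print(top[0].source)  # prints "artificial"
```

Tagging the origin of each chunk makes it possible to measure how often generated contexts are retrieved compared with corpus chunks.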

Citation

If you find this dataset useful in your work, please cite it with:

    @misc{frisoni2024generate,
          title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering},
          author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
          year={2024},
          eprint={2403.01924},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
