
disi-unibo-nlp/medqa-5-opt-MedGENIE

Source: Hugging Face. Updated 2024-05-17; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/disi-unibo-nlp/medqa-5-opt-MedGENIE
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: question
    dtype: string
  - name: target
    dtype: string
  - name: answers
    sequence: string
  - name: ctxs
    list:
    - name: text
      dtype: string
  splits:
  - name: train
    num_bytes: 77044736
    num_examples: 10178
  - name: validation
    num_bytes: 9662825
    num_examples: 1272
  - name: test
    num_bytes: 9719509
    num_examples: 1273
  download_size: 5761417
  dataset_size: 96427070
license: mit
task_categories:
- question-answering
language:
- en
tags:
- medical
---

# Dataset Card for "medqa-5-opt-MedGENIE"

## Dataset Description

The data is part of the MedGENIE collection of medical datasets augmented with artificial contexts generated by [PMC-LLaMA-13B](https://huggingface.co/axiong/PMC_LLaMA_13B). Specifically, up to 5 artificial contexts were generated for each question in [MedQA-USMLE](https://github.com/jind11/MedQA) (5 options), employing a multi-view approach to cover the various perspectives associated with the given question.

The dataset has been used to train [MedGENIE-fid-flan-t5-base-medqa](https://huggingface.co/disi-unibo-nlp/MedGENIE-fid-flan-t5-base-medqa), allowing it to reach a new state of the art on the MedQA-USMLE test set.

For more information, refer to our paper ["**To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering**"](https://arxiv.org/abs/2403.01924).

## Dataset Structure

The dataset has three splits, suitable for:

* Training *question-answering* models, including *fusion-in-decoder* architectures.
* Augmenting your LLMs during inference with generated contexts rather than retrieved chunks.
* Augmenting your knowledge base of factual documents with generated contexts for a standard RAG pipeline.
The number of examples per split is:

- **train:** 10178 samples
- **validation:** 1272 samples
- **test:** 1273 samples

The dataset is stored in parquet format, with each entry using the following schema:

```
{
    "id": 0,
    "question": "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7\u00b0F (36.5\u00b0C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?\nA. Ampicillin\nB. Ceftriaxone\nC. Ciprofloxacin\nD. Doxycycline\nE. Nitrofurantoin",
    "target": "E",
    "answers": [
        "E"
    ],
    "ctxs": [
        {
            "text": "This is a case of uncomplicated cystitis, which is frequently seen in pregnancy. Symptoms include urinary frequency,..."
        },
        {
            "text": "The burning upon urination in a pregnant female is often due to asymptomatic bacteriuria that results in a urinary tract..."
        },
        {
            "text": "The patient's symptoms are consistent with a lower urinary tract infection. An accurate history and physical exam exclude the..."
        },
        {
            "text": "Asymptomatic bacteriuria is a frequent finding in pregnancy. Treatment is not recommended unless there are signs of an upper urinary..."
        },
        {
            "text": "Asymptomatic bacteriuria is present if a patient has persistent (>2 weeks) bacteria in the urine as documented by a positive urine..."
        }
    ]
}
```

## Augmenting LLMs during inference

Augmenting LLMs with generated contexts from **medqa-5-opt-MedGENIE** at inference time produced large accuracy gains (+11.8 to +17.9 points in the 2-shot setting reported here). For a given question, all relevant contexts are concatenated and passed within the context window of the LLM.
| Model | Learning | medqa-5-opt-MedGENIE | Accuracy |
|------|------|-----|-----|
| LLaMA-2-chat (7B) | 2-shot | NO | 29.2 |
| LLaMA-2-chat (7B) | 2-shot | YES | 47.1 **(+17.9)** |
| Zephyr-β (7B) | 2-shot | NO | 43.1 |
| Zephyr-β (7B) | 2-shot | YES | 54.9 **(+11.8)** |

## Evaluation for RAG

To assess the effectiveness of our generated contexts in a RAG pipeline, we augment the [MedWiki](https://huggingface.co/datasets/VOD-LM/medwiki) dataset with a smaller portion of artificially generated chunks derived from the train and test sets of **medqa-5-opt-MedGENIE** and [medmcqa-MedGENIE](https://huggingface.co/datasets/disi-unibo-nlp/medmcqa-MedGENIE).

| MedWiki chunks | Artificial chunks | Rerank | LLaMA-2-chat (7B) | mistral-instruct (7B) | Zephyr-β (7B) |
|------|-----|----------------|-------------------|-----------------------|---------------------|
| 4.5M | - | NO | 32.2 | 36.8 | 44.7 |
| 4.5M | 96K (only test) | NO | 35.8 **(+3.5)** | 37.9 **(+1.1)** | 47.5 **(+2.8)** |
| 4.5M | 2M (train + test) | NO | 36.3 **(+4.1)** | 37.9 **(+1.1)** | 47.8 **(+3.1)** |
| 4.5M | - | YES | 32.8 | 35.1 | 44.0 |
| 4.5M | 96K (only test) | YES | 36.5 **(+3.7)** | 37.6 **(+2.5)** | 47.8 **(+2.8)** |
| 4.5M | 2M (train + test) | YES | 33.5 **(+0.8)** | 37.2 **(+2.1)** | 47.9 **(+3.9)** |

## Citation

If you find this dataset useful in your work, please cite it with:

```
@misc{frisoni2024generate,
      title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering},
      author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
      year={2024},
      eprint={2403.01924},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
Provider:
disi-unibo-nlp
Summary of original information

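Dataset card "medqa-5-opt-MedGENIE"

Dataset description

This dataset is part of the MedGENIE collection of medical datasets, augmented with artificial contexts generated by PMC-LLaMA-13B. Specifically, up to 5 artificial contexts were generated for each question in MedQA-USMLE (5 options), using a multi-view approach to cover the various perspectives associated with the given question.

The dataset has been used to train MedGENIE-fid-flan-t5-base-medqa, allowing it to reach a new state of the art on the MedQA-USMLE test set.

For more information, refer to our paper "To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering".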

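Dataset structure

The dataset has three splits, suitable for:

  • Training question-answering models, including fusion-in-decoder architectures.
  • Augmenting large language models (LLMs) during inference with generated contexts rather than retrieved chunks.
  • Augmenting a knowledge base of factual documents with generated contexts for a standard RAG pipeline.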

The number of examples per split is:

  • train: 10178 samples
  • validation: 1272 samples
  • test: 1273 samples
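As a minimal sketch, the split sizes can be checked after loading the dataset with the Hugging Face `datasets` library. The expected counts are taken from the `dataset_info` metadata of the card; the helper names `load_medgenie` and `split_size_mismatches` are illustrative and not part of the dataset or the library.

```python
from typing import Dict, Optional, Tuple

# Split sizes as declared in the card's dataset_info metadata.
EXPECTED_SIZES: Dict[str, int] = {"train": 10178, "validation": 1272, "test": 1273}


def load_medgenie():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the size check below works without the library installed.
    from datasets import load_dataset

    return load_dataset("disi-unibo-nlp/medqa-5-opt-MedGENIE")


def split_size_mismatches(sizes: Dict[str, int]) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        name: (expected, sizes.get(name))
        for name, expected in EXPECTED_SIZES.items()
        if sizes.get(name) != expected
    }
```

A possible usage, assuming network access: `ds = load_medgenie()` followed by `split_size_mismatches({name: ds[name].num_rows for name in ds})`, which returns an empty dictionary when every split matches the metadata.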

The dataset is stored in parquet format, with each entry using the following schema:

    {
        "id": 0,
        "question": "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?\nA. Ampicillin\nB. Ceftriaxone\nC. Ciprofloxacin\nD. Doxycycline\nE. Nitrofurantoin",
        "target": "E",
        "answers": [ "E" ],
        "ctxs": [
            { "text": "This is a case of uncomplicated cystitis, which is frequently seen in pregnancy. Symptoms include urinary frequency,..." },
            { "text": "The burning upon urination in a pregnant female is often due to asymptomatic bacteriuria that results in a urinary tract..." },
            { "text": "The patient's symptoms are consistent with a lower urinary tract infection. An accurate history and physical exam exclude the..." },
            { "text": "Asymptomatic bacteriuria is a frequent finding in pregnancy. Treatment is not recommended unless there are signs of an upper urinary..." },
            { "text": "Asymptomatic bacteriuria is present if a patient has persistent (>2 weeks) bacteria in the urine as documented by a positive urine..." }
        ]
    }
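The schema can be mirrored in Python types, and the `question` field can be split into its clinical stem and answer options. This is a sketch under one assumption drawn from the example record only: each option sits on its own line in the form `A. text` with letters A to E. The names `MedQARecord` and `split_question` are hypothetical.

```python
import re
from typing import Dict, List, Tuple, TypedDict


class Context(TypedDict):
    text: str


class MedQARecord(TypedDict):
    id: int
    question: str
    target: str
    answers: List[str]
    ctxs: List[Context]


# Assumed option layout: a letter A-E, a period, a space, then the option text.
_OPTION_LINE = re.compile(r"^([A-E])\. (.+)$")


def split_question(question: str) -> Tuple[str, Dict[str, str]]:
    """Separate the question stem from its lettered options."""
    stem_lines: List[str] = []
    options: Dict[str, str] = {}
    for line in question.split("\n"):
        match = _OPTION_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
        else:
            stem_lines.append(line)
    return "\n".join(stem_lines).strip(), options
```

With the example record, `split_question(record["question"])[1][record["target"]]` would give the text of the correct option.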

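Augmenting LLMs during inference

Augmenting LLMs with the generated contexts from medqa-5-opt-MedGENIE produced large accuracy gains. For a given question, all relevant contexts are concatenated and passed within the context window of the LLM.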

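| Model | Learning | medqa-5-opt-MedGENIE | Accuracy |
|------|------|-----|-----|
| LLaMA-2-chat (7B) | 2-shot | NO | 29.2 |
| LLaMA-2-chat (7B) | 2-shot | YES | 47.1 (+17.9) |
| Zephyr-β (7B) | 2-shot | NO | 43.1 |
| Zephyr-β (7B) | 2-shot | YES | 54.9 (+11.8) |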
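The concatenation step described above can be sketched as follows. The prompt template (the `Context i:`, `Question:` and `Answer:` labels) is an assumption for illustration; the card does not specify the exact template, and the 2-shot demonstrations used in the reported runs are omitted here. `build_prompt` is a hypothetical helper.

```python
from typing import Any, Dict, List


def build_prompt(record: Dict[str, Any], max_contexts: int = 5) -> str:
    """Concatenate a record's generated contexts ahead of its question."""
    texts: List[str] = [
        ctx["text"].strip()
        for ctx in record["ctxs"][:max_contexts]
        if ctx["text"].strip()  # skip empty contexts
    ]
    parts = [f"Context {i}: {text}" for i, text in enumerate(texts, start=1)]
    parts.append(f"Question: {record['question']}")
    parts.append("Answer:")
    return "\n\n".join(parts)
```

Lowering `max_contexts` is one way to stay within a small context window, at the cost of dropping some of the generated views.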

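Evaluation for RAG

To assess the effectiveness of our generated contexts in a RAG pipeline, we augment the MedWiki dataset with a smaller portion of artificially generated chunks derived from the train and test sets of medqa-5-opt-MedGENIE and medmcqa-MedGENIE.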

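| MedWiki chunks | Artificial chunks | Rerank | LLaMA-2-chat (7B) | mistral-instruct (7B) | Zephyr-β (7B) |
|------|-----|----------------|-------------------|-----------------------|---------------------|
| 4.5M | - | NO | 32.2 | 36.8 | 44.7 |
| 4.5M | 96K (only test) | NO | 35.8 (+3.5) | 37.9 (+1.1) | 47.5 (+2.8) |
| 4.5M | 2M (train + test) | NO | 36.3 (+4.1) | 37.9 (+1.1) | 47.8 (+3.1) |
| 4.5M | - | YES | 32.8 | 35.1 | 44.0 |
| 4.5M | 96K (only test) | YES | 36.5 (+3.7) | 37.6 (+2.5) | 47.8 (+2.8) |
| 4.5M | 2M (train + test) | YES | 33.5 (+0.8) | 37.2 (+2.1) | 47.9 (+3.9) |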
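The idea of mixing artificial chunks into a retrieval corpus can be illustrated with a toy example. The scorer below is a plain token-overlap count chosen for brevity; it is not the retriever or reranker behind the reported numbers, which the card does not describe. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, chunk: str) -> int:
    """Count tokens shared by query and chunk (multiset intersection)."""
    return sum((Counter(tokenize(query)) & Counter(tokenize(chunk))).values())


def retrieve(
    query: str,
    corpus_chunks: List[str],
    artificial_chunks: List[str],
    k: int = 3,
) -> List[Tuple[str, str]]:
    """Rank corpus and artificial chunks in one pool; return the top k with their origin."""
    pool = [("corpus", c) for c in corpus_chunks]
    pool += [("artificial", c) for c in artificial_chunks]
    ranked = sorted(pool, key=lambda item: overlap_score(query, item[1]), reverse=True)
    return ranked[:k]
```

Tagging each chunk with its origin makes it easy to measure how often generated chunks displace corpus chunks in the top k.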

Citation

If you find this dataset useful in your work, please cite:

    @misc{frisoni2024generate,
          title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering},
          author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
          year={2024},
          eprint={2403.01924},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
