disi-unibo-nlp/medqa-5-opt-MedGENIE
Dataset Card for "medqa-5-opt-MedGENIE"
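Dataset Description
This dataset is part of the MedGENIE collection of medical datasets augmented with synthetic contexts generated by PMC-LLaMA-13B. Specifically, up to 5 synthetic contexts were generated for each question in MedQA-USMLE (5 options), using a multi-view approach to cover the different perspectives relevant to the question.
The dataset was used to train MedGENIE-fid-flan-t5-base-medqa, which reached a new state of the art on the MedQA-USMLE test set.
For more information, see our paper "To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering".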
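Dataset Structure
The dataset has three splits and is suitable for:
- Training question-answering models, including fusion-in-decoder architectures.
- Augmenting large language models (LLMs) at inference time with generated contexts instead of retrieved chunks.
- Augmenting a knowledge base of factual documents with generated contexts for standard RAG pipelines.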
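For the first use case, a fusion-in-decoder reader encodes each (question, context) pair separately and fuses them in the decoder. The sketch below prepares those per-passage inputs, assuming each record carries a `question` string and a `ctxs` list of objects with a `text` field. The function name `build_fid_inputs` and the `question: ... context: ...` template are illustrative assumptions, not necessarily the exact format used to train the MedGENIE models.

```python
def build_fid_inputs(record, max_contexts=5):
    """Return one input string per context, each paired with the question.

    Assumption: `record` has a "question" string and a "ctxs" list of
    {"text": ...} dicts. The template below is a hypothetical choice.
    """
    question = record["question"]
    texts = [ctx["text"] for ctx in record["ctxs"][:max_contexts]]
    # A fusion-in-decoder reader encodes each of these strings independently.
    return [f"question: {question} context: {text}" for text in texts]


if __name__ == "__main__":
    toy_record = {
        "question": "Which drug is best? A. Ampicillin B. Nitrofurantoin",
        "ctxs": [
            {"text": "This is a case of uncomplicated cystitis."},
            {"text": "The symptoms suggest a lower urinary tract infection."},
        ],
    }
    for line in build_fid_inputs(toy_record):
        print(line)
```

Each returned string would then be tokenized on its own before being passed to the encoder.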
Number of samples per split:
- Train: 10178 samples
- Validation: 1273 samples
- Test: 1273 samples
The dataset is stored in parquet format, and each entry uses the following schema:

```json
{
  "id": 0,
  "question": "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient? A. Ampicillin B. Ceftriaxone C. Ciprofloxacin D. Doxycycline E. Nitrofurantoin",
  "target": "E",
  "answers": [
    "E"
  ],
  "ctxs": [
    {
      "text": "This is a case of uncomplicated cystitis, which is frequently seen in pregnancy. Symptoms include urinary frequency,..."
    },
    {
      "text": "The burning upon urination in a pregnant female is often due to asymptomatic bacteriuria that results in a urinary tract..."
    },
    {
      "text": "The patients symptoms are consistent with a lower urinary tract infection. An accurate history and physical exam exclude the..."
    },
    {
      "text": "Asymptomatic bacteriuria is a frequent finding in pregnancy. Treatment is not recommended unless there are signs of an upper urinary..."
    },
    {
      "text": "Asymptomatic bacteriuria is present if a patient has persistent (>2 weeks) bacteria in the urine as documented by a positive urine..."
    }
  ]
}
```
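Because the answer options are embedded at the end of the `question` string, downstream code may want to separate the stem from the options. The following is a minimal sketch of one way to do that, assuming the options always appear in order as ` A. `, ` B. `, ..., ` E. ` at the end of the string, as in the example entry. The helper name `split_question` is hypothetical, and the heuristic can fail if an option's own text contains a later option marker such as ` E. `.

```python
def split_question(question, labels="ABCDE"):
    """Split a question string into (stem, {label: option_text}).

    Assumption: options are appended in label order, each introduced by
    " <label>. ". Scanning from the end avoids matching capital letters
    that occur earlier in the clinical vignette.
    """
    end = len(question)
    found = {}
    for label in reversed(labels):
        marker = f" {label}. "
        start = question.rfind(marker, 0, end)
        if start == -1:
            raise ValueError(f"option marker {marker!r} not found")
        found[label] = question[start + len(marker):end].strip()
        end = start
    stem = question[:end].strip()
    # Rebuild the dict so that options are returned in label order.
    return stem, {label: found[label] for label in labels}


if __name__ == "__main__":
    stem, options = split_question(
        "Which of the following is the best treatment for this patient? "
        "A. Ampicillin B. Ceftriaxone C. Ciprofloxacin D. Doxycycline "
        "E. Nitrofurantoin"
    )
    print(stem)
    print(options)
```

With the options separated, the `target` field (for example `"E"`) can be used as a key to look up the text of the correct answer.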
Augmenting LLMs at Inference Time
Augmenting state-of-the-art LLMs with the contexts from medqa-5-opt-MedGENIE yields a notable performance boost. For a given question, all relevant contexts are concatenated and passed within the LLM's context window.
| Model | Learning | medqa-5-opt-MedGENIE | Accuracy |
|---|---|---|---|
| LLaMA-2-chat (7B) | 2-shot | No | 29.2 |
| LLaMA-2-chat (7B) | 2-shot | Yes | 47.1 (+17.9) |
| Zephyr-β (7B) | 2-shot | No | 43.1 |
| Zephyr-β (7B) | 2-shot | Yes | 54.9 (+11.8) |
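The concatenation step described above can be sketched as follows. This is an illustration under stated assumptions: the function name `build_prompt`, the `Context i:` labels, and the overall template are hypothetical, and the 2-shot demonstrations used in the reported experiments are omitted here.

```python
def build_prompt(record, max_contexts=5):
    """Concatenate a record's generated contexts and append the question.

    Assumption: `record` follows the dataset schema ("question" plus a
    "ctxs" list of {"text": ...}). The template itself is illustrative.
    """
    contexts = [ctx["text"].strip() for ctx in record["ctxs"][:max_contexts]]
    context_block = "\n".join(
        f"Context {i}: {text}" for i, text in enumerate(contexts, start=1)
    )
    return f"{context_block}\n\nQuestion: {record['question']}\nAnswer:"


if __name__ == "__main__":
    toy_record = {
        "question": "Which drug is best? A. Ampicillin B. Nitrofurantoin",
        "ctxs": [
            {"text": "This is a case of uncomplicated cystitis."},
            {"text": "The symptoms suggest a lower urinary tract infection."},
        ],
    }
    print(build_prompt(toy_record))
```

The resulting string would be sent to the LLM as a single prompt, so the total length of the concatenated contexts needs to fit within the model's context window.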
Evaluation for RAG
To assess the effectiveness of our generated contexts in a RAG pipeline, we augment the MedWiki dataset with a smaller portion of artificially generated chunks derived from the train and test sets of medqa-5-opt-MedGENIE and medmcqa-MedGENIE.
| MedWiki chunks | Artificial chunks | Rerank | LLaMA-2-chat (7B) | mistral-instruct (7B) | Zephyr-β (7B) |
|---|---|---|---|---|---|
| 4.5M | - | No | 32.2 | 36.8 | 44.7 |
| 4.5M | 96K (test only) | No | 35.8 (+3.5) | 37.9 (+1.1) | 47.5 (+2.8) |
| 4.5M | 2M (train + test) | No | 36.3 (+4.1) | 37.9 (+1.1) | 47.8 (+3.1) |
| 4.5M | - | Yes | 32.8 | 35.1 | 44.0 |
| 4.5M | 96K (test only) | Yes | 36.5 (+3.7) | 37.6 (+2.5) | 47.8 (+2.8) |
| 4.5M | 2M (train + test) | Yes | 33.5 (+0.8) | 37.2 (+2.1) | 47.9 (+3.9) |
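The idea of mixing generated chunks into a retrieval corpus can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the helper names are hypothetical, and the word-overlap scorer is a deliberately simple stand-in, not the retriever or reranker used to produce the numbers above.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_knowledge_base(corpus_chunks, generated_records):
    """Merge corpus chunks with generated contexts, tagging each by source.

    Assumption: `generated_records` follow the dataset schema, i.e. each
    has a "ctxs" list of {"text": ...}. Exact duplicates are skipped.
    """
    kb = [{"text": text, "source": "corpus"} for text in corpus_chunks]
    seen = set(corpus_chunks)
    for record in generated_records:
        for ctx in record["ctxs"]:
            if ctx["text"] not in seen:
                seen.add(ctx["text"])
                kb.append({"text": ctx["text"], "source": "generated"})
    return kb


def retrieve(query, kb, k=3):
    """Return the k chunks sharing the most tokens with the query (toy scorer)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        kb,
        key=lambda chunk: len(query_tokens & tokenize(chunk["text"])),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    kb = build_knowledge_base(
        ["Aspirin inhibits platelet aggregation."],
        [{"ctxs": [{"text": "Nitrofurantoin treats cystitis in pregnancy."}]}],
    )
    for chunk in retrieve("best treatment for cystitis in pregnancy", kb, k=1):
        print(chunk["source"], "|", chunk["text"])
```

Tagging each chunk with its source makes it possible to measure how often generated contexts, rather than corpus chunks, end up among the retrieved results.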
Citation
If you find this dataset useful in your work, please cite:
@misc{frisoni2024generate,
  title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering},
  author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
  year={2024},
  eprint={2403.01924},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}



