disi-unibo-nlp/mmlu-medical-MedGENIE
收藏数据集描述
该数据集是MedGENIE医学数据集集合的一部分,通过PMC-LLaMA-13B生成的合成上下文进行了增强。具体来说,针对MMLU中的9个医学主题的每个问题,最多生成了5个合成上下文,采用多视角方法涵盖与给定问题相关的各种视角。
更多信息请参考我们的论文"To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering"。
数据集结构
该数据集适用于:
- 在推理过程中使用生成的上下文增强LLMs,而不是检索的片段。
- 使用生成的上下文增强事实文档的知识库,用于标准RAG管道。
数据集中的样本数量为:
- test: 1862个样本
数据集以parquet格式存储,每个条目使用以下模式: json { "id": 0, "question": "Which of the changes below following the start codon in an mRNA would most likely have the greatest deleterious effect? A. a deletion of a single nucleotide B. a deletion of a nucleotide triplet C. a single nucleotide substitution of the nucleotide occupying the first codon position D. a single nucleotide substitution of the nucleotide occupying the third codon position", "target": "A", "answers": [ "A" ], "ctxs": [ { "text": "A single nucleotide change in the genetic code could result in a completely different amino acid being inserted into a protein, which could greatly affect its function. A deletion of one or more nucleotides would most likely lead to the premature termination of translation and production of an incomplete nonfunctional peptide chain. Substitutions at the third base position..." }, { "text": "The question is asking about the effect of mutations in the genetic code (nucleic acid sequence). The first two options are synonymous mutations since they involve a codon change with no amino acid substitution. A missense mutation, as in option 3, would result in an amino acid substitution. As a rule, these types of point substitutions generally have less severe effects..." }, { "text": "During translation, the codon is read in a "three-base" code. That means that three bases (nucleotides) of the mRNA sequence are read at one time by an amino acid. A change in any of these three positions can potentially lead to a different amino acid being added to the growing protein chain. A change at position one (first base or nucleotide triplet) may not necessarily..." }, { "text": "Naturally occurring mutations in the start codon are rare, and most of these altered proteins are defective. The overall effect on protein function when a single amino acid is substituted for another is highly variable, depending upon which specific amino acid replaces which other amino acid in a protein.u00a0Some changes have no significant effect; others substantially..." }, { "text": "Nonsense mutations cause premature termination of translation. A nonsense codon in mRNA causes the formation of a stop signal in the protein, resulting in its incomplete synthesis. Most often, such a defective protein is rapidly degraded by proteolytic enzymes and has no functional role within cells. Obviously a truncated version of any protein would be useless for its..." } ], "subject": "high_school_biology" }
增强LLMs在推理过程中的表现
通过medqa-MedGENIE生成的上下文增强state-of-the-art LLMs,展示了显著的性能提升。对于给定的问题,所有相关的上下文被连接并传递到LLM的上下文窗口中。
| 模型 | 学习方式 | medqa-5-opt-MedGENIE | 准确率 |
|---|---|---|---|
| LLaMA-2-chat (7B) | 2-shot | NO | 49.3 |
| LLaMA-2-chat (7B) | 2-shot | YES | 56.5 (+ 7.2) |
| Zephyr-β (7B) | 2-shot | NO | 60.7 |
| Zephyr-β (7B) | 2-shot | YES | 65.1 (+ 4.4) |
引用
如果您在工作中发现此数据集有用,请引用:
@misc{frisoni2024generate, title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering}, author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng}, year={2024}, eprint={2403.01924}, archivePrefix={arXiv}, primaryClass={cs.CL} }




