
lavita/MedREQAL

Hugging Face | Updated 2024-08-15 | Indexed 2025-04-08
Download link:
https://hf-mirror.com/datasets/lavita/MedREQAL
Resource description:
---
dataset_info:
  features:
  - name: question
    dtype: string
  - name: background
    dtype: string
  - name: objective
    dtype: string
  - name: conclusion
    dtype: string
  - name: verdicts
    dtype: string
  - name: strength
    dtype: string
  - name: label
    dtype: int64
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 4679909
    num_examples: 2786
  download_size: 2365567
  dataset_size: 4679909
language:
- en
tags:
- medical
size_categories:
- 1K<n<10K
task_categories:
- question-answering
- text-classification
---

# Dataset Card for "MedREQAL"

This dataset is the converted version of [MedREQAL](https://github.com/jvladika/MedREQAL).

## Reference

If you use MedREQAL, please cite the original paper:

```
@inproceedings{vladika-etal-2024-medreqal,
    title = "{M}ed{REQAL}: Examining Medical Knowledge Recall of Large Language Models via Question Answering",
    author = "Vladika, Juraj and Schneider, Phillip and Matthes, Florian",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.860",
    pages = "14459--14469",
    abstract = "In recent years, Large Language Models (LLMs) have demonstrated an impressive ability to encode knowledge during pre-training on large text corpora. They can leverage this knowledge for downstream tasks like question answering (QA), even in complex areas involving health topics. Considering their high potential for facilitating clinical work in the future, understanding the quality of encoded medical knowledge and its recall in LLMs is an important step forward. In this study, we examine the capability of LLMs to exhibit medical knowledge recall by constructing a novel dataset derived from systematic reviews {--} studies synthesizing evidence-based answers for specific medical questions. Through experiments on the new MedREQAL dataset, comprising question-answer pairs extracted from rigorous systematic reviews, we assess six LLMs, such as GPT and Mixtral, analyzing their classification and generation performance. Our experimental insights into LLM performance on the novel biomedical QA dataset reveal the still challenging nature of this task.",
}
```
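The `dataset_info` block in the card lists eight features and a single `train` split. As a minimal sketch of how that schema might be used, the Python below checks a record against the listed feature names and types and tallies the `label` column. The helper names (`EXPECTED_FEATURES`, `schema_problems`, `label_counts`, `load_medreqal`) are hypothetical and not part of the dataset or any library. The sketch assumes every string field is non-null, which the card does not state, and `load_medreqal` assumes the `datasets` package is installed and the Hub is reachable.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Feature names and types copied from the card's `dataset_info` block
# (string -> str, int64 -> int). Treating fields as non-null is an assumption.
EXPECTED_FEATURES: Dict[str, type] = {
    "question": str,
    "background": str,
    "objective": str,
    "conclusion": str,
    "verdicts": str,
    "strength": str,
    "label": int,
    "category": str,
}


def schema_problems(record: Dict[str, Any]) -> List[str]:
    """Return a list of mismatches between a record and the card's schema."""
    problems: List[str] = []
    for name, expected in EXPECTED_FEATURES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"wrong type for {name}: {actual}")
    return problems


def label_counts(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how often each integer label occurs."""
    return Counter(record["label"] for record in records)


def load_medreqal():
    """Load the train split from the Hugging Face Hub (needs network access)."""
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset("lavita/MedREQAL", split="train")
```

A typical use would be to call `load_medreqal()` once, run `schema_problems` over a few rows to confirm the converted data matches the card, and then inspect `label_counts` before setting up a text-classification baseline. The card does not document what each integer label means, so any mapping from labels to verdicts should be checked against the original repository.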
Provider:
lavita