recogna-nlp/drbodebench_medicamentos
Source: Hugging Face (updated 2026-03-16; indexed 2026-04-05)
Download link: https://hf-mirror.com/datasets/recogna-nlp/drbodebench_medicamentos
---
license: apache-2.0
task_categories:
- question-answering
- zero-shot-classification
- text-generation
language:
- pt
tags:
- llm
- benchmark
- portuguese
- bode
pretty_name: DrBodeBench
size_categories:
- n<1K
---
# Medication-Focused Clinical Benchmark from DrBodeBench
## Dataset Details
To evaluate retrieval capabilities in higher-level reasoning scenarios, we created a second benchmark derived from the Portuguese medical benchmark [DrBodeBench](https://huggingface.co/datasets/recogna-nlp/drbodebench). DrBodeBench aggregates questions from Brazilian medical examinations, including the Revalida and the FUVEST direct-access residency exam. From it, we curated a subset of questions that deal exclusively with medication-related topics.
A large language model was employed to identify examination items in which medication knowledge plays a central role in diagnostic or therapeutic decision-making. Only questions requiring explicit pharmacological integration were retained. The resulting dataset consists of clinically contextualized multiple-choice scenarios that require integration of medication knowledge with patient history, laboratory findings, and clinical reasoning.
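The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`id`, `question`), the `keyword_judge` function, and its keyword list are all hypothetical, and the keyword check only stands in for the LLM call so that the example runs offline.

```python
from typing import Callable, Iterable


def keyword_judge(record: dict) -> bool:
    """Offline stand-in for the LLM judge (an assumption, not the real method)."""
    # Hypothetical keyword list; a real judge would be a prompted LLM.
    keywords = ("medicamento", "fármaco", "antibiótico", "dose")
    text = record["question"].lower()
    return any(keyword in text for keyword in keywords)


def filter_medication_items(
    records: Iterable[dict],
    judge: Callable[[dict], bool],
) -> list[dict]:
    """Keep only the items that the judge flags as medication-centered."""
    return [record for record in records if judge(record)]


# Hypothetical records; the real dataset's column names may differ.
records = [
    {"id": 1, "question": "Paciente em uso de varfarina inicia antibiótico. Qual a conduta?"},
    {"id": 2, "question": "Qual achado radiológico sugere pneumotórax?"},
]

kept = filter_medication_items(records, keyword_judge)
kept_ids = [record["id"] for record in kept]
print(kept_ids)
```

Passing the judge as a callable keeps the filtering logic independent of whichever model or prompt is used to make the decision.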
In contrast to the controlled leaflet-based benchmark, answers in this dataset are not necessarily localized within a single document section. Instead, they frequently require synthesis across distributed knowledge and contextual interpretation. This property makes the benchmark suitable for analyzing potential retrieval-induced bias in complex reasoning tasks.
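One way to look for retrieval-induced bias is to compare a model's answers to the same questions with and without retrieved context. The sketch below assumes hypothetical answer lists and is not the authors' evaluation protocol; `harmed` counts items answered correctly without retrieval but incorrectly with it, and `helped` counts the reverse.

```python
def retrieval_effect(
    gold: list[str],
    closed_book: list[str],
    with_retrieval: list[str],
) -> dict:
    """Compare per-item correctness with and without retrieved context."""
    if not (len(gold) == len(closed_book) == len(with_retrieval)):
        raise ValueError("all answer lists must have the same length")
    if not gold:
        raise ValueError("at least one question is required")

    n = len(gold)
    triples = list(zip(gold, closed_book, with_retrieval))
    return {
        "acc_closed_book": sum(c == g for g, c, _ in triples) / n,
        "acc_with_retrieval": sum(r == g for g, _, r in triples) / n,
        # Correct without retrieval, wrong once context was added.
        "harmed": sum(c == g and r != g for g, c, r in triples),
        # Wrong without retrieval, correct once context was added.
        "helped": sum(c != g and r == g for g, c, r in triples),
    }


# Hypothetical answers for four multiple-choice questions.
gold = ["A", "C", "B", "D"]
closed_book = ["A", "C", "D", "D"]
with_retrieval = ["A", "B", "B", "D"]

result = retrieval_effect(gold, closed_book, with_retrieval)
print(result)
```

In this toy example both settings reach the same accuracy, yet one item is harmed and one is helped by retrieval, which is why per-item flips can be more informative than the aggregate score alone.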
## Citation
This work was accepted at **The First Workshop on Language Technologies for Health (Lang4Health)**, a workshop dedicated to the development and application of Natural Language Processing (NLP) technologies in healthcare.
Provider: recogna-nlp



