NLP-FBK/multilingual-medical-reasoning-traces
Hugging Face, updated 2025-11-27, indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/NLP-FBK/multilingual-medical-reasoning-traces
Resource summary:
---
dataset_info:
- config_name: en
  features:
  - name: id
    dtype: string
  - name: options
    struct:
    - name: '1'
      dtype: string
    - name: '2'
      dtype: string
    - name: '3'
      dtype: string
    - name: '4'
      dtype: string
  - name: correct_option
    dtype: int64
  - name: full_question
    dtype: string
  - name: similar_chunks_dense
    list:
    - name: chunk_id
      dtype: int64
    - name: similarity_score
      dtype: float64
    - name: text
      dtype: string
  - name: formatted_similar_chunks_dense
    dtype: string
  - name: reasoning
    dtype: string
  - name: list_of_options
    sequence: string
  - name: reasoning_parsed_answer
    dtype: int64
  splits:
  - name: medmcqa
    num_bytes: 6010411438
    num_examples: 169098
  - name: medqa
    num_bytes: 341749007
    num_examples: 9520
  download_size: 2961550979
  dataset_size: 6352160445
- config_name: es
  features:
  - name: id
    dtype: string
  - name: options
    struct:
    - name: '1'
      dtype: string
    - name: '2'
      dtype: string
    - name: '3'
      dtype: string
    - name: '4'
      dtype: string
  - name: correct_option
    dtype: int64
  - name: full_question
    dtype: string
  - name: similar_chunks_dense
    list:
    - name: chunk_id
      dtype: int64
    - name: similarity_score
      dtype: float64
    - name: text
      dtype: string
  - name: formatted_similar_chunks_dense
    dtype: string
  - name: reasoning
    dtype: string
  - name: list_of_options
    sequence: string
  - name: reasoning_parsed_answer
    dtype: int64
  splits:
  - name: medmcqa
    num_bytes: 4841465675
    num_examples: 168771
  - name: medqa
    num_bytes: 298177516
    num_examples: 9584
  download_size: 2671932858
  dataset_size: 5139643191
- config_name: it
  features:
  - name: id
    dtype: string
  - name: options
    struct:
    - name: '1'
      dtype: string
    - name: '2'
      dtype: string
    - name: '3'
      dtype: string
    - name: '4'
      dtype: string
  - name: correct_option
    dtype: int64
  - name: full_question
    dtype: string
  - name: similar_chunks_dense
    list:
    - name: chunk_id
      dtype: int64
    - name: similarity_score
      dtype: float64
    - name: text
      dtype: string
  - name: formatted_similar_chunks_dense
    dtype: string
  - name: reasoning
    dtype: string
  - name: list_of_options
    sequence: string
  - name: reasoning_parsed_answer
    dtype: int64
  splits:
  - name: medmcqa
    num_bytes: 4736599523
    num_examples: 166257
  - name: medqa
    num_bytes: 289611823
    num_examples: 9468
  download_size: 2687166328
  dataset_size: 5026211346
configs:
- config_name: en
  data_files:
  - split: medmcqa
    path: en/medmcqa-*
  - split: medqa
    path: en/medqa-*
- config_name: es
  data_files:
  - split: medmcqa
    path: es/medmcqa-*
  - split: medqa
    path: es/medqa-*
- config_name: it
  data_files:
  - split: medmcqa
    path: it/medmcqa-*
  - split: medqa
    path: it/medqa-*
---
This dataset contains the reasoning traces generated to answer multiple-choice medical questions in Italian, English, and Spanish.
The dataset is divided into three configurations, one per language (`it`, `en`, `es`). Each configuration has two splits: `medqa`, containing the examples generated from MedQA, and `medmcqa`, containing those generated from MedMCQA.
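The per-split example counts in the `dataset_info` metadata can be added up to get the size of each language configuration. The three configurations do not contain exactly the same number of examples. The sketch below hard-codes the counts copied from the metadata so that it runs offline. The names `SPLIT_COUNTS` and `total_examples` are illustrative and not part of the dataset.

```python
# Example counts copied from the `dataset_info` metadata of this card.
SPLIT_COUNTS: dict[str, dict[str, int]] = {
    "en": {"medmcqa": 169098, "medqa": 9520},
    "es": {"medmcqa": 168771, "medqa": 9584},
    "it": {"medmcqa": 166257, "medqa": 9468},
}


def total_examples(config: str) -> int:
    """Return the number of examples across both splits of one language config."""
    return sum(SPLIT_COUNTS[config].values())


for config in SPLIT_COUNTS:
    print(config, total_examples(config))
# en 178618
# es 178355
# it 175725
```

To work with the actual rows, the Hugging Face `datasets` library can load one configuration and split at a time. For example: `load_dataset("NLP-FBK/multilingual-medical-reasoning-traces", "it", split="medqa")`. Passing `streaming=True` lets you iterate without first downloading the multi-gigabyte `medmcqa` splits in full.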
The columns are:
- `id`, a unique identifier for the example
- `full_question`, the medical question
- `options`, a dictionary mapping each option identifier (`'1'` to `'4'`) to the text of that option
- `list_of_options`, the list of option texts from `options`, without their identifiers
- `correct_option`, the identifier of the correct answer to `full_question`
- `similar_chunks_dense`, a list of chunks retrieved from the Wikipedia knowledge base as relevant to answering `full_question`; each chunk has a `chunk_id`, a `similarity_score`, and its `text`
- `formatted_similar_chunks_dense`, a refined version of `similar_chunks_dense` that is actually used to help models answer the question
- `reasoning`, the answer to `full_question` generated by prompting Qwen3-32B with `formatted_similar_chunks_dense` as context
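The sketch below shows how the columns of a single record relate to each other. It does three things:

- resolves `correct_option` to its text in `options`
- ranks `similar_chunks_dense` by `similarity_score`
- computes accuracy by comparing `reasoning_parsed_answer` with `correct_option`

It relies on two assumptions that this card does not state:

- `correct_option` matches the string keys of `options` once converted with `str()`.
- `reasoning_parsed_answer` is the option identifier extracted from `reasoning`. This column appears in the schema but is not described above.

Rows are assumed to be plain Python dicts. The example record and the helper names are hypothetical and do not come from the dataset.

```python
from typing import Any, Iterable


def correct_answer_text(record: dict[str, Any]) -> str:
    """Look up the text of the correct option.

    Assumption: the integer `correct_option` matches the string keys '1'..'4' of `options`.
    """
    return record["options"][str(record["correct_option"])]


def top_chunks(record: dict[str, Any], k: int = 3) -> list[str]:
    """Return the texts of the k retrieved chunks with the highest similarity_score."""
    ranked = sorted(
        record["similar_chunks_dense"],
        key=lambda chunk: chunk["similarity_score"],
        reverse=True,
    )
    return [chunk["text"] for chunk in ranked[:k]]


def accuracy(records: Iterable[dict[str, Any]]) -> float:
    """Fraction of records whose parsed answer equals the gold option.

    Assumption: `reasoning_parsed_answer` holds the option identifier parsed from
    `reasoning`; a missing or None value is counted as wrong.
    """
    total = 0
    hits = 0
    for record in records:
        total += 1
        if record.get("reasoning_parsed_answer") == record["correct_option"]:
            hits += 1
    return hits / total if total else 0.0


# Hypothetical record with the same fields as the dataset (values invented).
example = {
    "id": "example-0",
    "full_question": "Which vitamin deficiency causes scurvy?",
    "options": {"1": "Vitamin A", "2": "Vitamin C", "3": "Vitamin D", "4": "Vitamin K"},
    "list_of_options": ["Vitamin A", "Vitamin C", "Vitamin D", "Vitamin K"],
    "correct_option": 2,
    "similar_chunks_dense": [
        {"chunk_id": 11, "similarity_score": 0.41, "text": "Vitamin D is a fat-soluble vitamin."},
        {"chunk_id": 7, "similarity_score": 0.83, "text": "Scurvy results from vitamin C deficiency."},
    ],
    "reasoning_parsed_answer": 2,
}

print(correct_answer_text(example))  # Vitamin C
print(top_chunks(example, k=1))      # ['Scurvy results from vitamin C deficiency.']
print(accuracy([example]))           # 1.0
```

Iterating over a split loaded with the `datasets` library yields one dict per row. The same helpers could therefore be applied to real data, provided the two assumptions above hold.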
Provided by:
NLP-FBK



