Alvorada-bench
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/HenriqueGodoy/Alvorada-bench
Description:
This dataset contains 4,515 multiple-choice questions from five major Brazilian university entrance exams (ENEM, FUVEST, UNICAMP, ITA, IME) spanning 32 years (1981-2025), along with model responses from 20 LLMs.
## Files
### 📄 `questions_data.csv` (4,515 rows)
Contains the exam questions with:
- `question_id`: Unique identifier
- `question_statement`: Question text in Portuguese
- `correct_answer`: Correct option (A-E)
- `alternative_a` to `alternative_e`: Answer choices
- `subject`: Academic subject
- `exam_name`, `exam_year`, `exam_type`: Exam metadata
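A minimal sketch of reading this file with Python's standard `csv` module and printing one question with its alternatives. The column names come from the list above; `alternative_b` through `alternative_d` are inferred from "`alternative_a` to `alternative_e`", and the helper names `load_questions` and `format_question`, the sample row, and its `exam_type` value are hypothetical, made up only for illustration.

```python
import csv
import io

# Column names as listed in this README for questions_data.csv.
# alternative_b..alternative_d are inferred from "alternative_a to alternative_e".
QUESTION_COLUMNS = [
    "question_id", "question_statement", "correct_answer",
    "alternative_a", "alternative_b", "alternative_c",
    "alternative_d", "alternative_e",
    "subject", "exam_name", "exam_year", "exam_type",
]


def load_questions(source):
    """Read question rows from an open text file (or any file-like object)."""
    reader = csv.DictReader(source)
    missing = [c for c in QUESTION_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def format_question(row):
    """Render one question with its lettered alternatives, skipping empty ones."""
    lines = [row["question_statement"]]
    for letter in "abcde":
        text = row[f"alternative_{letter}"]
        if text:
            lines.append(f"{letter.upper()}) {text}")
    return "\n".join(lines)


# Tiny made-up sample standing in for the real file (not actual dataset content).
sample = io.StringIO(
    "question_id,question_statement,correct_answer,alternative_a,alternative_b,"
    "alternative_c,alternative_d,alternative_e,subject,exam_name,exam_year,exam_type\n"
    "q1,Quanto é 2 + 2?,B,3,4,5,6,7,Matemática,ENEM,2020,regular\n"
)
questions = load_questions(sample)
print(format_question(questions[0]))
```

For the real file, something like `with open("questions_data.csv", newline="", encoding="utf-8") as f: questions = load_questions(f)` should work, assuming the file is UTF-8 encoded.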
### 📄 `responses_data.csv`
Contains model responses with:
- `model`: Model name (e.g., o3, deepseek-reasoner, claude-opus-4-20250514)
- `prompt_template`: Prompting strategy used (zero-shot, role-playing, chain-of-thought)
- `chosen_answer`: Model's selected answer
- `is_correct`: Whether the answer was correct
- `difficulty_level`, `uncertainty_level`: Model's self-reported metrics (1-10 scale)
- `bloom_taxonomy`: Cognitive complexity classification
- Additional metadata matching `questions_data.csv`
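One way these fields could be used is to compute accuracy per model and prompting strategy. The sketch below works on rows as dictionaries (for example from `csv.DictReader`). How `is_correct` and `prompt_template` are encoded on disk is not stated here, so `parse_flag`, the accepted truthy spellings, and the literal values in the sample rows are assumptions; the function names are hypothetical.

```python
from collections import defaultdict


def parse_flag(value):
    """Interpret is_correct; the on-disk encoding (True/False or 1/0) is an assumption."""
    return str(value).strip().lower() in {"true", "1", "1.0"}


def accuracy_by_model_and_prompt(rows):
    """Return {(model, prompt_template): fraction of rows marked correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        key = (row["model"], row["prompt_template"])
        totals[key] += 1
        hits[key] += parse_flag(row["is_correct"])  # bool counts as 0 or 1
    return {key: hits[key] / totals[key] for key in totals}


# Made-up rows for illustration only; these are not results from the dataset.
rows = [
    {"model": "o3", "prompt_template": "zero-shot", "is_correct": "True"},
    {"model": "o3", "prompt_template": "zero-shot", "is_correct": "False"},
    {"model": "o3", "prompt_template": "chain-of-thought", "is_correct": "True"},
    {"model": "deepseek-reasoner", "prompt_template": "zero-shot", "is_correct": "True"},
]
accuracy = accuracy_by_model_and_prompt(rows)
for (model, prompt), value in sorted(accuracy.items()):
    print(f"{model:20s} {prompt:18s} {value:.2f}")
```

The same grouping could be extended with `subject` or `exam_name` in the key, provided those columns are present among the additional metadata.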
## Cite
```
@misc{godoy2025alvoradabenchlanguagemodelssolve,
title={Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams?},
author={Henrique Godoy},
year={2025},
eprint={2508.15835},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.15835},
}
```
Provider: maas
Created: 2025-10-09



