
Alvorada-bench

ModelScope Community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/HenriqueGodoy/Alvorada-bench
Description:
This dataset contains 4,515 multiple-choice questions from five major Brazilian university entrance exams (ENEM, FUVEST, UNICAMP, ITA, IME) spanning 32 years (1981-2025), along with model responses from 20 LLMs.

## Files

### 📄 `questions_data.csv` (4,515 rows)

Contains the exam questions with:

- `question_id`: Unique identifier
- `question_statement`: Question text in Portuguese
- `correct_answer`: Correct option (A-E)
- `alternative_a` to `alternative_e`: Answer choices
- `subject`: Academic subject
- `exam_name`, `exam_year`, `exam_type`: Exam metadata

### 📄 `responses_data.csv`

Contains model responses with:

- `model`: Model name (e.g. o3, deepseek-reasoner, claude-opus-4-20250514)
- `prompt_template`: Prompting strategy used (zero-shot, role-playing, chain-of-thought)
- `chosen_answer`: Model's selected answer
- `is_correct`: Whether the answer was correct
- `difficulty_level`, `uncertainty_level`: Model's self-reported metrics (1-10 scale)
- `bloom_taxonomy`: Cognitive complexity classification
- Additional metadata matching `questions_data.csv`

## Cite

```
@misc{godoy2025alvoradabenchlanguagemodelssolve,
  title={Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams?},
  author={Henrique Godoy},
  year={2025},
  eprint={2508.15835},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.15835},
}
```

Provider:
maas
Created:
2025-10-09