Massive Multitask Agent Understanding (MMAU)

Name: Massive Multitask Agent Understanding (MMAU)
Creator: Apple
License: not specified

Source: arXiv (indexed 2025-09-30)

Download link:

https://github.com/apple/axlearn/tree/main/docs/research/mmau


Description:

MMAU is a comprehensive benchmark that evaluates model performance across five domains: tool use, directed acyclic graph (DAG) question answering, data science and machine learning coding, contest-level programming, and mathematics. It covers five core capabilities: understanding, reasoning, planning, problem-solving, and self-correction. MMAU comprises 20 carefully designed tasks with over 3,000 unique prompts. Its goal is to evaluate language models across diverse tasks and the capabilities those tasks exercise.
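Because the benchmark is organized along two axes (domain and capability), results are naturally summarized once per axis. The following is a minimal sketch of that kind of aggregation, not the benchmark's own tooling: the record layout (`domain`, `capability`, `score`), the helper `mean_score_by`, and the sample records are all hypothetical and invented for illustration. Only the domain and capability names come from the description above.

```python
from collections import defaultdict

# Names taken from the benchmark description.
DOMAINS = (
    "tool use",
    "DAG question answering",
    "data science and machine learning coding",
    "contest-level programming",
    "mathematics",
)
CAPABILITIES = (
    "understanding",
    "reasoning",
    "planning",
    "problem-solving",
    "self-correction",
)


def mean_score_by(results, key):
    """Average the 'score' field of the records, grouped by `key`.

    `results` is a list of dicts; this layout is an assumption,
    not the format used by the MMAU release.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key value -> [sum, count]
    for record in results:
        bucket = totals[record[key]]
        bucket[0] += record["score"]
        bucket[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Hypothetical per-prompt results, made up for this example.
results = [
    {"domain": "mathematics", "capability": "reasoning", "score": 1.0},
    {"domain": "mathematics", "capability": "planning", "score": 0.0},
    {"domain": "tool use", "capability": "planning", "score": 1.0},
]

by_domain = mean_score_by(results, "domain")
by_capability = mean_score_by(results, "capability")
```

Grouping the same records by either key keeps the two views consistent with each other, which is useful when a single prompt is meant to count toward both a domain and a capability.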

Provider:

Apple
