Massive Multitask Agent Understanding (MMAU)

Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/apple/axlearn/tree/main/docs/research/mmau
Description:
MMAU is a comprehensive benchmark that evaluates language models across five domains: tool use, directed acyclic graph (DAG) question answering, data science and machine learning programming, competition-level programming, and mathematics. It covers five core capabilities: understanding, reasoning, planning, problem-solving, and self-correction. The benchmark consists of 20 carefully designed tasks comprising more than 3,000 unique prompts. Its goal is to evaluate language models across diverse tasks and the capabilities those tasks exercise.
Provider:
Apple