Fujitsu-FRE/MAPS_Verified
Source: Hugging Face. Updated 2025-06-03. Indexed 2025-10-18.
Download link:
https://hf-mirror.com/datasets/Fujitsu-FRE/MAPS_Verified
Description:
This is the first Multilingual Agentic AI Benchmark for evaluating the performance and safety of agentic AI systems across languages and task types. It contains 550 GAIA tasks, 660 ASB tasks, 737 MATH tasks, and 1100 SWE tasks. Each task is translated into 10 target languages, for a total of around 3K multilingual tasks (the four counts sum to 3,047). The dataset was created through a hybrid pipeline of machine generation and human verification to ensure consistency and intent fidelity across the ten languages.
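As a quick sanity check on the figures in the description, the per-benchmark counts can be tallied and compared with the stated total of around 3K. This is only a minimal sketch: the names `TASK_COUNTS` and `total_tasks` are illustrative and are not part of the dataset or any library.

```python
# Per-benchmark task counts, copied from the description above.
# TASK_COUNTS and total_tasks are hypothetical names used for illustration.
TASK_COUNTS = {"GAIA": 550, "ASB": 660, "MATH": 737, "SWE": 1100}


def total_tasks(counts):
    """Return the sum of all per-benchmark task counts."""
    return sum(counts.values())


total = total_tasks(TASK_COUNTS)
print(total)  # 3047, consistent with "around 3K"
```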
Provider:
Fujitsu-FRE



