GAIA-Subset-Benchmark

Name: GAIA-Subset-Benchmark
Creator: maas
Published: 2026-01-02 16:28:11
License: No description available

ModelScope community: updated 2026-01-02, indexed 2025-07-05

Download link:

https://modelscope.cn/datasets/Intelligent-Internet/GAIA-Subset-Benchmark

Resource description:

# GAIA Benchmark Subset Model Card

This dataset is a subset of the GAIA benchmark, containing 44 web-search-based questions from the validation set. It evaluates multiple AI models on their ability to retrieve and process real-time information using web search and browser tools. Performance metrics include success indicators and detailed reports for each model. A comparative chart summarizing the results will be provided separately.

## Benchmark Results

![image/png](https://cdn-uploads.huggingface.co/production/uploads/633fe629f81b9d10135fefda/Icwl4FIZX-wIMux6md0Pc.png)
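The card mentions per-model success indicators but does not specify how they are stored. As a minimal sketch, assuming each model's results are available as a list of booleans (one per question), a success rate per model could be computed and ranked as follows. The function names, the model names, and the example data are all hypothetical placeholders, not actual benchmark results.

```python
from typing import Dict, List, Tuple


def success_rate(outcomes: List[bool]) -> float:
    """Return the fraction of questions marked successful (0.0 if empty)."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def rank_models(results: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Return (model, success rate) pairs sorted from best to worst."""
    scored = [(name, success_rate(outcomes)) for name, outcomes in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholder data: NOT real results from this benchmark.
example_results = {
    "model_a": [True, False, True, True],
    "model_b": [True, False, False, False],
}

ranking = rank_models(example_results)
for name, rate in ranking:
    print(f"{name}: {rate:.1%}")
```

For the actual subset, each list would hold 44 entries, one per question.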

Provider:

maas

Created:

2025-03-31
