GAIA-Subset-Benchmark
ModelScope Community · Updated 2026-01-02 · Indexed 2025-07-05
Download link:
https://modelscope.cn/datasets/Intelligent-Internet/GAIA-Subset-Benchmark
Resource summary:
# GAIA Benchmark Subset Model Card
This dataset is a subset of the GAIA benchmark, containing 44 web-search-based questions drawn from the validation set. It is used to evaluate how well multiple AI models retrieve and process real-time information with web search and browser tools. Performance metrics include success indicators and a detailed report for each model. A comparative chart summarizing the results will be provided separately.
## Benchmark Results


Provided by:
maas
Created:
2025-03-31



