GAIA
ModelScope community. Updated 2026-05-23. Indexed 2025-03-15.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/GAIA
Description:
# GAIA dataset
GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.).
We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.
## Data and leaderboard
GAIA is made up of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It is therefore divided into 3 levels, where level 1 should be breakable by very good LLMs, and level 3 indicates a strong jump in model capabilities. Each level is divided into a fully public dev set for validation, and a test set with private answers and metadata.
The GAIA leaderboard can be found in this space (https://huggingface.co/spaces/gaia-benchmark/leaderboard).
Questions are contained in metadata.jsonl. Some questions come with an additional file, that can be found in the same folder and whose id is given in the field file_name.
More details are available in [the paper](https://arxiv.org/abs/2311.12983) for now, and will soon be added here as well.
## Dataset Format update (October 2025)
To keep GAIA compatible with HF datasets 4.x, where code-based dataset loaders are deprecated, we now ship Parquet-backed splits that mirror the former JSONL structure:
- `metadata.parquet` carries the full split, and companion files like `metadata.level1.parquet` retain the per-level views exposed in the configs.
- Columns remain `task_id`, `Question`, `Level`, `Final answer`, `file_name`, `file_path`, and the struct-valued `Annotator Metadata`, so existing processing pipelines can continue unchanged.
- `file_path` keeps pointing to attachments relative to the repository root (for example, `2023/test/<attachment-id>.pdf`), ensuring offline access to PDFs, media, and other auxiliary files.
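Since the column set is said to be unchanged, a pipeline can verify that assumption before it runs. The following is a minimal sketch rather than part of GAIA itself: the helper `missing_columns` and the constant `EXPECTED_COLUMNS` are hypothetical names introduced here, and the column list is copied from the bullet above.

```python
# Column names as listed in the format update note above.
EXPECTED_COLUMNS = [
    "task_id",
    "Question",
    "Level",
    "Final answer",
    "file_name",
    "file_path",
    "Annotator Metadata",
]


def missing_columns(column_names):
    """Return the expected GAIA columns that are absent from column_names.

    Hypothetical helper: it only compares names and does not inspect types.
    """
    present = set(column_names)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

With a split loaded through `datasets`, this could be called as `missing_columns(dataset.column_names)`; an empty list would suggest the expected schema is in place.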
### Load datasets
```python
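import os

from datasets import load_dataset
from huggingface_hub import snapshot_download

# Download the dataset repository (metadata and attachments) to a local folder.
data_dir = snapshot_download(repo_id="gaia-benchmark/GAIA", repo_type="dataset")
# Load the level 1 test split from the local snapshot.
dataset = load_dataset(data_dir, "2023_level1", split="test")

for example in dataset:
    question = example["Question"]
    # file_path is relative to the repository root, so join it with data_dir.
    file_path = os.path.join(data_dir, example["file_path"])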
```
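Only some questions come with an attachment, so joining `data_dir` with `file_path` for every row may produce a path that points at the snapshot folder rather than a file. The sketch below guards against that. It assumes that rows without an attachment carry an empty or missing `file_path`, which the text above does not confirm, and `resolve_attachment` is a hypothetical helper name.

```python
import os


def resolve_attachment(data_dir, example):
    """Return the local path of an example's attachment, or None.

    Assumption: examples without an attachment have an empty or missing
    "file_path" value. Adjust the check if the data uses another marker.
    """
    relative_path = example.get("file_path") or ""
    if not relative_path:
        return None
    # file_path is relative to the repository root (the snapshot folder).
    return os.path.join(data_dir, relative_path)
```

Inside the loop above, `resolve_attachment(data_dir, example)` could replace the direct `os.path.join` call, with the caller skipping file handling when the result is `None`.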
Provider: maas
Created: 2025-03-11



