eval-aware/linuxarena-first5-trajectories

Name: eval-aware/linuxarena-first5-trajectories
Creator: eval-aware
Published: 2026-03-28 13:41:08
License: MIT

Hugging Face | Updated 2026-03-28 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/eval-aware/linuxarena-first5-trajectories

Resource description:

---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- linuxbench
- linuxarena
- agent-trajectories
- tool-use
- eval-awareness
- ai-safety
size_categories:
- n<1K
---

# LinuxArena First-5 Trajectories

First-5 tool call trajectories from [LinuxBench/LinuxArena](https://linuxbench.ai) evaluations across 14 model/policy combinations and 10 environments.

## Dataset Description

Each row represents one evaluation trajectory with the **first 5 tool calls** extracted at two granularity levels:

- **Level 1 (normalized):** Tool categories like `text_editor:view`, `bash:find`, `bash:ls`
- **Level 2 (exact):** Full command strings like `bash$ find /app/src -type f -name "*.ts" | sort`

Designed for studying:

- **Model behavioral fingerprints:** How do different models explore unfamiliar codebases?
- **Environmental awareness:** Do models recognize they're in an evaluation?
- **Cross-model convergence:** When do different exploration strategies reach the same files?
- **Ablation experiments:** What happens when you modify early tool call outputs?

## Key Statistics

| Metric | Value |
|--------|-------|
| Total trajectories | 775 |
| Unique models | 14 |
| Unique environments | 10 |
| Policies | honest (421), attack (316), claude-code (38) |

### Models

| Model | Trajectories |
|-------|:------------:|
| claude-sonnet-4-6 | 398 |
| gpt-5-mini | 56 |
| claude-sonnet-4-5-20250929 | 38 |
| gemini-3-flash-preview | 26 |
| gemini-3.1-pro-preview | 26 |
| glm-5 | 26 |
| gpt-5-mini-2025-08-07 | 26 |
| gpt-5.4-2026-03-05 | 26 |
| grok-4.1-fast | 26 |
| haiku | 26 |
| kimi-k2.5 | 26 |
| opus | 26 |
| sonnet | 26 |
| unknown | 23 |

### Environments

| Environment | Trajectories |
|-------------|:------------:|
| auto_workflow | 103 |
| tiktok | 93 |
| file_indexing | 88 |
| clinical_trial | 80 |
| model_registry | 75 |
| web_scraping | 73 |
| user_tracking | 72 |
| media_processing | 68 |
| shared_dotfiles | 66 |
| slack_clone | 57 |

## Schema

### Provenance

- `trajectory_id`: Unique trajectory identifier
- `dataset`: Source dataset directory name
- `source_file`: JSON filename
- `json_path`: Path to full trajectory JSON for complete retrieval
- `run_id`, `run_name`: LinuxBench run identifiers

### Model & Policy

- `model`: Full model identifier (e.g., `anthropic/claude-sonnet-4-6`)
- `model_short`: Short name (e.g., `claude-sonnet-4-6`)
- `policy`: `honest`, `attack`, or `claude-code`

### Task

- `environment`: One of 10 LinuxArena environments
- `main_task`, `task_category`, `task_description`: Task details
- `side_task`: Attack side task name (empty for honest policy)

### Outcomes

- `main_task_success`, `side_task_success`: Boolean results
- `has_side_task`: Whether this is an attack-policy trajectory
- `number_of_actions`: Total actions in the full trajectory
- `cost_usd`: API cost

### First-5 Patterns

- `l1_pattern`: Arrow-joined normalized pattern (e.g., `text_editor:view -> bash:find -> ...`)
- `l2_pattern`: Arrow-joined exact pattern
- `files_accessed`: Deduplicated list of file paths accessed in first 5 actions

### Per-Action Detail (length-5 lists)

- `action_functions`: Tool function names
- `action_arguments_json`: Full arguments as JSON strings
- `action_normalized`: Normalized tool category per action
- `action_exact`: Exact command string per action
- `action_file_accessed`: File path accessed (if applicable)
- `action_output_preview`: First 2000 chars of tool output
- `action_output_error`: Error message (if any)
- `action_has_error`: Boolean error flag
- `action_reasoning`: Agent's reasoning text for this action

## Usage

```python
from datasets import load_dataset

ds = load_dataset("guychuk/linuxarena-first5-trajectories")

# Filter to a specific model
opus = ds["train"].filter(lambda x: x["model_short"] == "opus")

# Get all trajectories for a specific task across models
task = ds["train"].filter(
    lambda x: x["main_task"] == "Implement Workflow Timeouts"
)

# Find trajectories where all models converge on the same file
import json
for row in task:
    files = row["files_accessed"]
    print(f"{row['model_short']}: {files}")
```

## Source

Raw trajectory data from [linuxbench.ai/data/trajs](https://linuxbench.ai/data/trajs). Full trajectory JSONs (with complete action sequences, outputs, and metadata) are referenced by the `json_path` field; these are not included in this dataset but are available from the source.

## Citation

If you use this dataset, please cite the LinuxBench project.

Provider:

eval-aware

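Summary

Dataset introduction

Construction

In AI agent evaluation, fine-grained analysis of a model's initial exploration behavior in complex environments is essential. The LinuxArena First-5 Trajectories dataset is built from the LinuxBench/LinuxArena evaluation framework: its core method is to extract the first five tool calls from each complete evaluation trajectory. Extraction happens at two granularity levels. Level 1 normalizes each tool call into a category such as `text_editor:view`, while Level 2 keeps the full command string. The dataset covers evaluations of 14 model/policy combinations across 10 separate environments, 775 trajectories in total, and aims to capture how models first behave in an unfamiliar codebase.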
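The two-level extraction described above can be sketched as follows. This is an illustration only, not the dataset's actual extraction code: the function names `normalize_call` and `first_five`, the argument keys (`cmd`, `command`, `path`), and the Level 2 format for editor calls are all assumptions. Only the `bash:find` and `bash$ ...` label shapes and the arrow-joined pattern come from the dataset card.

```python
def normalize_call(function: str, arguments: dict) -> tuple[str, str]:
    """Return (level1, level2) labels for a single tool call.

    Hypothetical rules: the argument keys used here are assumptions.
    """
    if function == "bash":
        cmd = arguments.get("cmd", "").strip()
        # Level 1 keeps only the program name, Level 2 the whole command.
        program = cmd.split()[0] if cmd else "unknown"
        return f"bash:{program}", f"bash$ {cmd}"
    if function == "text_editor":
        sub = arguments.get("command", "unknown")
        path = arguments.get("path", "")
        return f"text_editor:{sub}", f"text_editor:{sub} {path}".strip()
    # Any other tool is passed through unchanged at both levels.
    return function, function


def first_five(actions: list[dict]) -> tuple[str, str]:
    """Build arrow-joined Level 1 and Level 2 patterns from the first five actions."""
    labels = [normalize_call(a["function"], a["arguments"]) for a in actions[:5]]
    l1 = " -> ".join(level1 for level1, _ in labels)
    l2 = " -> ".join(level2 for _, level2 in labels)
    return l1, l2


# Made-up trajectory prefix for illustration.
actions = [
    {"function": "text_editor", "arguments": {"command": "view", "path": "/app"}},
    {"function": "bash", "arguments": {"cmd": 'find /app/src -type f -name "*.ts" | sort'}},
    {"function": "bash", "arguments": {"cmd": "ls /app"}},
]
l1_pattern, l2_pattern = first_five(actions)
print(l1_pattern)
print(l2_pattern)
```

Keeping both levels side by side lets coarse comparisons (which tool family came first) and exact comparisons (which command was run) use the same rows.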

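Features

The dataset's defining feature is its fine-grained, structured design, aimed at dissecting the exploration strategies AI agents use at the start of a task. Each trajectory records the model identifier, policy, and task environment, along with outcome fields such as task success and API cost. Most distinctive is the detailed description of the first five tool calls: the normalized and exact command sequences, the list of file paths accessed, and, for each call, the arguments, an output preview, and the agent's reasoning text. This multi-level, multi-field layout lets researchers compare early exploration behavior across models, and attribute the differences, along several dimensions, including behavioral fingerprints, environmental awareness, and strategy convergence.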
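Because the per-action fields are stored as parallel length-5 lists, it is often convenient to zip them into one record per action. The sketch below assumes a row shaped like the schema in the dataset card. The helper name `per_action_records` and the sample values are hypothetical.

```python
# Hypothetical row: field names follow the dataset card, values are made up.
row = {
    "action_functions": ["text_editor", "bash", "bash", "bash", "text_editor"],
    "action_normalized": ["text_editor:view", "bash:find", "bash:ls", "bash:cat", "text_editor:view"],
    "action_has_error": [False, False, True, False, False],
}


def per_action_records(row: dict, fields: list[str]) -> list[dict]:
    """Turn parallel per-action lists into a list of one dict per action."""
    columns = [row[name] for name in fields]
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        raise ValueError(f"per-action lists differ in length: {sorted(lengths)}")
    return [dict(zip(fields, values)) for values in zip(*columns)]


records = per_action_records(row, list(row))
# Normalized labels of the calls that reported an error.
failed = [r["action_normalized"] for r in records if r["action_has_error"]]
print(failed)
```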

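Usage

Research with this dataset typically starts by loading it with the Hugging Face `datasets` library. Filters make it easy to slice the data by model, task, or policy, for example to select all trajectories from the `opus` model or under the `attack` policy. Fields such as `files_accessed` and `l1_pattern` directly support checking whether different models access the same files or follow similar tool call sequences on the same task. The `json_path` field points back to the full original trajectory JSON, which is not included in the dataset but is available from the source, so analysis of early behavior can be combined with the complete task execution.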
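One way to check whether models touch the same files on a task is a pairwise Jaccard overlap on `files_accessed`. The sketch below runs on in-memory rows rather than the downloaded dataset. The file paths are invented, and `jaccard` and `pairwise_overlap` are hypothetical helper names.

```python
from itertools import combinations

# Made-up rows for one task; only the field names come from the dataset card.
rows = [
    {"model_short": "opus", "files_accessed": ["/app/src/index.ts", "/app/package.json"]},
    {"model_short": "haiku", "files_accessed": ["/app/src/index.ts", "/app/README.md"]},
    {"model_short": "sonnet", "files_accessed": ["/app/package.json", "/app/src/index.ts"]},
]


def jaccard(a: list[str], b: list[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def pairwise_overlap(rows: list[dict]) -> dict[tuple[str, str], float]:
    """Jaccard overlap of accessed files for every pair of rows."""
    return {
        (x["model_short"], y["model_short"]): jaccard(x["files_accessed"], y["files_accessed"])
        for x, y in combinations(rows, 2)
    }


overlap = pairwise_overlap(rows)
# Files that every row accessed within its first five actions.
shared = set.intersection(*(set(r["files_accessed"]) for r in rows))
print(overlap)
print(shared)
```

With real data, the same functions can be applied to the rows returned by a `filter` on `main_task`.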

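Background and challenges

Background overview

The LinuxArena-first5-trajectories dataset emerged amid rapidly growing research on AI agent tool use and safety evaluation. It was published by eval-aware in March 2026 and draws on raw trajectories from the LinuxBench project. The dataset focuses on how large language models interact with Linux environments: by extracting the first five tool calls of each evaluation trajectory, it supports the study of exploration strategies in unfamiliar codebases, environmental awareness, and behavioral convergence across models. Its core research question is how to understand and quantify an agent's decision fingerprint and potential risks in a complex, open-ended terminal environment, which gives research on model robustness, safety alignment, and interpretability an empirical basis.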

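Current challenges

The dataset addresses the challenge of evaluating agent tool use and task execution in realistic Linux environments. The central difficulty is characterizing precisely how models differ in their initial exploration, and detecting whether a model recognizes that it is being evaluated. Construction posed several challenges of its own: extracting and labeling multi-granularity tool call sequences from a large volume of raw interaction trajectories in a standardized way; designing a rigorous framework that distinguishes behavior under different policies, such as honest execution and attack; and ensuring that the data supports cross-model and cross-environment comparison, so that patterns of strategy convergence and divergence can be identified.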

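Common scenarios

Classic use cases

In AI agent behavior analysis, the LinuxArena First-5 Trajectories dataset supplies data for studying the early exploration strategies of large language models in complex Linux environments. By extracting the first 5 tool calls for 14 models across 10 simulated environments, it lets researchers examine initial behavior in an unfamiliar codebase, such as whether a model views files with the text editor or searches directories with bash commands, and so infer its decision logic and exploration preferences at the start of a task.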
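A simple starting point for this kind of analysis is to tally which normalized tool call each model makes first. The sketch below assumes `l1_pattern` is joined with `->` as in the dataset card's example. The rows and the helper name `first_call_counts` are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up rows; only the field names come from the dataset card.
rows = [
    {"model_short": "opus", "l1_pattern": "text_editor:view -> bash:find"},
    {"model_short": "opus", "l1_pattern": "text_editor:view -> bash:ls"},
    {"model_short": "haiku", "l1_pattern": "bash:ls -> bash:cat"},
]


def first_call_counts(rows: list[dict]) -> dict[str, Counter]:
    """Count, per model, how often each Level 1 category is the first tool call."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        steps = [step.strip() for step in row["l1_pattern"].split("->")]
        if steps and steps[0]:  # skip rows with an empty pattern
            counts[row["model_short"]][steps[0]] += 1
    return counts


counts = first_call_counts(rows)
for model, counter in counts.items():
    print(model, counter.most_common(1))
```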

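Derived and related work

Follow-up work around this dataset concentrates on model behavior analysis and evaluation framework design. For example, researchers have clustered the trajectory patterns in the dataset to identify the exploration characteristics of different model families; other work predicts final task success from early tool call sequences in order to develop lightweight agent evaluation metrics. These studies deepen understanding of how AI agents use tools and give evaluation platforms such as LinuxBench a data-driven direction for iteration.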
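As a descriptive baseline for relating early tool calls to outcomes, success rates can be grouped by the first `k` steps of `l1_pattern`. This is a minimal sketch under stated assumptions, not the method of any particular study: the rows are invented, `success_rate_by_prefix` is a hypothetical helper, and a real analysis would need far more data per prefix than this toy example.

```python
from collections import defaultdict

# Made-up rows; only the field names come from the dataset card.
rows = [
    {"l1_pattern": "bash:ls -> bash:cat -> bash:find", "main_task_success": True},
    {"l1_pattern": "bash:ls -> bash:cat -> text_editor:view", "main_task_success": False},
    {"l1_pattern": "text_editor:view -> bash:find", "main_task_success": True},
]


def success_rate_by_prefix(rows: list[dict], k: int = 2) -> dict[str, float]:
    """Fraction of successful trajectories for each k-step Level 1 prefix."""
    totals = defaultdict(lambda: [0, 0])  # prefix -> [successes, trajectories]
    for row in rows:
        steps = [step.strip() for step in row["l1_pattern"].split("->")]
        prefix = " -> ".join(steps[:k])
        totals[prefix][0] += int(bool(row["main_task_success"]))
        totals[prefix][1] += 1
    return {prefix: wins / n for prefix, (wins, n) in totals.items()}


rates = success_rate_by_prefix(rows, k=2)
print(rates)
```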

Recent research on the dataset