ZixuanKe/evovling_tools

Name: ZixuanKe/evovling_tools
Creator: ZixuanKe
Published: 2026-04-28 05:59:31
License: No description available

Source: Hugging Face (updated 2026-04-28, indexed 2026-05-03)

Download link:

https://hf-mirror.com/datasets/ZixuanKe/evovling_tools

Description:

---
license: apache-2.0
task_categories:
- other
language:
- en
tags:
- agents
- tool-use
- evolving-benchmark
- enterprise-ops
pretty_name: EnterpriseOps Evolving Tool Benchmark
configs:
- config_name: calendar_v1
  data_files:
  - split: train
    path: calendar/v1/train.jsonl
  - split: test
    path: calendar/v1/test.jsonl
- config_name: calendar_v2
  data_files:
  - split: train
    path: calendar/v2/train.jsonl
  - split: test
    path: calendar/v2/test.jsonl
- config_name: calendar_v3
  data_files:
  - split: train
    path: calendar/v3/train.jsonl
  - split: test
    path: calendar/v3/test.jsonl
- config_name: csm_v1
  data_files:
  - split: train
    path: csm/v1/train.jsonl
  - split: test
    path: csm/v1/test.jsonl
- config_name: csm_v2
  data_files:
  - split: train
    path: csm/v2/train.jsonl
  - split: test
    path: csm/v2/test.jsonl
- config_name: csm_v3
  data_files:
  - split: train
    path: csm/v3/train.jsonl
  - split: test
    path: csm/v3/test.jsonl
- config_name: csm_v4
  data_files:
  - split: train
    path: csm/v4/train.jsonl
  - split: test
    path: csm/v4/test.jsonl
- config_name: drive_v1
  data_files:
  - split: train
    path: drive/v1/train.jsonl
  - split: test
    path: drive/v1/test.jsonl
- config_name: drive_v2
  data_files:
  - split: train
    path: drive/v2/train.jsonl
  - split: test
    path: drive/v2/test.jsonl
- config_name: drive_v3
  data_files:
  - split: train
    path: drive/v3/train.jsonl
  - split: test
    path: drive/v3/test.jsonl
- config_name: email_v1
  data_files:
  - split: train
    path: email/v1/train.jsonl
  - split: test
    path: email/v1/test.jsonl
- config_name: email_v2
  data_files:
  - split: train
    path: email/v2/train.jsonl
  - split: test
    path: email/v2/test.jsonl
- config_name: email_v3
  data_files:
  - split: train
    path: email/v3/train.jsonl
  - split: test
    path: email/v3/test.jsonl
- config_name: email_v4
  data_files:
  - split: train
    path: email/v4/train.jsonl
  - split: test
    path: email/v4/test.jsonl
- config_name: email_v5
  data_files:
  - split: train
    path: email/v5/train.jsonl
  - split: test
    path: email/v5/test.jsonl
- config_name: email_v6
  data_files:
  - split: train
    path: email/v6/train.jsonl
  - split: test
    path: email/v6/test.jsonl
- config_name: hr_v1
  data_files:
  - split: train
    path: hr/v1/train.jsonl
  - split: test
    path: hr/v1/test.jsonl
- config_name: hr_v2
  data_files:
  - split: train
    path: hr/v2/train.jsonl
  - split: test
    path: hr/v2/test.jsonl
- config_name: hr_v3
  data_files:
  - split: train
    path: hr/v3/train.jsonl
  - split: test
    path: hr/v3/test.jsonl
- config_name: hr_v4
  data_files:
  - split: train
    path: hr/v4/train.jsonl
  - split: test
    path: hr/v4/test.jsonl
- config_name: hr_v5
  data_files:
  - split: train
    path: hr/v5/train.jsonl
  - split: test
    path: hr/v5/test.jsonl
- config_name: hybrid_v1
  data_files:
  - split: train
    path: hybrid/v1/train.jsonl
  - split: test
    path: hybrid/v1/test.jsonl
- config_name: hybrid_v2
  data_files:
  - split: train
    path: hybrid/v2/train.jsonl
  - split: test
    path: hybrid/v2/test.jsonl
- config_name: hybrid_v3
  data_files:
  - split: train
    path: hybrid/v3/train.jsonl
  - split: test
    path: hybrid/v3/test.jsonl
- config_name: hybrid_v4
  data_files:
  - split: train
    path: hybrid/v4/train.jsonl
  - split: test
    path: hybrid/v4/test.jsonl
- config_name: itsm_v1
  data_files:
  - split: train
    path: itsm/v1/train.jsonl
  - split: test
    path: itsm/v1/test.jsonl
- config_name: itsm_v2
  data_files:
  - split: train
    path: itsm/v2/train.jsonl
  - split: test
    path: itsm/v2/test.jsonl
- config_name: itsm_v3
  data_files:
  - split: train
    path: itsm/v3/train.jsonl
  - split: test
    path: itsm/v3/test.jsonl
- config_name: teams_v1
  data_files:
  - split: train
    path: teams/v1/train.jsonl
  - split: test
    path: teams/v1/test.jsonl
- config_name: teams_v2
  data_files:
  - split: train
    path: teams/v2/train.jsonl
  - split: test
    path: teams/v2/test.jsonl
- config_name: teams_v3
  data_files:
  - split: train
    path: teams/v3/train.jsonl
  - split: test
    path: teams/v3/test.jsonl
- config_name: teams_v4
  data_files:
  - split: train
    path: teams/v4/train.jsonl
  - split: test
    path: teams/v4/test.jsonl
---

# Evolving Tool Benchmark (EnterpriseOps-Gym v7)

Each domain ships as a sequence of versions `V1, V2, ..., VK` that simulate a real-world tool universe growing over time:

- **Tools accumulate**: `C_1 ⊆ C_2 ⊆ ... ⊆ C_K`; each version adds new tools on top of the previous one.
- **Tasks are partitioned per stage** into `adapt` (exposed here as the `train` split, e.g. for in-context examples or fine-tuning) and `test` splits.
- **Frequency-driven anchoring** uses real co-occurrence statistics so that early versions contain the most frequently used tools.
- The version schedule is built **adaptively** to satisfy growth-rate and minimum task-count constraints.

## Layout

Each domain (`calendar`, `csm`, `drive`, `email`, `hr`, `hybrid`, `itsm`, `teams`) has 3-6 versions. Each `(domain, version)` pair is a **config** with `train` and `test` splits:

```
<repo>/
├── calendar/
│   ├── v1/
│   │   ├── train.jsonl   # adapt tasks at V1
│   │   └── test.jsonl    # test tasks at V1
│   ├── v2/
│   └── v3/
├── csm/
│   └── v1/ ... v4/
├── drive/
├── email/
├── hr/
├── hybrid/
├── itsm/
└── teams/
```

## Usage

```python
from datasets import load_dataset

# One config = one (domain, version) pair
ds = load_dataset("ZixuanKe/evovling_tools", "calendar_v1")
train_ds = ds["train"]
test_ds = ds["test"]

# Or load a single split directly:
train_ds = load_dataset("ZixuanKe/evovling_tools", "calendar_v1", split="train")
test_ds = load_dataset("ZixuanKe/evovling_tools", "csm_v3", split="test")
```

## Row schema

Every row contains the original task config plus metadata columns:

| field | type | description |
| --- | --- | --- |
| `domain` | str | one of `calendar, csm, drive, email, hr, hybrid, itsm, teams` |
| `version` | str | `v1`, `v2`, ... (1-indexed; matches the `V1, V2, ...` schedule in the source manifest) |
| `split` | str | `train` (= adapt) or `test` |
| `task_id` | str | original task id, stable across versions (use it to join the same task across stages) |
| `oracle_tools` | list[str] | minimal ground-truth tool list, taken from the source `selected_tools` field (order and any duplicates preserved as-is) |
| `system_prompt` | str | system prompt for the agent |
| `user_prompt` | str | user request the agent must satisfy |
| `cummulative_tools` | list[str] | the **cumulative** tool universe `C_k` at the assigned stage (what the agent sees; includes distractors) |
| `mcp_endpoint` | str | MCP HTTP endpoint, e.g. `/mcp` |
| `gym_servers_config` | list[dict] | per-server MCP config (URL, seed DB, user info) |
| `verifiers` | list[dict] | DB-state / API-state verifiers used to grade the agent |

## Evaluating an evolving agent

[TODO]
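The nesting property `C_1 ⊆ C_2 ⊆ ... ⊆ C_K` can be checked directly against the files. What follows is only a minimal sketch. It assumes a local copy of the repository laid out as in the Layout tree, and rows that carry the `cummulative_tools` column from the row schema. It reads every version of one domain and takes the union of `cummulative_tools` over both splits as `C_k`. It then raises an error if any version drops a tool seen earlier, and reports which tools each version adds. The helper names (`read_split`, `version_index`, `new_tools_per_version`) and the local snapshot directory are hypothetical. The code unions the column across rows rather than reading a single row, as a defensive choice in case rows within one config differ.

```python
import json
from pathlib import Path


def read_split(root, domain, version, split):
    """Read <root>/<domain>/<version>/<split>.jsonl into a list of row dicts."""
    path = Path(root) / domain / version / f"{split}.jsonl"
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def version_index(version):
    """Turn a version label such as "v3" into the integer 3 for numeric sorting."""
    return int(version.lstrip("vV"))


def new_tools_per_version(root, domain, splits=("train", "test")):
    """Map each version of `domain` to the tools it adds, checking C_{k-1} ⊆ C_k."""
    domain_dir = Path(root) / domain
    versions = sorted(
        (p.name for p in domain_dir.iterdir() if p.is_dir() and p.name.lower().startswith("v")),
        key=version_index,
    )
    added, previous = {}, set()
    for version in versions:
        # C_k: union of `cummulative_tools` over all rows of this version, both splits.
        current = set()
        for split in splits:
            for row in read_split(root, domain, version, split):
                current.update(row["cummulative_tools"])
        dropped = previous - current
        if dropped:
            raise ValueError(f"{domain}/{version} drops earlier tools: {sorted(dropped)}")
        added[version] = sorted(current - previous)
        previous = current
    return added
```

For example, assume the repository was downloaded into a directory named `evovling_tools`. Then `new_tools_per_version("evovling_tools", "email")` would return one entry per version directory, `v1` through `v6`. The entry for `v1` lists the whole initial universe, and each later entry lists only the tools that version introduces.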
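Per row, the `oracle_tools`, `cummulative_tools`, and `task_id` columns can be combined. This is a minimal sketch with hypothetical helper names and toy rows. The tool names `send_email` and `list_folders` and the task id `t1` are invented for illustration:

- `distractors` lists the tools visible at a stage that are not in the oracle list.
- `missing_oracle_tools` flags oracle tools that are absent from the visible universe. It assumes every oracle tool is meant to be available at the assigned stage, which the card implies but does not state.
- `stages_by_task` joins rows on `task_id` to show which `(version, split)` stages a task appears in.

```python
from collections import defaultdict


def distractors(row):
    """Tools visible at this stage that are not in the row's oracle list."""
    return sorted(set(row["cummulative_tools"]) - set(row["oracle_tools"]))


def missing_oracle_tools(row):
    """Oracle tools not present in the visible tool universe (assumed to be empty)."""
    return sorted(set(row["oracle_tools"]) - set(row["cummulative_tools"]))


def stages_by_task(rows):
    """Join rows on `task_id`, listing the (version, split) stages each task appears in."""
    stages = defaultdict(set)
    for row in rows:
        stages[row["task_id"]].add((row["version"], row["split"]))
    # Sort numerically by version ("v10" after "v9"), then by split name.
    return {
        task_id: sorted(pairs, key=lambda pair: (int(pair[0].lstrip("vV")), pair[1]))
        for task_id, pairs in stages.items()
    }


# Toy rows with hypothetical tool names; real rows are the dicts yielded by iterating a split.
rows = [
    {"task_id": "t1", "version": "v2", "split": "test",
     "oracle_tools": ["send_email"],
     "cummulative_tools": ["send_email", "list_folders"]},
    {"task_id": "t1", "version": "v1", "split": "train",
     "oracle_tools": ["send_email"],
     "cummulative_tools": ["send_email"]},
]

print(distractors(rows[0]))           # ['list_folders']
print(missing_oracle_tools(rows[1]))  # []
print(stages_by_task(rows))           # {'t1': [('v1', 'train'), ('v2', 'test')]}
```

The helpers compare sets because the schema preserves order and duplicates in `oracle_tools`, and a membership check should ignore both.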

Provider:

ZixuanKe
