mukunda1729/agent-budget-violations

Name: mukunda1729/agent-budget-violations
Creator: mukunda1729
Published: 2026-04-27 16:20:10
License: MIT

Hugging Face · Updated 2026-04-27 · Indexed 2026-05-03

Download link:

https://hf-mirror.com/datasets/mukunda1729/agent-budget-violations


Description:

---
license: mit
language:
- en
tags:
- agents
- llm
- observability
- cost
- testing
- budgets
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: data.jsonl
---

# agent-budget-violations

15 synthetic agent runs annotated with their **budget** (cost / tool-call / wall-time caps), **actual usage**, **violation types**, and a one-line **root cause + fix**.

Built as fixtures for budget-enforcement tests, alerting heuristics, and observability dashboards. 5 of the 15 are clean (no violations) so you can test the "no false positive" path.

## Violation breakdown

| Violation type | Count |
|---|---|
| `cost` | 4 |
| `tool_calls` | 6 |
| `wall_time` | 4 |
| **None (clean)** | 5 |

(Some runs violate multiple budgets, so the counts sum to more than 15.)

## Schema

```jsonc
{
  "id": "string",
  "agent": "string",
  "budget": {
    "max_tool_calls": 10,
    "max_cost_usd": 1.00,
    "max_wall_seconds": 60
  },
  "actual": {
    "tool_calls": 47,
    "cost_usd": 4.32,
    "wall_seconds": 312
  },
  "violation_types": ["tool_calls", "cost", "wall_time"],
  "root_cause": "string | null",
  "fix": "string | null"
}
```

## Common root causes covered

- Infinite loops on tool errors
- Slow third-party APIs
- Model fallback to expensive tier
- Off-by-one budget checks
- Recursive task misinterpretation
- LLM provider rate limits
- Clarifying-question loops
- Pagination explosion

## Quickstart

```python
from datasets import load_dataset

ds = load_dataset("mukunda1729/agent-budget-violations", split="train")
multi_violators = [r for r in ds if len(r["violation_types"]) >= 2]
print(f"{len(multi_violators)} multi-budget violations")
```

## Related

- [The Agent Reliability Stack](https://mukundakatta.github.io/agent-stack/)

## License

MIT.
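One way to use these fixtures in a budget-enforcement test is to recompute the violation labels from `budget` and `actual` and compare them with the annotated `violation_types`. The sketch below is not part of the dataset. The helper `derive_violations`, the label-to-field mapping, and the rule that a budget is violated only when usage strictly exceeds the cap are all assumptions inferred from the schema in the card. The example record reuses the schema's sample values, with a hypothetical `id` and `agent`.

```python
from typing import List

# Assumed mapping: violation label -> (key in "actual", key in "budget").
# Inferred from the schema in the dataset card; the card does not state it.
CHECKS = {
    "tool_calls": ("tool_calls", "max_tool_calls"),
    "cost": ("cost_usd", "max_cost_usd"),
    "wall_time": ("wall_seconds", "max_wall_seconds"),
}


def derive_violations(record: dict) -> List[str]:
    """Return the violation labels implied by comparing actual usage to budget."""
    budget, actual = record["budget"], record["actual"]
    return [
        label
        for label, (actual_key, budget_key) in CHECKS.items()
        # Assumption: usage equal to the cap is still within budget.
        if actual[actual_key] > budget[budget_key]
    ]


# Record built from the sample values in the schema; id and agent are hypothetical.
example = {
    "id": "example-run",
    "agent": "example-agent",
    "budget": {"max_tool_calls": 10, "max_cost_usd": 1.00, "max_wall_seconds": 60},
    "actual": {"tool_calls": 47, "cost_usd": 4.32, "wall_seconds": 312},
    "violation_types": ["tool_calls", "cost", "wall_time"],
    "root_cause": None,
    "fix": None,
}

# Compare as sets so label order in the annotation does not matter.
matches = set(derive_violations(example)) == set(example["violation_types"])
print(matches)
```

Running the same comparison over every row of the loaded split would flag any record whose annotation disagrees with this strict-exceedance rule. If the dataset treats hitting the cap exactly as a violation, swap `>` for `>=`.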

Provided by:

mukunda1729
