
KRAFTON/terminal-bench-2-leaderboard

Source: Hugging Face · Updated 2026-02-12 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/KRAFTON/terminal-bench-2-leaderboard
Resource description:
---
license: apache-2.0
---

# Terminal-Bench 2.0 Leaderboard Submissions

This repository accepts leaderboard submissions for [Terminal-Bench 2.0](https://terminal-bench.org).

## How to Submit

1. [Fork this repository](https://huggingface.co/docs/hub/en/repositories-next-steps#duplicating-with-the-git-history-fork)
2. Create a new branch for your submission
3. Add your submission (a job or folder of jobs) under `submissions/terminal-bench/2.0/<agent>__<model(s)>/`
4. Open a Pull Request

## Submission Structure

```text
submissions/
  terminal-bench/
    2.0/
      <agent>__<model>/
        metadata.yaml       # Required: agent and model info
        <job-folder>/       # One or more job directories
          config.json
          <trial-1>/result.json
          <trial-2>/result.json
          ...
```

## Required: metadata.yaml

Each submission must include a `metadata.yaml` file with the following fields:

```yaml
agent_url: https://...              # Required: link to agent repo/docs
agent_display_name: "My Agent"      # Required: display name for leaderboard
agent_org_display_name: "Org"       # Required: organization name

models:                             # Required: list of models used
  - model_name: gpt-5               # Required: model identifier
    model_provider: openai          # Required: provider (openai, anthropic, etc.)
    model_display_name: "GPT-5"     # Required
    model_org_display_name: "OpenAI" # Required
  # - Other models if your agent used multiple
```

## Job Directory Requirements

Each job directory must contain all of the contents of your run.

### Validation Rules

Your submission will be automatically validated. To pass:

- `timeout_multiplier` must equal `1.0`
- No agent timeout overrides (`override_timeout_sec`, `max_timeout_sec`)
- No verifier timeout overrides
- No resource overrides (`override_cpus`, `override_memory_mb`, `override_storage_mb`)
- All trial directories must have valid `result.json` files
- Trial directories must contain other artifacts from the run
- Each task must be evaluated with a minimum of five trials. We recommend the `-k 5` flag for convenience.

## Submission Process

1. **Open PR**: When you open a Pull Request, our bot will automatically validate your submission
2. **Fix Issues**: If validation fails, the bot will comment with specific errors to fix
3. **Merge**: Once validation passes, a maintainer will review and merge your PR
4. **Import**: After merge, results are automatically imported to the leaderboard

## Questions?

Open an issue in this repository or contact <alexgshaw64@gmail.com>.
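
Before opening a PR, it may help to run a rough local pre-check against the validation rules listed above. The sketch below is not the repository's validation bot. It assumes, without confirmation from the README, that the named settings can appear at any nesting depth inside `config.json` and that a `null` value means "not overridden". The function names `find_keys`, `check_config`, and `check_job_dir` are hypothetical. It does not check verifier timeout overrides (the README does not name those keys) or the five-trials-per-task rule (the `result.json` schema is not described).

```python
import json
from pathlib import Path

# Override keys named in the validation rules. Where they sit inside
# config.json is an assumption, so the search below is recursive.
FORBIDDEN_KEYS = {
    "override_timeout_sec",
    "max_timeout_sec",
    "override_cpus",
    "override_memory_mb",
    "override_storage_mb",
}


def find_keys(node, wanted, path=""):
    """Yield (dotted_path, value) for every wanted key in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key in wanted:
                yield here, value
            yield from find_keys(value, wanted, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_keys(item, wanted, f"{path}[{index}]")


def check_config(config):
    """Return a list of human-readable rule violations for one parsed config."""
    errors = []
    for where, value in find_keys(config, {"timeout_multiplier"}):
        if value != 1.0:
            errors.append(f"{where} must equal 1.0, got {value!r}")
    for where, value in find_keys(config, FORBIDDEN_KEYS):
        # Assumption: null (None) means the override is not set.
        if value is not None:
            errors.append(f"{where} must not be set, got {value!r}")
    return errors


def check_job_dir(job_dir):
    """Check config.json and each trial's result.json in one job directory."""
    job_dir = Path(job_dir)
    errors = []
    config_path = job_dir / "config.json"
    try:
        errors.extend(check_config(json.loads(config_path.read_text())))
    except (OSError, json.JSONDecodeError) as exc:
        errors.append(f"{config_path}: unreadable or invalid JSON ({exc})")
    # Assumption: every subdirectory of a job directory is a trial directory.
    for trial in sorted(p for p in job_dir.iterdir() if p.is_dir()):
        result_path = trial / "result.json"
        try:
            json.loads(result_path.read_text())
        except (OSError, json.JSONDecodeError) as exc:
            errors.append(f"{result_path}: missing or invalid JSON ({exc})")
    return errors
```

Calling `check_job_dir("submissions/terminal-bench/2.0/<agent>__<model>/<job-folder>")` with the placeholders filled in returns an empty list when none of these particular checks fail. The bot's verdict on the PR remains the authoritative result.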
Provided by:
KRAFTON