songjhPKU/cc-arena-dataset

Name: songjhPKU/cc-arena-dataset
Creator: songjhPKU
Published: 2026-04-01 18:46:53
License: Apache 2.0

Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/songjhPKU/cc-arena-dataset

Description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- code
- benchmark
- agent
- coding-agent
- cc-arena
pretty_name: CC-Arena Benchmark Dataset
size_categories:
- 1K<n<10K
---

# CC-Arena Benchmark Dataset

Full benchmark datasets for [CC-Arena](https://github.com/songjhPKU/cc-arena), a framework for evaluating AI coding agents (Claude Code, Cursor, etc.).

## Quick Start

### Via CC-Arena CLI (recommended)

```bash
# Download a specific benchmark
python3 -m cc_arena.tasks.downloader download humaneval

# Download with limit
python3 -m cc_arena.tasks.downloader download bigcodebench --limit 100

# List all available benchmarks
python3 -m cc_arena.tasks.downloader list
```

### Via huggingface_hub

```python
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="songjhPKU/cc-arena-dataset",
    filename="humaneval.jsonl",
    repo_type="dataset",
)
```

### Via Direct URL

```bash
wget https://huggingface.co/datasets/songjhPKU/cc-arena-dataset/resolve/main/humaneval.jsonl
```

## Dataset Structure

Each benchmark is a single JSONL file. One line = one task:

| File | Tasks | Difficulty | Description |
|------|-------|-----------|-------------|
| `humaneval.jsonl` | 164 | Easy | OpenAI HumanEval function-level code generation |
| `bigcodebench.jsonl` | 1140 | Hard | BigCodeBench-Hard multi-library tasks |
| `naturalcodebench.jsonl` | ~280 | Medium | NaturalCodeBench Python + Java real-world tasks |
| `devbench.jsonl` | ~22 | Hard | DevBench multi-stage software engineering projects |
| `custom.jsonl` | 10+ | Mixed | CC-Arena custom tasks (replaycode + engineering) |
| `swebench_lite.jsonl` | 5 | Medium | SWE-bench style bug-fixing tasks |
| `builtin.jsonl` | 3 | Easy | Smoke tests for verifying CC-Arena works |

## JSONL Schema

Each line is a JSON object with these fields:

```json
{
  "task_id": "HumanEval_0",
  "prompt": "Implement the function `has_close_elements` in solution.py.\n...",
  "initial_files": {
    "solution.py": "from typing import List\n\ndef has_close_elements(...): ...",
    "test_solution.py": "from solution import has_close_elements\n..."
  },
  "test_type": "pytest",
  "test_command": "python3 -m pytest test_solution.py -v",
  "expected_output": null,
  "timeout_seconds": 120,
  "metadata": {
    "difficulty": "easy",
    "language": "python",
    "source": "HumanEval"
  }
}
```

### Field Reference

| Field | Type | Description |
|-------|------|-------------|
| `task_id` | string | Unique task identifier |
| `prompt` | string | Instructions given to the coding agent |
| `initial_files` | dict | Files placed in workspace before agent runs (`{path: content}`) |
| `test_type` | string | `pytest`, `stdout_contains`, `file_contains`, `file_exists`, `exit_code` |
| `test_command` | string | Command to run tests |
| `expected_output` | string/null | Expected output for non-pytest test types |
| `timeout_seconds` | int | Task timeout |
| `metadata` | dict | Additional info (difficulty, language, source, etc.) |

## How It Works

1. The CC-Arena downloader fetches a JSONL file from this dataset
2. For each row, it creates a task directory:
   ```
   benchmarks/<benchmark>/tasks/<task_id>/
   ├── task.yaml          # Generated from metadata
   ├── solution.py        # From initial_files
   ├── test_solution.py   # From initial_files
   └── (other files)      # From initial_files
   ```
3. The agent receives the `prompt` and works in the task directory
4. Tests are run according to `test_type` and `test_command`

## License

Apache 2.0

## Citation

```bibtex
@misc{cc-arena,
  title={CC-Arena: A Framework for Evaluating AI Coding Agents},
  url={https://github.com/songjhPKU/cc-arena},
  year={2026}
}
```
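
## Example: Materializing a Task by Hand

The steps under "How It Works" can be approximated without the CC-Arena CLI. The sketch below is not the CC-Arena downloader itself: the function names `load_tasks`, `materialize_task`, and `run_task` are hypothetical, the `task.yaml` generation step is omitted, and the pass/fail rules (exit code 0 for `pytest` and `exit_code`, substring match for `stdout_contains`) are assumptions inferred from the field reference rather than confirmed behavior.

```python
import json
import shlex
import subprocess
from pathlib import Path


def load_tasks(jsonl_path):
    """Parse one task per non-empty line of a benchmark JSONL file."""
    tasks = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                tasks.append(json.loads(line))
    return tasks


def materialize_task(task, root):
    """Write every entry of `initial_files` under <root>/<task_id>/."""
    task_dir = Path(root) / task["task_id"]
    task_dir.mkdir(parents=True, exist_ok=True)
    for rel_path, content in task.get("initial_files", {}).items():
        target = task_dir / rel_path
        # Paths may contain subdirectories, so create parents first.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return task_dir


def run_task(task, task_dir):
    """Run `test_command` inside the task directory and report pass/fail.

    Assumed semantics (not taken from the CC-Arena source):
    - pytest / exit_code: pass when the command exits with status 0
    - stdout_contains: pass when `expected_output` occurs in stdout
    Other test types are left unimplemented in this sketch.
    """
    proc = subprocess.run(
        shlex.split(task["test_command"]),
        cwd=task_dir,
        capture_output=True,
        text=True,
        timeout=task.get("timeout_seconds", 120),
    )
    test_type = task["test_type"]
    if test_type in ("pytest", "exit_code"):
        return proc.returncode == 0
    if test_type == "stdout_contains":
        return task["expected_output"] in proc.stdout
    raise NotImplementedError(f"test_type {test_type!r} not covered here")
```

A typical use would be to call `load_tasks("humaneval.jsonl")`, pass each task to `materialize_task(task, "benchmarks/humaneval/tasks")`, let the agent edit the resulting directory, and then call `run_task(task, task_dir)`.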

Provider:

songjhPKU
