songjhPKU/cc-arena-dataset
Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/songjhPKU/cc-arena-dataset
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- code
- benchmark
- agent
- coding-agent
- cc-arena
pretty_name: CC-Arena Benchmark Dataset
size_categories:
- 1K<n<10K
---
# CC-Arena Benchmark Dataset
Full benchmark datasets for [CC-Arena](https://github.com/songjhPKU/cc-arena), a framework for evaluating AI coding agents such as Claude Code and Cursor.
## Quick Start
### Via CC-Arena CLI (recommended)
```bash
# Download a specific benchmark
python3 -m cc_arena.tasks.downloader download humaneval
# Download with limit
python3 -m cc_arena.tasks.downloader download bigcodebench --limit 100
# List all available benchmarks
python3 -m cc_arena.tasks.downloader list
```
### Via huggingface_hub
```python
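from huggingface_hub import hf_hub_download

# Download one benchmark file into the local Hugging Face cache
# and return the path to the cached copy.
path = hf_hub_download(
    repo_id="songjhPKU/cc-arena-dataset",
    filename="humaneval.jsonl",
    repo_type="dataset",
)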
```
### Via Direct URL
```bash
wget https://huggingface.co/datasets/songjhPKU/cc-arena-dataset/resolve/main/humaneval.jsonl
```
## Dataset Structure
Each benchmark is a single JSONL file with one task per line. The available files are:
| File | Tasks | Difficulty | Description |
|------|-------|-----------|-------------|
| `humaneval.jsonl` | 164 | Easy | OpenAI HumanEval function-level code generation |
| `bigcodebench.jsonl` | 1140 | Hard | BigCodeBench multi-library tasks |
| `naturalcodebench.jsonl` | ~280 | Medium | NaturalCodeBench Python + Java real-world tasks |
| `devbench.jsonl` | ~22 | Hard | DevBench multi-stage software engineering projects |
| `custom.jsonl` | 10+ | Mixed | CC-Arena custom tasks (replaycode + engineering) |
| `swebench_lite.jsonl` | 5 | Medium | SWE-bench style bug-fixing tasks |
| `builtin.jsonl` | 3 | Easy | Smoke tests for verifying CC-Arena works |
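Because each file holds one JSON object per line, a benchmark can be read with the standard library alone. The sketch below is illustrative and not part of CC-Arena: `load_tasks` is a hypothetical helper name, and it assumes the file is UTF-8 encoded and that blank lines can be skipped.

```python
import json
from typing import Any


def load_tasks(path: str) -> list[dict[str, Any]]:
    """Read a JSONL benchmark file into a list of task dicts (hypothetical helper)."""
    tasks = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines carry no task
            try:
                tasks.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a corrupt download is easy to spot.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return tasks
```

For example, `load_tasks("humaneval.jsonl")` on a downloaded copy should return one dict per task, which per the table above would be 164 entries.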
## JSONL Schema
Each line is a JSON object with these fields:
```json
{
  "task_id": "HumanEval_0",
  "prompt": "Implement the function `has_close_elements` in solution.py. ...",
  "initial_files": {
    "solution.py": "from typing import List\n\ndef has_close_elements(...): ...",
    "test_solution.py": "from solution import has_close_elements\n..."
  },
  "test_type": "pytest",
  "test_command": "python3 -m pytest test_solution.py -v",
  "expected_output": null,
  "timeout_seconds": 120,
  "metadata": {
    "difficulty": "easy",
    "language": "python",
    "source": "HumanEval"
  }
}
```
### Field Reference
| Field | Type | Description |
|-------|------|-------------|
| `task_id` | string | Unique task identifier |
| `prompt` | string | Instructions given to the coding agent |
| `initial_files` | dict | Files placed in workspace before agent runs (`{path: content}`) |
| `test_type` | string | One of `pytest`, `stdout_contains`, `file_contains`, `file_exists`, `exit_code` |
| `test_command` | string | Command to run tests |
| `expected_output` | string/null | Expected output for non-pytest test types |
| `timeout_seconds` | int | Task timeout in seconds |
| `metadata` | dict | Additional info (difficulty, language, source, etc.) |
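A row can be checked against this field reference before it is used. The following is a minimal sketch, not CC-Arena's own validation: `validate_task`, `REQUIRED_FIELDS`, and `TEST_TYPES` are hypothetical names, and it assumes every field in the table is required, which the card does not state explicitly.

```python
from typing import Any

# Field names and types mirror the Field Reference table.
# Assumption: all of these are mandatory in every row.
REQUIRED_FIELDS = {
    "task_id": str,
    "prompt": str,
    "initial_files": dict,
    "test_type": str,
    "test_command": str,
    "timeout_seconds": int,
    "metadata": dict,
}
TEST_TYPES = {"pytest", "stdout_contains", "file_contains", "file_exists", "exit_code"}


def validate_task(task: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the row looks valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in task:
            problems.append(f"missing field: {name}")
        elif not isinstance(task[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    # expected_output is a string or null (None after JSON decoding).
    if not isinstance(task.get("expected_output"), (str, type(None))):
        problems.append("expected_output: expected str or null")
    test_type = task.get("test_type")
    if isinstance(test_type, str) and test_type not in TEST_TYPES:
        problems.append(f"test_type: unknown value {test_type!r}")
    return problems
```

Returning a list of messages rather than raising on the first problem makes it easy to report every issue in a row at once.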
## How It Works
1. The CC-Arena downloader fetches a JSONL file from this dataset
2. For each row, it creates a task directory:
```
benchmarks/<benchmark>/tasks/<task_id>/
├── task.yaml # Generated from metadata
├── solution.py # From initial_files
├── test_solution.py # From initial_files
└── (other files) # From initial_files
```
3. The agent receives the `prompt` and works in the task directory
4. Tests are run according to `test_type` and `test_command`
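Steps 2 and 4 above can be sketched as follows. This is an illustration under stated assumptions, not the CC-Arena implementation: `materialize_task` and `run_test_command` are hypothetical names, `task.yaml` generation is omitted because its format is not described here, and success is judged by exit code alone, which may not match how CC-Arena handles test types such as `stdout_contains`.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Any


def materialize_task(task: dict[str, Any], root: Path, benchmark: str) -> Path:
    """Write initial_files under benchmarks/<benchmark>/tasks/<task_id>/ (step 2)."""
    task_dir = root / "benchmarks" / benchmark / "tasks" / task["task_id"]
    task_dir.mkdir(parents=True, exist_ok=True)
    for rel_path, content in task["initial_files"].items():
        target = task_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)  # allow nested paths
        target.write_text(content, encoding="utf-8")
    return task_dir


def run_test_command(task: dict[str, Any], task_dir: Path) -> bool:
    """Run test_command inside the task directory (step 4).

    Assumption: a zero exit code within timeout_seconds counts as a pass.
    """
    try:
        result = subprocess.run(
            shlex.split(task["test_command"]),
            cwd=task_dir,
            capture_output=True,
            text=True,
            timeout=task["timeout_seconds"],
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

In a real run the agent would edit the files in `task_dir` between the two calls; the sketch leaves that step out.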
## License
Apache 2.0
## Citation
```bibtex
@misc{cc-arena,
title={CC-Arena: A Framework for Evaluating AI Coding Agents},
url={https://github.com/songjhPKU/cc-arena},
year={2026}
}
```
Provider:
songjhPKU



