five

CnakeCharmer/CnakeCharmer

收藏
Hugging Face2026-04-10 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/CnakeCharmer/CnakeCharmer
下载链接
链接失效反馈
官方服务:
资源简介:
--- pretty_name: CnakeCharmer Trace Dataset license: mit language: - en task_categories: - text-generation size_categories: - 100K<n<1M configs: - config_name: raw data_files: - split: train path: raw/raw_traces.jsonl - config_name: parallel data_files: - split: train path: parallel/parallel_examples.jsonl --- # CnakeCharmer This dataset is for training an agent to translate python code to cython code, utilizing sandboxed tools for automated testing and optimization. ## Dataset Splits ### Parallel Parallel python / cython implementations sourced from our [github repo](https://github.com/dleemiller/CnakeCharmer). We have implemented, compiled, linted, benchmarked and tested over 700 examples. We use this dataset for collecting agent raw agent traces and GEPA optimizing prompt instructions. ### Raw Unfiltered trace logs directly from collection runs using a variety of models. Most traces only use the `evaluate_cython` tool, but a small number of traces also used some experimental `wiki_read` tools. A small number of these examples are included in the training data, as a natural way of adding some instructional material. The wiki pages are meant to address real challenges LLMs exhibited while translating code to cython, that we observed in the traces. ### SFT (planned) Filtered and formatted data using harmony format for training `gpt-oss`. ### GRPO (planned) Unpaired python scripts for multi-step agent training. ## Data Format Files are stored as JSONL. Each line is one record. ## Tools And Sandbox The trace and pair evaluation flow is built around the `evaluate_cython` tool: - Compiles Cython code - Runs correctness checks against Python reference behavior - Runs benchmark comparisons (Python vs Cython) to compute speedup - Produces optimization/quality metrics used for filtering and training Execution is sandboxed for safety and stability: - Runtime code execution happens in a bubblewrap (`bwrap`) sandbox with resource limits - Benchmark/correctness runs are isolated in subprocesses - This keeps untrusted generated code from affecting host system state ## Source Project - Project: CnakeCharmer - Focus: Cython optimization via tool-using agent traces - Repository: https://github.com/dleemiller/CnakeCharmer
提供机构:
CnakeCharmer
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作