collinear-ai/coding_samples
Source: Hugging Face. Updated: 2025-10-15. Indexed: 2025-10-18.
Download link:
https://hf-mirror.com/datasets/collinear-ai/coding_samples
Description:
This dataset is a collection of coding samples organized into subsets for different use cases: Supervised Fine-Tuning (SFT), Reinforcement Learning with Verifiers (RLVR), Agent Benchmarking, and Verifier Benchmarking. Each subset has its own data structure, with fields such as the problem description, test cases, correct solution, and programming language. The Agent Benchmarking subset additionally includes repository information, baseline branch and commit details, patches, and tests. The Verifier Benchmarking subset contains pairs of responses, each with a label indicating which response is more correct.
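As a rough illustration of how the Verifier Benchmarking subset might be used, the sketch below scores a judge function against labeled response pairs. The record type `VerifierPair`, its field names, and the `"a"`/`"b"` label encoding are assumptions made for this example; the actual column names and label format should be checked against the dataset itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VerifierPair:
    """One paired record. All field names here are hypothetical."""
    problem: str
    response_a: str
    response_b: str
    label: str  # assumed encoding: "a" or "b", the more correct response


def verifier_accuracy(
    pairs: List[VerifierPair],
    judge: Callable[[str, str, str], str],
) -> float:
    """Return the fraction of pairs where the judge's pick matches the label."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for p in pairs
        if judge(p.problem, p.response_a, p.response_b) == p.label
    )
    return hits / len(pairs)


# Toy data, invented for this example (not taken from the dataset).
pairs = [
    VerifierPair("Add two ints", "return a + b", "return a - b", "a"),
    VerifierPair("Negate x", "return x", "return -x", "b"),
]


def always_a(problem: str, a: str, b: str) -> str:
    """A trivial baseline judge that always prefers the first response."""
    return "a"


print(verifier_accuracy(pairs, always_a))  # 0.5
```

A real verifier would replace `always_a` with a model call or a test-execution step; the scoring loop stays the same.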
Provided by:
collinear-ai



