nemotron-gym-competitive-coding

Name: nemotron-gym-competitive-coding
Creator: LAION eV
Published: 2026-05-16 23:25:49
License: CC-BY-4.0

Source: Hugging Face. Updated 2026-05-16; indexed 2026-05-17.

Download link:

https://huggingface.co/datasets/laion/nemotron-gym-competitive-coding

Summary:

The laion/nemotron-gym-competitive-coding dataset is a Harbor-format conversion of nvidia/Nemotron-RL-coding-competitive_coding. It is built for reinforcement learning environments and targets competitive programming tasks with verifiable rewards. Each sample has two fields: `path`, a deterministic short ID of the form `<family>-<sha256[:12]>.tar.gz`, and `task_binary`, a gzip-compressed tar archive containing the complete Harbor task.

An extracted task contains `instruction.md` (the prompt shown to the agent), `environment/Dockerfile` (a task-specific Python environment based on the python:3.11-slim-bookworm image), `tests/test.sh` (the verifier entry point), `tests/verifier.py` (the deterministic verifier implementation), `tests/verifier_data.json` (per-task verifier inputs in JSON, with no code interpolation), `metadata.json` (provenance, including the source dataset, row index, and family), and `task.toml` (standard Harbor task configuration with CPU, memory, and timeout defaults). The verifier belongs to the stdio_diff family: it runs `/app/solution.py` and compares its output against hidden stdin/stdout test cases.

The conversion is designed for safety. Dataset content is never interpolated into shell, Python, or Dockerfile source; all values are passed through JSON files. The base image is pinned, pip specifications are validated against a strict allowlist regular expression, control characters are stripped from text fields and lengths are capped, tarball paths are validated to prevent path traversal, and tarballs are deterministic (sorted entries, fixed timestamps and user/group IDs).

The dataset contains between 10K and 100K samples, is in English, and is suited to reinforcement learning research and to training and evaluating agents in competitive programming environments.

Provider:

LAION eV

Created:

2026-05-16

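Original Information Summary

Dataset Overview

Dataset name: laion/nemotron-gym-competitive-coding

License: CC-BY-4.0

Task category: reinforcement learning (reinforcement-learning)

Language: English (en)

Dataset size: 10K < n < 100K samples

Tags: harbor, nemotron-gym, rl, verifiable-rewards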

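Data Source and Conversion

This dataset is a Harbor-format conversion of nvidia/Nemotron-RL-coding-competitive_coding.
The conversion tool is the data/nemotron_gym adapter from the OpenThoughts-Agent project.
The conversion is safe by construction:
- Dataset content is never inserted into shell, Python, or Dockerfile source; all values are passed through tests/verifier_data.json (JSON, parsed at runtime).
- The base image is pinned by name to python:3.11-slim-bookworm; pip dependencies are validated against a strict allowlist regular expression.
- Text fields are stripped of C0/C1 control characters and truncated to a maximum length; tarball paths are validated against path traversal, null bytes, and absolute paths.
- Tarballs are deterministic (sorted entries, mtime=0, uid/gid=0), so they are byte-for-byte reproducible.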
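
The determinism properties listed above (sorted entries, mtime=0, uid/gid=0) can be illustrated with a minimal sketch. This is not the adapter's actual code: the function name `build_deterministic_tarball`, the fixed file mode, and the choice to zero the gzip header timestamp are assumptions made for illustration.

```python
import gzip
import io
import tarfile


def build_deterministic_tarball(files: dict[str, bytes]) -> bytes:
    """Pack {relative_path: content} into a byte-reproducible .tar.gz (illustrative)."""
    buf = io.BytesIO()
    # mtime=0 keeps the gzip header itself free of the current time.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):  # sorted entries
                data = files[name]
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                info.mtime = 0               # fixed timestamp
                info.uid = info.gid = 0      # fixed owner IDs
                info.uname = info.gname = ""
                info.mode = 0o644            # assumption: one fixed mode for all files
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two calls with the same files, in any insertion order, should return identical bytes under the same Python and zlib versions, which is what makes a content hash of the archive stable.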

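Column Description

| Column | Type | Description |
| --- | --- | --- |
| `path` | string | Deterministic short ID of the form `<family>-<sha256[:12]>.tar.gz` |
| `task_binary` | binary | gzip-compressed tar archive containing the complete Harbor task |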
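
As a small illustration of the `path` format, the sketch below splits an ID into its family and hash prefix. The helper `parse_task_path` and the `TaskId` type are hypothetical, and the pattern assumes the 12-character prefix is lowercase hexadecimal; the source does not state what input is hashed, so the sketch only parses IDs and does not generate them.

```python
import re
from typing import NamedTuple

# Assumption: the hash prefix is 12 lowercase hex characters, and the family
# name is everything before the final "-<prefix>.tar.gz".
_PATH_RE = re.compile(r"(?P<family>.+)-(?P<digest>[0-9a-f]{12})\.tar\.gz")


class TaskId(NamedTuple):
    family: str
    digest_prefix: str


def parse_task_path(path: str) -> TaskId:
    """Split '<family>-<sha256[:12]>.tar.gz' into its two parts."""
    match = _PATH_RE.fullmatch(path)
    if match is None:
        raise ValueError(f"unexpected path format: {path!r}")
    return TaskId(match.group("family"), match.group("digest"))
```

Grouping rows by `parse_task_path(row["path"]).family` is one way to check which verifier families a split contains.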

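Harbor Task Layout

Each task_binary extracts to the following directory structure:

```
instruction.md              # prompt shown to the agent
environment/Dockerfile      # python:3.11-slim-bookworm base image + task-specific pip dependencies
tests/test.sh               # verifier entry point (writes /logs/verifier/reward.txt)
tests/verifier.py           # verifier implementation (inlined, deterministic)
tests/verifier_data.json    # per-task verifier inputs (JSON, no code interpolation)
metadata.json               # provenance: source_dataset, row_index, family, etc.
task.toml                   # standard Harbor task configuration (CPU/memory/timeout defaults)
```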
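
One way to sanity-check a row against this layout is to list the archive members in memory and report any expected file that is absent. The sketch below assumes member names are stored relative to the archive root (a leading "./" is tolerated); `missing_task_files` is a hypothetical helper, not part of Harbor.

```python
import io
import tarfile

# File names taken from the layout described above.
EXPECTED_FILES = {
    "instruction.md",
    "environment/Dockerfile",
    "tests/test.sh",
    "tests/verifier.py",
    "tests/verifier_data.json",
    "metadata.json",
    "task.toml",
}


def missing_task_files(task_binary: bytes) -> set[str]:
    """Return the expected files that are not present in the archive."""
    with tarfile.open(fileobj=io.BytesIO(task_binary), mode="r:gz") as tar:
        # Assumption: names are relative to the archive root.
        names = {m.name.removeprefix("./") for m in tar.getmembers() if m.isfile()}
    return EXPECTED_FILES - names
```

An empty result means the row matches the documented layout; if the archives turn out to wrap everything in a top-level directory, the name normalization would need adjusting.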

Verifier Type

stdio_diff: runs /app/solution.py and checks its output against hidden stdin/stdout test cases.
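
The general idea of a stdin/stdout diff check can be sketched as follows. This is not the dataset's `tests/verifier.py`: the test-case keys (`stdin`, `expected_stdout`), the whitespace normalization, the timeout, and the all-or-nothing reward are assumptions chosen for illustration.

```python
import subprocess
import sys


def _normalize(text: str) -> list[str]:
    # Assumption: trailing whitespace and trailing blank lines are ignored.
    return [line.rstrip() for line in text.strip().splitlines()]


def run_stdio_diff(solution_path: str, cases: list[dict], timeout_s: float = 10.0) -> float:
    """Return 1.0 if the solution matches every case, else 0.0 (hypothetical rule)."""
    for case in cases:
        try:
            proc = subprocess.run(
                [sys.executable, solution_path],
                input=case["stdin"],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0
        if proc.returncode != 0:
            return 0.0
        if _normalize(proc.stdout) != _normalize(case["expected_stdout"]):
            return 0.0
    return 1.0
```

In the real task the expected outputs stay hidden from the agent inside `tests/verifier_data.json`, and `tests/test.sh` writes the resulting reward to `/logs/verifier/reward.txt`.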

Usage Examples

Load the dataset:

```python
from datasets import load_dataset

ds = load_dataset("laion/nemotron-gym-competitive-coding", split="train")
# Print the task ID and the size of the packed task in bytes.
print(ds[0]["path"], len(ds[0]["task_binary"]))
```

Run a single task (with Harbor):

```bash
# Unpack the first task to a local directory.
python - <<'PY'
import io, tarfile
from datasets import load_dataset

ds = load_dataset("laion/nemotron-gym-competitive-coding", split="train")
row = ds[0]
# mode="r:gz" handles the gzip layer, so no separate gzip call is needed.
with tarfile.open(fileobj=io.BytesIO(row["task_binary"]), mode="r:gz") as tar:
    tar.extractall("/tmp/competitive-coding-task")
PY
harbor run -t /tmp/competitive-coding-task -e daytona  # or -e docker
```
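
To inspect a task without writing anything to disk, a single file can be read straight from the `task_binary` bytes. The helper name `read_task_file` is hypothetical, and the sketch assumes member names are relative to the archive root.

```python
import io
import tarfile


def read_task_file(task_binary: bytes, name: str) -> bytes:
    """Return the content of one file inside the packed task."""
    with tarfile.open(fileobj=io.BytesIO(task_binary), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Assumption: names are relative to the archive root.
            if member.isfile() and member.name.removeprefix("./") == name:
                fobj = tar.extractfile(member)
                if fobj is not None:
                    return fobj.read()
    raise FileNotFoundError(name)


# Example use with a loaded row (requires the dataset to be downloaded):
#   prompt = read_task_file(row["task_binary"], "instruction.md").decode("utf-8")
#   meta = json.loads(read_task_file(row["task_binary"], "metadata.json"))
```

This is handy for browsing prompts or metadata across many rows before deciding which tasks to run in a container.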

Source

This dataset is a derivative work of nvidia/Nemotron-RL-coding-competitive_coding, which is part of NVIDIA's NeMo-Gym collection.

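Collected Summary

Dataset Introduction

Construction

The nemotron-gym-competitive-coding dataset originates from NVIDIA's NeMo-Gym collection. It was converted with an adapter from the OpenThoughts-Agent tooling and is the Harbor-format version of nvidia/Nemotron-RL-coding-competitive_coding. The build follows a safety-first approach: all data content is parsed from JSON files at runtime rather than embedded in scripts or Dockerfiles; the base image is pinned to python:3.11-slim-bookworm, and pip dependencies are validated against a strict allowlist regular expression; text fields are stripped of control characters and length-limited, and tarball paths are checked against traversal and absolute-path attacks. The resulting tarballs are deterministic (sorted entries, mtime and uid/gid set to zero), so they are byte-for-byte reproducible.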
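
The text cleaning and dependency checks described above might look roughly like the sketch below. The actual regular expressions and length limits used by the adapter are not given in the source, so `_CONTROL_CHARS`, `_PIP_SPEC`, the 10,000-character cap, and the decision to keep tab, newline, and carriage return are all assumptions.

```python
import re

# C0 controls (except tab, newline, carriage return), DEL, and C1 controls.
# Assumption: ordinary whitespace controls are kept so prompts stay readable.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")

# Hypothetical allowlist: a package name with an optional exact version pin.
# Anything else (options, URLs, shell metacharacters) is rejected.
_PIP_SPEC = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}(==[0-9][0-9A-Za-z.]{0,31})?")


def sanitize_text(text: str, max_len: int = 10_000) -> str:
    """Strip control characters, then truncate to max_len characters."""
    return _CONTROL_CHARS.sub("", text)[:max_len]


def is_allowed_pip_spec(spec: str) -> bool:
    """Accept only specs that match the allowlist pattern in full."""
    return _PIP_SPEC.fullmatch(spec) is not None
```

Matching with `fullmatch` rather than `search` is the important design choice here: a spec is accepted only if the whole string fits the pattern, so trailing options or shell fragments cannot slip through.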

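Features

The dataset is designed for reinforcement learning and packaged in Harbor format. Each sample consists of a deterministic short ID and the corresponding binary task archive. Inside, every task follows the standard Harbor layout: an instruction prompt, a Docker environment definition, a verifier script, and the verifier's input data. The verifier family is stdio_diff, which produces a verifiable reward by running /app/solution.py and comparing its output against hidden stdin/stdout test cases. The dataset contains between 10,000 and 100,000 samples, is released under CC-BY-4.0, and is both safe to build and reproducible, which makes it suitable for training and evaluating agents that generate code.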

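Usage

The dataset can be loaded directly with the Hugging Face datasets library, for example by calling load_dataset to fetch the train split and reading each row's path and binary content. To run a single task, first extract the binary data to a local directory, then execute it with Harbor: read a row, unpack the task_binary field to a chosen path with the tarfile library (which handles the gzip layer), and run harbor run with either the Docker or the Daytona execution backend. This keeps the task environment consistent and the verification process standardized, so the tasks integrate cleanly into reinforcement learning workflows.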
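
The extraction step can be made a little more defensive by re-checking member names on the consumer side before unpacking. The sketch below is one possible approach, not Harbor's own tooling; `is_safe_member_name` and `extract_task` are hypothetical helpers, and restricting members to regular files and directories is an assumption.

```python
import io
import tarfile
from pathlib import Path, PurePosixPath


def is_safe_member_name(name: str) -> bool:
    """Reject empty names, null bytes, absolute paths, and '..' components."""
    if not name or "\x00" in name:
        return False
    path = PurePosixPath(name)
    return not path.is_absolute() and ".." not in path.parts


def extract_task(task_binary: bytes, dest: str) -> list[str]:
    """Extract a packed task into dest and return the extracted file names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(task_binary), mode="r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Assumption: a task contains only regular files and directories.
            if not is_safe_member_name(member.name) or not (member.isfile() or member.isdir()):
                raise ValueError(f"refusing to extract {member.name!r}")
        if hasattr(tarfile, "data_filter"):
            # Use the stdlib extraction filter where this Python version has it.
            tar.extractall(dest_path, members=members, filter="data")
        else:
            tar.extractall(dest_path, members=members)
    return sorted(m.name for m in members if m.isfile())
```

The returned directory can then be passed to `harbor run -t <dest>` as in the usage example from the dataset card.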

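Background and Challenges

Background

Reinforcement learning (RL) is drawing growing attention for code generation and logical reasoning, and guiding agents with verifiable reward signals has become an active research topic. Against this background, LAION built the nemotron-gym-competitive-coding dataset in 2026 from NVIDIA's Nemotron-RL-coding-competitive_coding dataset. It originates from NVIDIA's NeMo-Gym collection, focuses on solving competitive programming problems, and aims to give the RL community a safe, reproducible evaluation benchmark. The dataset contains more than ten thousand programming tasks packaged in Harbor format. Each task includes an instruction, a test environment, and verifiable reward logic, so it can be used to evaluate an agent's code generation against hidden test cases. The release offers a standardized, publicly available RL dataset for competitive programming and a reliable basis for evaluation in related research.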

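Current Challenges

The dataset involves two main kinds of challenge. First, at the problem level, competitive programming requires a model to understand complex logic from a natural-language description, design an efficient algorithm, and produce exactly correct code. This is a demanding test of the generalization and sample efficiency of current RL approaches, especially when a solution must pass multiple hidden test cases. Second, at the construction level, the team adopted strict serialization measures to ensure safety and reproducibility, such as keeping content out of executable code and using deterministic tarball packaging and a validation pipeline, and these steps add considerable build complexity. Noisy text, control characters, and path traversal risks in the source data also have to be removed through careful cleaning and filtering, balancing dataset size against quality, which places high demands on the robustness of the automated conversion pipeline.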

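Common Use Cases

Typical Use Case

At the intersection of reinforcement learning and code generation, nemotron-gym-competitive-coding is intended as a standardized benchmark for training and evaluating programming agents with verifiable reward signals. Its typical use case is competitive programming: given a problem description, the agent generates a solution that must pass hidden test cases, and it receives a reward signal from the deterministic verifier in the Harbor environment. This closed-loop setup lets researchers iterate on policy models efficiently and study topics such as reward shaping and exploration strategies.

Practical Applications

In practice, the dataset could support the development of automated code review systems, programming contest assistants, and intelligent coding assistants. Developers could use it to train models that solve difficult programming problems automatically, with the goal of improving software development efficiency. It could also be used in educational technology to build adaptive learning systems that analyze model performance on competitive programming tasks and offer learners personalized programming practice.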

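Related and Derived Work

Research directions that a dataset of this kind supports include interpretable reward modeling based on inverse reinforcement learning, exploration strategies that combine Monte Carlo tree search, and meta-reinforcement-learning algorithms for multi-task generalization. Open-source tools such as CodeRL and RL4F belong to the broader line of RL work on model-generated outputs, and both predate this dataset. The Harbor-format conversion approach used here may also serve as a reusable engineering template for standardizing later datasets.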

The content above was collected and summarized by 遇见数据集.