laion/nemotron-terminal-adapters_math

Name: laion/nemotron-terminal-adapters_math
Creator: laion
Published: 2026-04-13 11:12:19
License: CC-BY-4.0

Source: Hugging Face. Updated 2026-04-13; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/laion/nemotron-terminal-adapters_math

Description:

---
license: cc-by-4.0
task_categories:
- question-answering
language:
- en
tags:
- code
- terminal
- agent
- trace
- sft
configs:
- config_name: default
  data_files:
  - split: train
    path: data.parquet
---

# nemotron-terminal-adapters_math

Per-source partition of [nvidia/Nemotron-Terminal-Corpus](https://huggingface.co/datasets/nvidia/Nemotron-Terminal-Corpus), filtered to `source == "adapters_math"`. The `difficulty` column preserves the original `easy` / `medium` / `mixed` split (`na` for the `dataset_adapters/*` files, which did not carry a difficulty label).

Partitioning scheme:

- **adapters_{code,math,swe}**: rows from `dataset_adapters/{code,math,swe}.parquet`
- **{skill}** (e.g. `debugging`, `security`, …): rows from `synthetic_tasks/skill_based/{easy,medium,mixed}/{skill}/data_filtered.parquet`

## Columns

Same as the source dataset (`conversations`, `agent`, `model`, `model_provider`, `date`, `task`, `episode`, `run_id`, `trial_name`, `enable_thinking`) plus:

- `source`: the partition key (`"adapters_math"` throughout this repo)
- `difficulty`: `easy` / `medium` / `mixed` / `na`
- `original_source`: only present in `adapters_code`; preserves the original `source` column value (`OpenCodeReasoning` or `synthetic`) from the upstream file.

## Citation

```bibtex
@misc{pi2026dataengineeringscalingllm,
  title={On Data Engineering for Scaling LLM Terminal Capabilities},
  author={Renjie Pi and Grace Lam and Mohammad Shoeybi and Pooya Jannaty and Bryan Catanzaro and Wei Ping},
  year={2026},
  eprint={2602.21193},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.21193},
}
```

Original dataset license: CC-BY-4.0.
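## Example (illustrative)

The partitioning scheme described in the card can be sketched as a small mapping from an upstream file path to the `(source, difficulty)` pair. This is a minimal sketch, not the script used to build the repo: the helper names `partition_for` and `load_partition` are hypothetical, and the `"na"` difficulty for adapter files is taken from the card's description. Only `datasets.load_dataset` is an existing library call, and it needs the `datasets` package plus network access.

```python
import re
from typing import Tuple

# Patterns mirror the two upstream layouts named in the dataset card.
_ADAPTER = re.compile(r"^dataset_adapters/(code|math|swe)\.parquet$")
_SKILL = re.compile(
    r"^synthetic_tasks/skill_based/(easy|medium|mixed)/([^/]+)/data_filtered\.parquet$"
)


def partition_for(upstream_path: str) -> Tuple[str, str]:
    """Return (source, difficulty) for an upstream file path (hypothetical helper)."""
    m = _ADAPTER.match(upstream_path)
    if m:
        # Adapter files carry no difficulty label, so the card records "na".
        return f"adapters_{m.group(1)}", "na"
    m = _SKILL.match(upstream_path)
    if m:
        difficulty, skill = m.group(1), m.group(2)
        return skill, difficulty
    raise ValueError(f"unrecognized upstream path: {upstream_path!r}")


def load_partition(repo_id: str = "laion/nemotron-terminal-adapters_math"):
    """Load the train split from the Hub (assumes `datasets` is installed)."""
    # Imported lazily so partition_for() has no third-party dependency.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")
```

Under these assumptions, `partition_for("dataset_adapters/math.parquet")` yields `("adapters_math", "na")`, which matches the `source` value this repo uses. Calling `load_partition()` would fetch the `train` split declared in the front matter.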

Provider:

laion
