
laion/nemotron-terminal-adapters_math

Source: Hugging Face. Updated 2026-04-13. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/laion/nemotron-terminal-adapters_math
Resource description:
---
license: cc-by-4.0
task_categories:
- question-answering
language:
- en
tags:
- code
- terminal
- agent
- trace
- sft
configs:
- config_name: default
  data_files:
  - split: train
    path: data.parquet
---

# nemotron-terminal-adapters_math

Per-source partition of [nvidia/Nemotron-Terminal-Corpus](https://huggingface.co/datasets/nvidia/Nemotron-Terminal-Corpus), filtered to `source == "adapters_math"`. The `difficulty` column preserves the original `easy` / `medium` / `mixed` split (`na` for the `dataset_adapters/*` files, which did not carry a difficulty label).

Partitioning scheme:

- **adapters_{code,math,swe}**: rows from `dataset_adapters/{code,math,swe}.parquet`
- **{skill}** (e.g. `debugging`, `security`, …): rows from `synthetic_tasks/skill_based/{easy,medium,mixed}/{skill}/data_filtered.parquet`

## Columns

Same as the source dataset (`conversations`, `agent`, `model`, `model_provider`, `date`, `task`, `episode`, `run_id`, `trial_name`, `enable_thinking`) plus:

- `source`: the partition key (`"adapters_math"` throughout this repo)
- `difficulty`: `easy` / `medium` / `mixed` / `na`
- `original_source`: only present in `adapters_code`; preserves the original `source` column value (`OpenCodeReasoning` or `synthetic`) from the upstream file.

## Citation

```bibtex
@misc{pi2026dataengineeringscalingllm,
  title={On Data Engineering for Scaling LLM Terminal Capabilities},
  author={Renjie Pi and Grace Lam and Mohammad Shoeybi and Pooya Jannaty and Bryan Catanzaro and Wei Ping},
  year={2026},
  eprint={2602.21193},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.21193},
}
```

Original dataset license: CC-BY-4.0.
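
The partitioning scheme and column list above can be restated as a small lookup. The following is a minimal sketch, not the script used to build this repo: `partition_for` and `expected_columns` are hypothetical helper names introduced here for illustration, and the path patterns are taken only from the two bullets in the partitioning scheme.

```python
from pathlib import PurePosixPath

# Columns the card lists as shared with the upstream dataset.
BASE_COLUMNS = [
    "conversations", "agent", "model", "model_provider", "date",
    "task", "episode", "run_id", "trial_name", "enable_thinking",
]

DIFFICULTIES = {"easy", "medium", "mixed"}
ADAPTERS = {"code", "math", "swe"}


def partition_for(upstream_path: str) -> tuple[str, str]:
    """Map an upstream parquet path to (source, difficulty).

    Hypothetical helper: it only encodes the two path patterns
    described in the card's partitioning scheme.
    """
    parts = PurePosixPath(upstream_path).parts

    # dataset_adapters/{code,math,swe}.parquet -> adapters_{name}, no difficulty label
    if len(parts) == 2 and parts[0] == "dataset_adapters":
        name = PurePosixPath(parts[1]).stem
        if parts[1].endswith(".parquet") and name in ADAPTERS:
            return f"adapters_{name}", "na"

    # synthetic_tasks/skill_based/{difficulty}/{skill}/data_filtered.parquet
    if (
        len(parts) == 5
        and parts[:2] == ("synthetic_tasks", "skill_based")
        and parts[2] in DIFFICULTIES
        and parts[4] == "data_filtered.parquet"
    ):
        return parts[3], parts[2]

    raise ValueError(f"unrecognized upstream path: {upstream_path!r}")


def expected_columns(source: str) -> list[str]:
    """Column names the card describes for a given partition."""
    cols = BASE_COLUMNS + ["source", "difficulty"]
    # Per the card, original_source exists only in the adapters_code partition.
    if source == "adapters_code":
        cols.append("original_source")
    return cols
```

For this repo, `partition_for("dataset_adapters/math.parquet")` gives `("adapters_math", "na")`, so every row is expected to carry `difficulty == "na"` and no `original_source` column. Given the `configs` entry in the front matter, the data itself should be loadable with the `datasets` library as `load_dataset("laion/nemotron-terminal-adapters_math", split="train")`, assuming network access to the Hub.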

Provider:
laion