AmanPriyanshu/regularizer-250K-from-reasoning-and-tool-use-sft-4M-random-compilation

Source: Hugging Face. Updated 2026-04-07; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/AmanPriyanshu/regularizer-250K-from-reasoning-and-tool-use-sft-4M-random-compilation
Description:
---
license: cc-by-4.0
---

# Regularizer 250K (from Reasoning + Tool-Use SFT 4M)

250,000 samples combining tool-use agentic data and general reasoning, designed as a regularization set during domain-specific fine-tuning. Preserves tool-use, coding, research, and general reasoning capabilities.

## Construction

1. **Tool-reasoning subset (150K)**: Sampled 50K per category (TOOLS, CODING, RESEARCH) from [AmanPriyanshu/tool-reasoning-sft-1M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/tool-reasoning-sft-1M-random-compilation) (1M total rows). Random sampling with seed=42.
2. **Reasoning subset (100K)**: Sampled 100K single-turn examples from [AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation) (250K total rows).
   - Filtered for single-turn only: `len(input) <= 2` (system + user, no prior assistant turns)
   - 227,813 single-turn available out of 250K; sampled 100K
   - Response parsed: `<think>...</think>` → `reasoning` role, remainder → `answer` role
3. All 250K rows shuffled (seed=42). Messages stored as `json.dumps` strings (use `json.loads` to parse).

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `messages` | `str` (JSON) | JSON string of conversation messages. Use `json.loads()` to parse into `list[{role, content}]` |
| `source_dataset` | `str` | Original source dataset identifier |
| `source_category` | `str` | One of: `TOOLS`, `CODING`, `RESEARCH`, `REASONING` |

## Category Distribution

| Category | Count | % | Source |
|----------|-------|---|--------|
| TOOLS | 50,000 | 20% | tool-reasoning-sft-1M |
| CODING | 50,000 | 20% | tool-reasoning-sft-1M |
| RESEARCH | 50,000 | 20% | tool-reasoning-sft-1M |
| REASONING | 100,000 | 40% | regularizer-250K |
| **Total** | **250,000** | **100%** | |

## Source Dataset Distribution (38 unique)

### TOOLS (50K from 7 sources)

| Source Dataset | Sampled |
|----------------|---------|
| toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k | 9,927 |
| hermes-reasoning-tool-style-data-cleaned-rectified-115k | 9,874 |
| ToolMind-data-cleaned-rectified | 9,747 |
| hermes_reasoning_tool_use-data-cleaned-rectified | 6,599 |
| toolace-sft-tool-use-agent-data-cleaned-rectified | 2,019 |
| mobile-actions-data-cleaned-rectified | 1,131 |
| toolmind-web-qa-sft-tool-use-data-cleaned-rectified-5.2k | 703 |

### CODING (50K from 9 sources)

| Source Dataset | Sampled |
|----------------|---------|
| nvidia-Nemotron-Agentic-v1 | 6,893 |
| Nemotron-Terminal-Corpus-data-cleaned-rectified | 6,881 |
| allenai-SERA-data-cleaned-rectified | 6,858 |
| text_to_terminal_v2-sft-tool-use-agent-data-cleaned-rectified | 6,828 |
| browsing-sft-tool-use-data-cleaned-rectified | 6,807 |
| CoderForge-Preview-data-cleaned-rectified | 6,797 |
| jupyter-agent-dataset-sft-tool-use-agent-data-cleaned-rectified | 6,762 |
| CoVe-12k-data-cleaned-rectified | 1,624 |
| MEnvData-SWE-Trajectory-data-cleaned-rectified | 550 |

### RESEARCH (50K from 8 sources)

| Source Dataset | Sampled |
|----------------|---------|
| OpenHands-CodeScout_Training_Rollouts | 8,960 |
| grill-lab-browsecomp-plus-runs-data-cleaned-rectified | 8,896 |
| explorations | 8,802 |
| rlvr-env-retrieval-source | 8,791 |
| openresearcher-dataset-sft-deep-research-agent-data-cleaned | 8,787 |
| dr-tulu-sft-deep-research-agent-data-cleaned-rectified | 2,392 |
| REDSearcher_SFT_10K | 1,808 |
| OpenSeeker-v1-Data | 1,564 |

### REASONING (100K from 14+ sources)

Sampled from single-turn entries of the [regularizer-250K](https://huggingface.co/datasets/AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation) dataset. Sources include reasoning-sft-CHIMERA, reasoning-sft-OpenThoughts3, reasoning-sft-dolci-think-sft, reasoning-sft-Nemotron-Cascade-SFT-SWE, and others. See parent dataset for full source breakdown.

## Message Format

**All rows** use a unified `messages` format with roles: `system`, `user`, `reasoning`, `tool_call`, `tool_output`, `answer`.

**Tool-use rows (TOOLS/CODING/RESEARCH)**: multi-turn agentic

```json
[
  {"role": "system", "content": "..."},
  {"role": "user", "content": "..."},
  {"role": "reasoning", "content": "<think>...</think>"},
  {"role": "tool_call", "content": "..."},
  {"role": "tool_output", "content": "..."},
  ...
  {"role": "answer", "content": "..."}
]
```

**Reasoning rows (REASONING)**: single-turn with think tags

```json
[
  {"role": "system", "content": "..."},
  {"role": "user", "content": "..."},
  {"role": "reasoning", "content": "<think>...</think>"},
  {"role": "answer", "content": "..."}
]
```

## Usage

```python
import json
from datasets import load_dataset

ds = load_dataset("AmanPriyanshu/regularizer-250K-from-reasoning-and-tool-use-sft-4M-random-compilation", split="train")
messages = json.loads(ds[0]["messages"])  # list of {role, content} dicts
```

## Purpose

This dataset serves as a **regularizer** during domain-specific fine-tuning (e.g., cybersecurity). By mixing in diverse tool-use and reasoning examples, it prevents catastrophic forgetting of general capabilities while the model specializes.
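
The mixing described under Purpose could look roughly like the sketch below. The card does not state a mixing ratio or a procedure, so the function name `mix_with_regularizer`, the `regularizer_fraction` parameter, its 0.2 default, and the toy rows are all assumptions made for illustration.

```python
import random

def mix_with_regularizer(domain_rows, regularizer_rows, regularizer_fraction=0.2, seed=42):
    """Return a shuffled training list in which roughly `regularizer_fraction`
    of the rows come from the regularizer set (hypothetical helper)."""
    if not 0.0 <= regularizer_fraction < 1.0:
        raise ValueError("regularizer_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Regularizer rows needed so they form the requested share of the final mix.
    n_reg = round(len(domain_rows) * regularizer_fraction / (1.0 - regularizer_fraction))
    # Never ask for more rows than the regularizer set contains.
    n_reg = min(n_reg, len(regularizer_rows))
    mixed = list(domain_rows) + rng.sample(list(regularizer_rows), n_reg)
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins for real rows; only the column names follow the schema.
domain = [{"messages": "[]", "source_category": "DOMAIN"} for _ in range(80)]
regularizer = [{"messages": "[]", "source_category": "REASONING"} for _ in range(1000)]
mixed = mix_with_regularizer(domain, regularizer, regularizer_fraction=0.2)
```

With 80 domain rows and a fraction of 0.2, the helper draws 20 regularizer rows, giving a 100-row mix. A suitable fraction for a real fine-tuning run is an empirical question that the card leaves open.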
## Parent Datasets

- [AmanPriyanshu/tool-reasoning-sft-1M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/tool-reasoning-sft-1M-random-compilation): 1M multi-turn tool-use SFT samples across TOOLS, CODING, RESEARCH
- [AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation): 250K reasoning samples from the 3M reasoning SFT compilation
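
Rows drawn from the reasoning parent dataset had their responses split into `reasoning` and `answer` roles (Construction, step 2). The card does not publish the parsing code, so the following is only a minimal sketch of one way to perform that split. The name `split_response`, the regular expression, and the no-tag fallback are assumptions; keeping the `<think>` tags inside the `reasoning` content follows the Message Format examples.

```python
import re

# Non-greedy match of the first <think>...</think> span, across newlines.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def split_response(response):
    """Split a raw assistant response into reasoning and answer messages
    (hypothetical helper, not the dataset author's code)."""
    match = THINK_RE.search(response)
    if match is None:
        # No think block found: treat the whole response as the answer.
        return [{"role": "answer", "content": response.strip()}]
    reasoning = match.group(0)  # tags are kept, as in the card's examples
    # Everything outside the think block is the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return [
        {"role": "reasoning", "content": reasoning},
        {"role": "answer", "content": answer},
    ]

parsed = split_response("<think>2 + 2 = 4</think>\nThe answer is 4.")
```

Here `parsed` holds a `reasoning` message containing the tagged span followed by an `answer` message containing the remaining text.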
Provided by:
AmanPriyanshu