nphearum/grpo-4k-reasoning-tools

Name: nphearum/grpo-4k-reasoning-tools
Creator: nphearum
Published: 2026-04-03 05:23:21
License: no description provided

Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/nphearum/grpo-4k-reasoning-tools

Resource description:

---
license: apache-2.0
task_categories:
- question-answering
- feature-extraction
- text-classification
- summarization
language:
- en
tags:
- thinking
- reasoning
- tools
- grpo
- function-calling
- Opus
- gpt
- qwen
size_categories:
- 1K<n<10K
---

# 🧠 GRPO 4K Reasoning Tools

A compact, high-quality dataset designed to train and evaluate **reasoning-capable LLMs with tool usage and function calling**.

This dataset focuses on structured thinking, multi-step reasoning, and practical tool integration across diverse NLP tasks.

## 📌 Overview

**GRPO 4K Reasoning Tools** is a curated dataset of ~4K examples that combines:

* 🧩 Step-by-step reasoning (“thinking” traces)
* 🛠️ Tool usage / function calling patterns
* 🧠 Multi-task learning signals

It is suitable for training or fine-tuning models to:

* Think before answering
* Decide when to use tools
* Produce structured and reliable outputs

## 🎯 Tasks Covered

This dataset spans multiple NLP task categories:

* **Question Answering**
* **Feature Extraction**
* **Text Classification**
* **Summarization**

Each example is designed to encourage **reasoning-first behavior**, often requiring intermediate steps before producing the final answer.

## ✨ Key Features

* **🧠 Reasoning-Centric**
  Includes explicit reasoning steps to improve chain-of-thought capabilities.
* **🛠️ Tool-Augmented**
  Examples demonstrate when and how to call tools (function calling format).
* **🔄 Multi-Model Friendly**
  Compatible with training setups for GPT-style models, Qwen, and other instruction-tuned LLMs.
* **📏 Compact but Dense (~4K samples)**
  Carefully curated for quality over quantity.
* **⚙️ Structured Outputs**
  Useful for training structured generation (JSON, tool calls, etc.).

## 📂 Dataset Structure

Each sample typically contains:

```json
{
  "category": "math",
  "system": "User query or task description",
  "user": "Optional context",
  "thinking": "Step-by-step reasoning process",
  "messages": [
    {
      "role": "assistant",
      "content": "<think>...</think>..."
    }
  ],
  "assistant": "Final answer"
}
```

### Fields Explained

| Field         | Description                         |
| ------------- | ----------------------------------- |
| `instruction` | Task description or user query      |
| `input`       | Additional context (optional)       |
| `thinking`    | Intermediate reasoning steps        |
| `tool_calls`  | Function/tool usage (if applicable) |
| `output`      | Final response                      |

The field names in this table differ from the keys in the sample above; inspect `column_names` on the loaded dataset to confirm the actual schema.

## 🧪 Use Cases

* Fine-tuning LLMs for:
  * Tool use (function calling)
  * Structured reasoning
  * Multi-step problem solving
* Evaluating:
  * Reasoning quality
  * Tool selection accuracy
  * Output consistency
* Research in:
  * GRPO (Group Relative Policy Optimization)
  * Chain-of-thought learning
  * Tool-augmented agents

## 🚀 Getting Started

### Load Dataset (Hugging Face)

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub.
dataset = load_dataset("nphearum/grpo-4k-reasoning-tools")

# Print the first record of the "train" split.
print(dataset["train"][0])
```

## 🧠 Training Tips

* Use **reasoning traces (`thinking`)** as supervision targets
* Optionally:
  * Mask (hide) reasoning during inference
  * Train with or without tool calls depending on your objective
* Combine with:
  * Instruction tuning datasets
  * Tool-use benchmarks

## 🏷️ Tags

`reasoning` • `thinking` • `tools` • `function-calling` • `grpo` • `multi-task`

## 📊 Dataset Size

* **~4K examples**
* Category: `1K < n < 10K`

## 📜 License

Apache License 2.0

## 🤝 Contributing

Contributions, improvements, and extensions are welcome! Feel free to:

* Add more tool-use scenarios
* Improve reasoning quality
* Expand task diversity

---

## 📬 Contact

Maintained by **@nphearum**

For questions or collaboration, open an issue or discussion on the repository.
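
The sample record in the dataset card shows assistant content in the form `<think>...</think>...`, and the card mentions both masking the reasoning and GRPO-style training. A minimal sketch of how such a reply might be split into its reasoning and final answer, and how a simple format check could be scored, is given below. The names `ParsedReply`, `split_reply`, and `format_reward` are hypothetical helpers, not part of the dataset or of any library, and the code assumes each reply begins with exactly one closed `<think>` block followed by the answer.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a reply starts with one <think>...</think> block, then the answer.
# DOTALL lets both the reasoning and the answer span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)


class ParsedReply(NamedTuple):
    """Hypothetical container for the two parts of an assistant reply."""
    thinking: str
    answer: str


def split_reply(content: str) -> Optional[ParsedReply]:
    """Split '<think>reasoning</think>answer' into reasoning and answer.

    Returns None when the text does not start with a closed <think> block.
    """
    match = THINK_RE.match(content.strip())
    if match is None:
        return None
    return ParsedReply(match.group(1).strip(), match.group(2).strip())


def format_reward(content: str) -> float:
    """Illustrative format reward: 1.0 if both parts are non-empty, else 0.0."""
    parsed = split_reply(content)
    if parsed is None:
        return 0.0
    return 1.0 if parsed.thinking and parsed.answer else 0.0
```

Calling `split_reply(record["messages"][0]["content"]).answer` on a loaded record would give the text to show a user when the reasoning is hidden, provided the record follows the sample layout. A binary format check like `format_reward` is only one possible signal; an actual GRPO setup would normally combine it with a task-specific correctness reward.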

Provided by:

nphearum
