CriticLeanInstruct

Source: ModelScope community. Updated 2025-11-27; indexed 2025-11-03.
Download link:
https://modelscope.cn/datasets/m-a-p/CriticLeanInstruct
# CriticLeanInstruct

Welcome to the **CriticLeanInstruct** dataset repository! This dataset is designed to support the alignment of large language models (LLMs) through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). Below you'll find details on the dataset structure, usage, and associated model variants.

## 🔍 Overview

The CriticLeanInstruct dataset suite consists of several JSONL files, each serving a specific purpose in the model training pipeline:

| File Name | Description |
|-----------|-------------|
| `CriticLean_12K` | The full set of CriticLean critic data. |
| `CriticLean_4K` | Seed data. A subset of `CriticLean_12K`, used as the Reinforcement Learning (RL) dataset. |
| `CriticLean_Mix_48K` | An expanded mixed dataset including:<br>- `CriticLean_12K`<br>- 18k math samples drawn from [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)<br>- 18k code samples drawn from [OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated) |
| `CriticLean_Mix_16K` | A mixed dataset including:<br>- `CriticLean_4K`<br>- 6k math samples drawn from [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)<br>- 6k code samples drawn from [OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated) |

## 🍭 Associated Model Variants

We've trained several model variants using the CriticLean dataset to demonstrate its effectiveness. The training configurations are summarized below:

| Base Model | SFT Applied? | SFT Data | RL Applied? | RL Data | CriticLeanGPT Model Name |
|------------|--------------|----------|-------------|---------|------------|
| Qwen2.5-Instruct | Yes | `CriticLean_4K` | No | N/A | Qwen2.5-Instruct-SFT(Critic Only) |
| Qwen2.5-Instruct | Yes | `CriticLean_Mix_16K` | No | N/A | Qwen2.5-Instruct-SFT(16K) |
| Qwen2.5-Instruct | Yes | `CriticLean_Mix_48K` | No | N/A | Qwen2.5-Instruct-SFT |
| Qwen2.5-Instruct | Yes | `CriticLean_Mix_48K` | Yes | `CriticLean_4K` | Qwen2.5-Instruct-SFT-RL |
| Qwen2.5-Instruct | No | N/A | Yes | `CriticLean_4K` | Qwen2.5-Instruct-RL |
| Qwen3 | No | N/A | Yes | `CriticLean_4K` | Qwen3-RL |

## ⛏️ Usage

### For Supervised Fine-Tuning (SFT)

The `CriticLean_Mix_*` files are intended for SFT and differ in size and domain mix:

- Use `CriticLean_Mix_16K` for a balanced mix of critic data, math, and code (16k total samples)
- Use `CriticLean_Mix_48K` for a larger dataset with expanded math and code content (48k total samples)
- Use `CriticLean_4K` or `CriticLean_12K` for pure critic data without additional math or code

### For Reinforcement Learning (RL)

`CriticLean_4K` is designated as the RL dataset. It is used to fine-tune models after SFT (or directly on base models) through reinforcement learning from human feedback (RLHF) or similar techniques.

## 🔗 Referenced Datasets & Links

This dataset incorporates samples from:

- [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)
- [OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated)

We thank the creators of these datasets for making their work available to the community.

## 📜 License

Apache-2.0 license

## ☕️ Citation

If you use this dataset in your research, please cite our paper:

```
@misc{peng2025criticleancriticguidedreinforcementlearning,
  title={CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization},
  author={Zhongyuan Peng and Yifan Yao and Kaijing Ma and Shuyue Guo and Yizhe Li and Yichi Zhang and Chenchen Zhang and Yifan Zhang and Zhouliang Yu and Luming Li and Minghao Liu and Yihang Xia and Jiawei Shen and Yuchen Wu and Yixin Cao and Zhaoxiang Zhang and Wenhao Huang and Jiaheng Liu and Ge Zhang},
  year={2025},
  eprint={2507.06181},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.06181},
}
```
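
As a rough illustration of the file descriptions and SFT usage notes in this card, the sketch below checks that the stated components of each mix add up to its nominal size and reads a JSONL file one record at a time. It is a minimal sketch under assumptions: the helper names `nominal_total_k` and `read_jsonl` are hypothetical, the on-disk file name and `.jsonl` extension are guesses, and the record schema is not documented here, so records are treated as opaque dictionaries.

```python
import json
from pathlib import Path
from typing import Dict, Iterator

# Nominal component sizes in thousands of samples, copied from the
# file descriptions in this card (not measured from the actual files).
MIX_COMPOSITION: Dict[str, Dict[str, int]] = {
    "CriticLean_Mix_48K": {
        "CriticLean_12K": 12,
        "OpenR1-Math-220k": 18,
        "OpenThoughts-114k-Code_decontaminated": 18,
    },
    "CriticLean_Mix_16K": {
        "CriticLean_4K": 4,
        "OpenR1-Math-220k": 6,
        "OpenThoughts-114k-Code_decontaminated": 6,
    },
}


def nominal_total_k(mix_name: str) -> int:
    """Sum the stated component sizes (in thousands) of one mix."""
    return sum(MIX_COMPOSITION[mix_name].values())


def read_jsonl(path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

With these definitions, `nominal_total_k("CriticLean_Mix_48K")` returns 48 and `nominal_total_k("CriticLean_Mix_16K")` returns 16, matching the totals quoted in the usage notes. After downloading a file, something like `sum(1 for _ in read_jsonl("CriticLean_Mix_16K.jsonl"))` (path assumed) gives the actual record count to compare against the nominal size.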

Provider: maas
Created: 2025-08-28