five

KlearReasoner-CodeSub-15K

收藏
魔搭社区2025-12-05 更新2025-09-13 收录
下载链接:
https://modelscope.cn/datasets/Kwai-Klear/KlearReasoner-CodeSub-15K
下载链接
链接失效反馈
官方服务:
资源简介:
# Dataset Summary This dataset is a high-quality subset of the Klear-Reasoner Code RL dataset, derived from the RL data used in the [rllm project](https://github.com/agentica-project/rllm). Part of this data contributed to training Klear-Reasoner’s code reasoning models. The dataset is carefully cleaned and filtered to include only reliable samples suitable for reinforcement learning. Models trained with this dataset have shown substantial performance improvements across various code reasoning benchmarks. You can load the dataset via the Hugging Face datasets library: ```python from datasets import load_dataset dataset = load_dataset("Kwai-Klear/KlearReasoner-CodeSub-15K") ``` | Resource | Link | |---|---| | 📝 Preprints | [Paper](https://arxiv.org/pdf/2508.07629) | | 🤗 Daily Paper | [Paper](https://huggingface.co/papers/2508.07629) | | 🤗 Model Hub | [Klear-Reasoner-8B](https://huggingface.co/Kwai-Klear/Klear-Reasoner-8B) | | 🤗 Dataset Hub | [Math RL](https://huggingface.co/datasets/Kwai-Klear/KlearReasoner-MathSub-30K) | | 🤗 Dataset Hub | [Code RL](https://huggingface.co/datasets/Kwai-Klear/KlearReasoner-CodeSub-15K) | | 🐛 Issues & Discussions | [GitHub Issues](https://github.com/suu990901/KlearReasoner/issues) | | 📧 Contact | suzhenpeng13@163.com | ## Data Fields - **data_source** (string) — The source identifier for the sample. - **prompt** (list of dict) — The input prompt, stored as a list of message objects in chat format. - **ability** (string) — The skill or task category associated with the sample. - **reward_model** (dict) — Information about the ground truth or reward signal. - **ground_truth** (string) — The expected correct answer (may include LaTeX formatting). - **style** (string) — The method or type of evaluation, e.g., "rule". - **index_level_0** (int) — An internal index or unique identifier for the sample. ## Demonstration of Data Quality This dataset contains exclusively high-quality, filtered samples. All samples have been selected to ensure accurate reward signals for reinforcement learning, following the gradient-preserving clipping policy optimization (GPPO) method introduced in our paper. Models trained using this dataset achieve strong generalization and reliable performance on a range of math reasoning tasks. ## Citation If you find this work helpful, please cite our paper: ```bibtex @misc{su2025cegppocontrollingentropygradientpreserving, title={CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning}, author={Zhenpeng Su and Leiyu Pan and Minxuan Lv and Yuntao Li and Wenping Hu and Fuzheng Zhang and Kun Gai and Guorui Zhou}, year={2025}, eprint={2509.20712}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2509.20712}, } ``` ```bibtex @article{DBLP:journals/corr/abs-2508-07629, author = {Zhenpeng Su and Leiyu Pan and Xue Bai and Dening Liu and Guanting Dong and Jiaming Huang and Wenping Hu and Fuzheng Zhang and Kun Gai and Guorui Zhou}, title = {Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization}, journal = {CoRR}, volume = {abs/2508.07629}, year = {2025}, url = {https://doi.org/10.48550/arXiv.2508.07629}, doi = {10.48550/ARXIV.2508.07629}, eprinttype = {arXiv}, eprint = {2508.07629}, timestamp = {Sat, 13 Sep 2025 14:46:27 +0200}, biburl = {https://dblp.org/rec/journals/corr/abs-2508-07629.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} } ```

# 数据集概述 本数据集是Klear-Reasoner代码强化学习(Reinforcement Learning, RL)数据集的高质量子集,源自[rllm项目](https://github.com/agentica-project/rllm)所使用的强化学习数据。其中部分数据用于训练Klear-Reasoner的代码推理模型。 本数据集经过精心清洗与筛选,仅保留适用于强化学习的可靠样本。基于该数据集训练的模型在各类代码推理基准测试中均展现出显著的性能提升。 您可通过Hugging Face数据集库加载本数据集: python from datasets import load_dataset dataset = load_dataset("Kwai-Klear/KlearReasoner-CodeSub-15K") | 资源类型 | 链接 | |---|---| | 📝 预印本 | [论文](https://arxiv.org/pdf/2508.07629) | | 🤗 每日论文 | [论文](https://huggingface.co/papers/2508.07629) | | 🤗 模型仓库 | [Klear-Reasoner-8B](https://huggingface.co/Kwai-Klear/Klear-Reasoner-8B) | | 🤗 数据集仓库 | [数学强化学习(Math RL)](https://huggingface.co/datasets/Kwai-Klear/KlearReasoner-MathSub-30K) | | 🤗 数据集仓库 | [代码强化学习(Code RL)](https://huggingface.co/datasets/Kwai-Klear/KlearReasoner-CodeSub-15K) | | 🐛 问题与讨论 | [GitHub Issues](https://github.com/suu990901/KlearReasoner/issues) | | 📧 联系方式 | suzhenpeng13@163.com | # 数据字段 - **data_source**(字符串类型)—— 样本的来源标识符。 - **prompt**(字典列表类型)—— 输入提示,以聊天格式的消息对象列表形式存储。 - **ability**(字符串类型)—— 该样本对应的技能或任务类别。 - **reward_model**(字典类型)—— 关于基准真值或奖励信号的信息。 - **ground_truth**(字符串类型)—— 预期的正确答案(可能包含LaTeX格式)。 - **style**(字符串类型)—— 评估方法或类型,例如"rule"(规则式)。 - **index_level_0**(整数类型)—— 样本的内部索引或唯一标识符。 # 数据质量示例 本数据集仅包含经过筛选的高质量样本。所有样本均经过严格挑选,以确保强化学习所需的奖励信号准确无误,且遵循了论文中提出的梯度保留裁剪策略优化(GPPO)方法。基于该数据集训练的模型在各类数学推理任务中均具备出色的泛化能力与可靠性能。 # 引用方式 如果您认为本工作对您有所帮助,请引用我们的论文: bibtex @misc{su2025cegppocontrollingentropygradientpreserving, title={CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning}, author={Zhenpeng Su and Leiyu Pan and Minxuan Lv and Yuntao Li and Wenping Hu and Fuzheng Zhang and Kun Gai and Guorui Zhou}, year={2025}, eprint={2509.20712}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2509.20712}, } bibtex @article{DBLP:journals/corr/abs-2508.07629, author = {Zhenpeng Su and Leiyu Pan and Xue Bai and Dening Liu and Guanting Dong and Jiaming Huang and Wenping Hu and Fuzheng Zhang and Kun Gai and Guorui Zhou}, title = {Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization}, journal = {CoRR}, volume = {abs/2508.07629}, year = {2025}, url = {https://doi.org/10.48550/arXiv.2508.07629}, doi = {10.48550/ARXIV.2508.07629}, eprinttype = {arXiv}, eprint = {2508.07629}, timestamp = {Sat, 13 Sep 2025 14:46:27 +0200}, biburl = {https://dblp.org/rec/journals/corr/abs-2508.07629.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
提供机构:
maas
创建时间:
2025-09-06
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作