TMLR-Group-HF/Co-rewarding-RephrasedOpenRS

Name: TMLR-Group-HF/Co-rewarding-RephrasedOpenRS
Creator: TMLR-Group-HF
Published: 2025-10-11 06:47:48
License: No description available

Source: Hugging Face. Updated 2025-10-11; indexed 2026-02-07.

Download link:

https://hf-mirror.com/datasets/TMLR-Group-HF/Co-rewarding-RephrasedOpenRS

Resource description:

---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- mathematical-reasoning
- reinforcement-learning
- llm
- self-supervised
- question-rewriting
---

# Co-rewarding: Rephrased OpenRS Training Set

This dataset is the OpenRS training set used in the **Co-rewarding-I** method, as presented in the paper [Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models](https://huggingface.co/papers/2508.00410).

**Paper:** [https://huggingface.co/papers/2508.00410](https://huggingface.co/papers/2508.00410)

**Code:** [https://github.com/tmlr-group/Co-rewarding](https://github.com/tmlr-group/Co-rewarding)

This dataset was generated by rephrasing the original math problems from the OpenRS dataset with the Qwen3-32B model, using the following prompt:

```
You are given a math problem. Please rewrite it using different wording and a different real-world scenario, while keeping the underlying mathematical meaning and answer exactly the same.

Guidelines:
1. Do not change the math logic or the final answer.
2. Use different words and a new context to make it look like a different problem.
3. Avoid copying phrases or sentence structures from the original.
4. Make sure the rewritten question is natural, clear, and solvable.
5. Output ONLY between the following markers, and strictly in this format (no extra explanation):

### RESULT_START
ORIGINAL: <original question>
REWRITE: <rewritten question>
### RESULT_END
```

This dataset contains the original math problems from the OpenRS dataset together with their rephrased versions, each of which keeps the same answer as its original.

## Sample Usage (Data Generation)

This dataset was generated by rephrasing questions with the `rewrite_questions.py` script from the [Co-rewarding GitHub repository](https://github.com/tmlr-group/Co-rewarding).
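The prompt asks Qwen3-32B to wrap its answer in `### RESULT_START` / `### RESULT_END` markers with `ORIGINAL:` and `REWRITE:` fields. The following is a minimal sketch of how such responses could be parsed back into (original, rewrite) pairs. The function `parse_rewrite` and its filtering rules (taking the last complete block, rejecting an echoed template placeholder) are hypothetical illustrations, not code from `rewrite_questions.py`, whose actual parsing logic may differ.

```python
import re
from typing import Optional, Tuple

# Matches one block in the format the prompt requests:
#   ### RESULT_START / ORIGINAL: ... / REWRITE: ... / ### RESULT_END
# DOTALL lets the captured questions span multiple lines.
_RESULT_RE = re.compile(
    r"###\s*RESULT_START\s*"
    r"ORIGINAL:\s*(?P<original>.*?)\s*"
    r"REWRITE:\s*(?P<rewrite>.*?)\s*"
    r"###\s*RESULT_END",
    re.DOTALL,
)

# Placeholder text from the prompt template (hypothetical filter).
_TEMPLATE_PLACEHOLDER = "<rewritten question>"


def parse_rewrite(model_output: str) -> Optional[Tuple[str, str]]:
    """Return (original, rewrite) from one model response, or None if the
    response does not contain a usable block in the requested format.

    Assumption: if several complete blocks appear (e.g. a draft inside
    reasoning text followed by the final answer), the last one is kept.
    """
    matches = list(_RESULT_RE.finditer(model_output))
    if not matches:
        return None
    last = matches[-1]
    original = last.group("original").strip()
    rewrite = last.group("rewrite").strip()
    # Reject empty rewrites and verbatim echoes of the template placeholder.
    if not rewrite or rewrite == _TEMPLATE_PLACEHOLDER:
        return None
    return original, rewrite


if __name__ == "__main__":
    response = (
        "### RESULT_START\n"
        "ORIGINAL: What is 2 + 3?\n"
        "REWRITE: A baker has 2 trays and buys 3 more. How many trays does she have?\n"
        "### RESULT_END"
    )
    print(parse_rewrite(response))
```

Returning `None` for malformed responses lets a caller drop or retry those questions instead of writing a broken pair to the output files.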
You can use a similar command to generate rephrased data for other datasets:

```bash
# Example: rephrase the OpenRS training data.
# Set these two variables to your local Qwen3-32B model and tokenizer paths
# (shell variable names cannot contain '-', so "$YOUR_Qwen3-32B_..." would not expand as intended).
python rewrite_questions.py \
  --input_path data/open-rs/train.parquet \
  --output_jsonl data/open-rs/train_rewrite_Qwen3-32B.jsonl \
  --output_parquet data/open-rs/train_rewrite_Qwen3-32B.parquet \
  --output_original_parquet data/open-rs/train_original.parquet \
  --model_path "$YOUR_QWEN3_32B_MODEL_PATH" \
  --tokenizer_path "$YOUR_QWEN3_32B_TOKENIZER_PATH" \
  --question_column prompt \
  --batch_size 128
```

This command uses the specified language model to rephrase the questions in the `prompt` column of the input parquet file, saving the rephrased data to new JSONL and parquet files and the original questions to the file given by `--output_original_parquet`.

## Citation

If you use this dataset, please cite our paper!

```bibtex
@article{zhang2025coreward,
  title={Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models},
  author={Zizhuo Zhang and Jianing Zhu and Xinmu Ge and Zihua Zhao and Zhanke Zhou and Xuan Li and Xiao Feng and Jiangchao Yao and Bo Han},
  journal={arXiv preprint arXiv:2508.00410},
  year={2025},
  url={https://huggingface.co/papers/2508.00410},
}
```

Provided by:

TMLR-Group-HF
