
Spark-Data

Source: ModelScope (魔搭社区). Updated 2025-12-04; indexed 2025-10-11.
Download link:
https://modelscope.cn/datasets/Shanghai_AI_Laboratory/Spark-Data
Description:
<p align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/63859cf3b2906edaf83af9f0/FGS454laRCGTIAzgrbGdG.png" alt="logo" width="200"> </p>

# Spark-Data

[Paper](https://huggingface.co/papers/2509.22624) | [Github Repository](https://github.com/InternLM/Spark) | [Models](https://huggingface.co/internlm/Spark-VL-7B)

## Data Introduction

This repository stores the datasets used to train 🤗[Spark-VL-7B](https://huggingface.co/internlm/Spark-VL-7B) and Spark-VL-32B, together with a collection of mathematical benchmarks covered in the paper [SPARK: Synergistic Policy And Reward Co-Evolving Framework](https://huggingface.co/papers/2509.22624).

- `infer_data_ViRL_19k_h.json` is used for training Spark-VL-7B.
- `infer_data_ViRL_hard_24k_h.json` is used for training Spark-VL-32B.
- `benchmark_combine.json` and `benchmark_combine_v2.json` are combinations of multiple mathematical benchmarks.

The training data is derived from 🤗[ViRL-39k](https://huggingface.co/datasets/TIGER-Lab/ViRL39K); we modified its format to fit our training framework.

⭐ If you find our code or model helpful, please consider giving us a star. Your support means a lot!

## 📢 News

- 🚀 [09/29/2025] We release the **Spark** 📖[Paper](https://arxiv.org/abs/2509.22624).
- 🚀 [09/29/2025] We upload our evaluation code and 🤗[models](https://huggingface.co/internlm/Spark-VL-7B).
- 🚀 [09/29/2025] We release the **Spark** 🏠[Github repository](https://github.com/InternLM/Spark).

## 💡 Highlights

- 🔥 **Synergistic Policy–Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes policy and reward within a single model through on-policy co-evolution.
- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives, enabling the model itself to act as both a strong policy and a generative reward model.
- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.
- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and compute-efficient than traditional RM-based RL pipelines.

## ⚙️ Framework

**SPARK** introduces a unified reinforcement learning framework where policy and reward evolve within a single model. Traditional RL pipelines either rely on external reward models (**RLHF**) or discard verifiable rewards (**RLVR**). In contrast, SPARK recycles verifiable rewards to guide on-policy reward and reflection data generation. This design turns the model into **both a strong policy and a generative reward model**.

Through on-policy co-evolution, SPARK establishes a positive feedback loop: **improved reward accuracy provides stronger policy gradients, while better reasoning further enhances reward judgment**. As a result, SPARK not only boosts reasoning and judgment simultaneously but also unlocks self-reflection ability at test time, enabling more stable and generalizable performance across diverse tasks.

<a href=""> <img src="https://github.com/InternLM/Spark/blob/main/assets/framework.png" alt="Framework" > </a>

## Sample Usage

This dataset is used for training and evaluating SPARK models. Below are examples of how to run inference with the trained models and how to set up training.

### 🛠️ Setup

```bash
git clone https://github.com/InternLM/Spark.git
conda create -n Lmm_xc python=3.10
# Activate the environment created on the previous line
conda activate Lmm_xc
# Enter the cloned repository (path relative to where you ran git clone)
cd Spark/Lmm_XC
pip install -e .[vllm]
pip install flash_attn --no-build-isolation
```

Lmm_XC is built on a modified version of the LMM-R1 project; for installation details, refer to the LMM-R1 instructions.
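Once the environment is ready and the dataset files are downloaded, a quick sanity check can confirm that a file loads correctly. The sketch below is illustrative only: this card does not document the JSON schema, so the helper (a hypothetical `summarize_json`) makes no assumption about field names and merely reports the top-level type, the number of entries, and the keys of the first record. The local file path is also an assumption.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure without assuming a schema."""
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)

    summary = {
        "type": type(data).__name__,
        "size": len(data) if isinstance(data, (list, dict)) else None,
    }

    # Peek at the first record, if there is one, and list its keys.
    first = None
    if isinstance(data, list) and data:
        first = data[0]
    elif isinstance(data, dict) and data:
        first = next(iter(data.values()))
    summary["first_record_keys"] = sorted(first.keys()) if isinstance(first, dict) else None
    return summary


if __name__ == "__main__":
    # Hypothetical local path: point it at wherever you saved the file.
    path = Path("infer_data_ViRL_19k_h.json")
    if path.exists():
        print(summarize_json(path))
```

Printing the keys of one record before training makes it easier to spot a truncated download or a mismatch between the file and the paths configured in the training script.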
### Inference

We have uploaded the model **Spark-VL-7B** ([🤗Huggingface](https://huggingface.co/internlm/Spark-VL-7B)). You can use it to evaluate inference performance on Multimodal Mathematical Benchmarks and Reward-Related Benchmarks.

During training, we append the following prompt to the end of the input to facilitate answer extraction. We therefore recommend appending the same prompt at test time.

```
Please first conduct reasoning, and then answer the question. Repeat the final answer using a '\\boxed{}'.
```

#### 🤗 Using Transformers

Our model is based on Qwen2.5-VL-7B-Instruct. You can use the same inference code as for Qwen2.5-VL-7B-Instruct; see [🤗Huggingface](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct).

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

# Placeholders: replace with your own image and question.
image_path = "path/to/your_image.png"
prompt = "Your question, followed by the answer-extraction prompt shown above."

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "internlm/Spark-VL-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained("internlm/Spark-VL-7B")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generate the output, then strip the prompt tokens from each sequence
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

#### 🔦 Using vLLM

We recommend **vLLM** for faster inference; it significantly speeds up dataset evaluation.

```bash
PORT=8019
N_PROC=256
SERVE_NAME=spark_vl_7b
MODEL_PATH=/internlm/Spark-VL-7B

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
  --tensor-parallel-size 4 \
  --served-model-name $SERVE_NAME \
  --port $PORT \
  --max-num-seqs $N_PROC
```

### Training

#### Spark Training

After downloading the dataset, you can start training with the example bash script below. Our bash scripts are in `/Spark/Lmm_XC/XC/scripts/spark_training`. You need to change the dataset and model paths to your own locations.

```bash
export WORKSPACE_DIR="/fs-computility/....../Lmm_XC" # Path to project root directory
export DATASET_PATH="/fs-computility/....../infer_data_ViRL_19k.json" # Path to your dataset
export PRETRAIN_MODEL_PATH="/fs-computility/....../Qwen2.5-VL-7B-Instruct" # Path to pretrained model
export WANDB_PROJECT="Observation" # Name for this project
export MODEL_CPK_NAME="Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2" # Name for this training run
export LOG_PATH='/fs-computility/....../Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2.txt' # Log file save path

export WANDB_API_KEY="......"
export SAVE_PATH="/fs-computility/....../${WANDB_PROJECT}/${MODEL_CPK_NAME}" # Absolute path to save everything about this training run
export CKPT_PATH="${SAVE_PATH}/ckpt" # Path to save checkpoints
export FINAL_CKPT_PATH="${SAVE_PATH}/final_ckpt" # Path to save final checkpoints
export TIMESTAMP=$(date +%Y%m%d_%H%M%S) # Timestamp
export CUR_LOG_DIR="${SAVE_PATH}/training_logs/${TIMESTAMP}" # Path to save current run logs
export LOG_DIR="${SAVE_PATH}/tb_logs"
```

⏰ Attention:

```bash
export DEV_MODE=0 # Set to 1 for debug mode on single dev machine
```

### Evaluation

The integrated multimodal mathematics dataset can be downloaded from 🤗[datasets](https://huggingface.co/datasets/internlm/Spark-Data) and evaluated with the scripts in the `Evaluation` folder. The evaluation results are saved to disk, and accuracy can then be computed with `calculate_acc.py`.

```bash
bash ./Evaluation/eval_spark_vl_7b.sh
python calculate_acc.py --result_path ./your_result_path.json
```

## ✒️Citation

```bibtex
@misc{liu2025spark,
  title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
  author={Ziyu Liu and Yuhang Zang and Shengyuan Ding and Yuhang Cao and Xiaoyi Dong and Haodong Duan and Dahua Lin and Jiaqi Wang},
  year={2025},
  eprint={2509.22624},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22624},
}
```

## 📄 License

**Usage and License Notices**: The data and code are intended and licensed for research use only. License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Use should also abide by OpenAI's policy: https://openai.com/policies/terms-of-use

## Acknowledgement

We sincerely thank the projects [lmm-r1](https://github.com/TideDra/lmm-r1) and [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) for providing their open-source resources.
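
## Appendix: Extracting Boxed Answers

The Inference section recommends appending a prompt that asks the model to repeat its final answer inside `\boxed{}`. The sketch below shows one way such an answer might be pulled out of a generated string and scored. It is an illustration under stated assumptions, not the logic of `calculate_acc.py`, which this card does not show: the function names `extract_boxed` and `exact_match_accuracy` are hypothetical, and the scoring rule (exact string match after removing spaces) is a simplification chosen for the example.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    # Walk forward from the opening brace, tracking nesting depth so that
    # answers such as \frac{1}{2} are captured in full.
    i = start + len(marker)
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def exact_match_accuracy(outputs, references):
    """Fraction of outputs whose boxed answer equals the reference (spaces ignored)."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0
    hits = 0
    for out, ref in zip(outputs, references):
        pred = extract_boxed(out)
        if pred is not None and pred.replace(" ", "") == ref.replace(" ", ""):
            hits += 1
    return hits / len(outputs)


if __name__ == "__main__":
    sample = "The two halves are equal, so the answer is \\boxed{\\frac{1}{2}}."
    print(extract_boxed(sample))
```

Taking the last `\boxed{}` rather than the first is a design choice: a model that reasons before answering may box intermediate values, and the prompt asks it to repeat the final answer at the end.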

Provider: maas
Created: 2025-10-04