LeonMiao/CRUX

Name: LeonMiao/CRUX
Creator: LeonMiao
Published: 2026-04-10 04:01:19
License: 暂无描述

Hugging Face2026-04-10 更新2026-04-12 收录

下载链接：

https://hf-mirror.com/datasets/LeonMiao/CRUX

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: apache-2.0 task_categories: - image-text-to-text language: - en tags: - agent size_categories: - 100K<n<1M configs: - config_name: visualization data_files: - split: visualization path: "visualization.parquet" dataset_info: - config_name: visualization features: - name: query_txt dtype: string - name: query_img dtype: image - name: CRU_1_txt dtype: string - name: CRU_1_img dtype: image - name: CRU_2_txt dtype: string - name: CRU_2_img dtype: image - name: CRU_3_txt dtype: string - name: CRU_3_img dtype: image - name: CRU_4_txt dtype: string - name: CRU_4_img dtype: image - name: CRU_5_txt dtype: string - name: CRU_5_img dtype: image - name: CRU_6_txt dtype: string - name: CRU_6_img dtype: image - name: answer dtype: string --- <div align="center"> <img src="docs/logo.png" alt="logo" height="150"> <h1 style="font-size: 32px; font-weight: bold;"> [CVPR 2026] VɪRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking </h1> <a href="https://arxiv.org/abs/2512.14654v3"><img src="https://img.shields.io/badge/ArXiv-ViRC-brown?logo=arxiv" alt="Paper"></a> <a href="https://huggingface.co/datasets/LeonMiao/CRUX"><img src="https://img.shields.io/badge/🤗 huggingface-CRUX-blue" alt="dataset"></a> <br> <a href="https://huggingface.co/LeonMiao/ViRC-7B"><img src="https://img.shields.io/badge/🤗 huggingface-ViRC--7B-purple" alt="checkpoint"></a> <a href="https://huggingface.co/LeonMiao/ViRC-3B"><img src="https://img.shields.io/badge/🤗 huggingface-ViRC--3B-purple" alt="checkpoint"></a> <a href="https://huggingface.co/LeonMiao/ViRC-Qwen2VL-7B"><img src="https://img.shields.io/badge/🤗 huggingface-ViRC--Qwen2VL--7B-purple" alt="checkpoint"></a> <a href="https://huggingface.co/LeonMiao/ViRC-Qwen2VL-2B"><img src="https://img.shields.io/badge/🤗 huggingface-ViRC--Qwen2VL--2B-purple" alt="checkpoint"></a> </div> ## 💥 News - **[2026.04.09]** We officially open-source the **CRUX dataset** and **ViRC models** as scheduled. 🚀 - **[2026.03.01]** The **CURX dataset** and **ViRC models** are ready and are currently **under internal review at Ant Group**. Full open-sourcing will be completed **no later than 2026.04.19**.⚡️ - **[2026.02.21]** ViRC has been accepted by **CVPR 2026**. 🎉 - **[2025.12.16]** We release the [arxiv paper](https://arxiv.org/abs/2512.14654v3) and the code. 🔥 ## 👀 About VɪRC Existing MLLMs typically perform textual reasoning solely from a single static mathematical image, overlooking dynamic visual acquisition during reasoning. In contrast, humans repeatedly examine visual image and employ step-by-step reasoning to prove intermediate propositions. We propose a **ViRC** framework for multimodal mathematical tasks, introducing a **Reason Chunking** mechanism that structures multimodal mathematical CoT into consecutive **C**ritical **R**easoning **U**nits (**CRU**s) to simulate human expert problem-solving patterns. CRUs ensure intra-unit textual coherence for intermediate proposition verification while integrating visual information across units to generate subsequent propositions and support structured reasoning. <p align="center"> <img src="docs/teaser.png" width="40%"> <br> </p> To this end, we present **CRUX dataset** by using **three visual tools** and **four reasoning patterns** to provide explicitly annotated CRUs across multiple reasoning paths for each mathematical problem. <p align="center"> <img src="docs/data_pipeline.png"> <br> </p> Leveraging the CRUX dataset, we propose a progressive training strategy inspired by human cognitive learning, which includes Instructional SFT, Practice SFT, and Strategic RL, aimed at further strengthening the Reason Chunking ability of the model. <p align="center"> <img src="docs/training_strategy.png" width="40%"> <br> </p> The resulting ViRC-7B model achieves a 18.8% average improvement over baselines across multiple mathematical benchmarks and cross-domain high-resolution image benchmarks. ## 💪 Get Started ### Installation Clone the repository: ```bash git clone https://github.com/Leon-LihongWang/ViRC.git cd ViRC ``` Create and activate a conda environment: ```bash conda create -n virc python=3.10 -y conda activate virc ``` Install additional dependencies: ```bash bash src/setup.sh ``` ### 🚀 Training ### Preparation Download our [dataset](https://huggingface.co/datasets/LeonMiao/ViRC-100K), and extract `ViRC_images.tar.lz4`: ```bash huggingface-cli repo download --repo-type dataset LeonMiao/CRUX --local-dir ./data cd ./data && lz4 -d ViRC_images.tar.lz4 | tar -xf - ``` Download [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), which is the base model used for training. ```bash huggingface-cli repo download --repo-type model Qwen/Qwen2.5-VL-7B-Instruct --local-dir ./model ``` #### Stage 1: Instructional SFT ```bash cd ViRC pip install datasets==3.6.0 DISABLE_VERSION_CHECK=1 llamafactory-cli train src/train/stage_1_InstrSFT.yaml ``` #### Stage 2: Practice SFT ```bash DISABLE_VERSION_CHECK=1 llamafactory-cli train src/train/stage_2_PracSFT.yaml ``` #### Stage 3: Strategic RL ```bash pip install datasets==4.0.0 # Start Qwen2.5-VL-72B-Instruct with vLLM, and update Lines 32, 36, and 37 accordingly (the model-related settings). \cp -f src/train/reward_for_scale_dynamic.py ./verl/utils/reward_score/init.py bash src/train/stage_3_StratRL ``` #### Notes on Training Config - We provide `src/train/merge_rl_result.sh` to merge Strategic RL outputs and export the final model weights in **safetensors** format. - Our training data and scripts support **two image-resolution settings**: - **Dynamic resolution** (for models such as **Qwen2.5VL**). - **Fixed resolution** (images resized to **1000×1000**, for models such as **Qwen2VL/Qwen3VL**). These are distinguished by the suffixes `_scale_dynamic` and `_scale_fixed` under `data/` and `src/`. - For reproducibility, we release a **50K subset** of the training data (suffix `_50K`) and the **full 100K set** (suffix `_100K`). ### 💫 Inference We provide inference scripts for **two image-resolution settings**: - **Dynamic resolution** (`src/evaluation/ViRC_scale_dynamic.py`): for [ViRC-7B](https://huggingface.co/LeonMiao/ViRC-7B) and [ViRC-3B](https://huggingface.co/LeonMiao/ViRC-3B), based on the **Qwen2.5-VL-Instruct** series. - **Fixed resolution (1000×1000)** (`src/evaluation/ViRC_scale_fixed.py`): for [ViRC-Qwen2VL-7B](https://huggingface.co/LeonMiao/ViRC-Qwen2VL-7B) and [ViRC-Qwen2VL-2B](https://huggingface.co/LeonMiao/ViRC-Qwen2VL-2B), based on the **Qwen2-VL-Instruct** series. `src/evaluation/` also includes a sample input image (`image.png`) and an expected output example in `src/evaluation/response/` for a quick sanity check. Start the ViRC model with **vLLM**, then run the evaluation script: ```bash model_path=./ViRC/models/ViRC-7B model_name=ViRC tensor_parallel_size=4 port=8000 python -u -m vllm.entrypoints.openai.api_server \ --model $model_path \ --served-model-name $model_name \ --dtype auto \ --tensor-parallel-size $tensor_parallel_size \ --gpu-memory-utilization 0.9 \ --port $port python src/evaluation/ViRC_scale_dynamic.py ``` ### 🥳 Acknowledgements We would like to thank [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and [verl](https://github.com/verl-project/verl), upon which our repo is built. ## ✅ Citation ``` @article{wang2025virc, title={ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking}, author={Lihong, Wang and Liangqi, Li and Weiwei, Feng and Jiamin, Wu and Changtao, Miao and Tieru, Wu and Rui, Ma and Bo, Zhang and Zhe, Li}, journal={arXiv preprint arXiv:2512.14654}, year={2025} } ```

提供机构：

LeonMiao

5,000+

优质数据集

54 个

任务类型

进入经典数据集