DocTron-Hub/VinciCoder-1.6M-SFT
收藏Hugging Face2025-12-05 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/DocTron-Hub/VinciCoder-1.6M-SFT
下载链接
链接失效反馈官方服务:
资源简介:
---
task_categories:
- image-text-to-text
language:
- en
tags:
- code-generation
- multimodal
- reinforcement-learning
- vision-language-model
---
# VinciCoder: Unified Multimodal Code Generation Dataset
This repository contains the datasets used for **VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning**, a project that introduces a unified multimodal code generation model. The framework uses a two-stage training approach, comprising a large-scale Supervised Finetuning (SFT) corpus and a Visual Reinforcement Learning (ViRL) dataset. These datasets are designed for tasks involving direct code generation and visual-based code refinement.
**Paper:** [VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning](https://huggingface.co/papers/2511.00391)
**Code:** [https://github.com/DocTron-hub/VinciCoder](https://github.com/DocTron-hub/VinciCoder)
**Project Page (Hugging Face Dataset Collection):** [https://huggingface.co/collections/DocTron-Hub/vincicoder-train-data](https://huggingface.co/collections/DocTron-Hub/vincicoder-train-data)
## Dataset Structure
The VinciCoder project utilizes two main datasets: the SFT Dataset for initial training and the RL Dataset for visual reinforcement learning.

### SFT Dataset
The Supervised Finetuning (SFT) dataset comprises 1.6 million image-code pairs. This dataset is a collection and optimization of existing data from various works, designed for direct code generation and visual-based code refinement.
The dataset integrates data from several multimodal code generation domains:
| Domain | Paper |
| :------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Chart-to-code | [ChartCoder](https://arxiv.org/abs/2501.06598), [MSRL](https://arxiv.org/abs/2508.13587), [VisCodex](https://www.arxiv.org/abs/2508.09945) |
| Web-to-HTML | [Web2Code](https://arxiv.org/abs/2406.20098), [Web2M](https://arxiv.org/abs/2404.06369), [VisCodex](https://www.arxiv.org/abs/2508.09945) |
| Image-to-SVG | [UniSVG](https://arxiv.org/abs/2508.07766), [StarVector](https://arxiv.org/abs/2312.11556) |
| Image-to-Latex | [DaTikZ](https://arxiv.org/abs/2503.11509), [MathCoder-VL](https://arxiv.org/abs/2505.10557) |
| Others | [CoSyn](https://arxiv.org/abs/2502.14846) |
The full SFT dataset is available at: [DocTron-Hub/VinciCoder-1.6M-SFT](https://huggingface.co/datasets/DocTron-Hub/VinciCoder-1.6M-SFT)
### RL Dataset
The Reinforcement Learning (RL) dataset consists of 42,000 data samples collected from five distinct domains. This dataset is utilized with a Visual Reinforcement Learning (ViRL) strategy to improve visual fidelity.
The full RL dataset is available at: [DocTron-Hub/VinciCoder-42k-RL](https://huggingface.co/datasets/DocTron-Hub/VinciCoder-42k-RL)
## Installation
It is recommended to follow the instructions in [ms-swift](https://github.com/modelscope/ms-swift?tab=readme-ov-file#%EF%B8%8F-installation) and [EasyR1](https://github.com/hiyouga/EasyR1?tab=readme-ov-file#installation) to install the necessary environments.
Alternatively, you can install the RL environments by cloning the VinciCoder repository:
```bash
git clone https://github.com/DocTron-hub/VinciCoder.git
cd VinciCoder
pip install -e .
```
## Sample Usage (Training Scripts)
### SFT Stage
The SFT stage utilizes `ms-swift`. Please refer to its official documentation for detailed training instructions.
### RL Stage

The RL stage is based on `EasyR1`. First, modify the configurations in ```./examples/qwen3vl_8b_vincicder.sh``` and review the configuration in ```./examples/reward_function/vincicoder.py```. Then, run the following script:
```bash
bash ./examples/qwen3vl_8b_vincicder.sh
```
## Citation
If you find this work useful, please consider citing our paper:
```bibtex
@misc{zhao2025vincicoderunifyingmultimodalcode,
title={VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning},
author={Xuanle Zhao and Deyang Jiang and Zhixiong Zeng and Lei Chen and Haibo Qiu and Jing Huang and Yufeng Zhong and Liming Zheng and Yilin Cao and Lin Ma},
year={2025},
eprint={2511.00391},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.00391},
}
```
task_categories:
- 图像-文本转文本
language:
- 英语
tags:
- 代码生成
- 多模态
- 强化学习
- 视觉语言模型
---
# VinciCoder:统一化多模态代码生成数据集
本仓库包含用于**VinciCoder:基于从粗到细视觉强化学习的统一多模态代码生成**项目的数据集,该项目提出了一款统一化多模态代码生成模型。其框架采用两阶段训练范式,包含大规模监督微调(Supervised Finetuning, SFT)语料库与视觉强化学习(Visual Reinforcement Learning, ViRL)数据集两类数据,旨在支撑直接代码生成与基于视觉的代码优化两类任务。
**论文:**[VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning](https://huggingface.co/papers/2511.00391)
**代码:**[https://github.com/DocTron-hub/VinciCoder](https://github.com/DocTron-hub/VinciCoder)
**项目页面(Hugging Face 数据集合集):**[https://huggingface.co/collections/DocTron-Hub/vincicoder-train-data](https://huggingface.co/collections/DocTron-Hub/vincicoder-train-data)
## 数据集结构
VinciCoder项目主要使用两类数据集:用于初始训练的监督微调数据集,以及用于视觉强化学习的强化学习数据集。

### 监督微调数据集
本监督微调(SFT)数据集包含160万张图像-代码配对样本。该数据集整合并优化了现有多项研究中的数据,专为直接代码生成与基于视觉的代码优化任务设计。
该数据集覆盖多模态代码生成的多个领域:
| 领域 | 相关文献 |
| :------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 图表转代码 | [ChartCoder](https://arxiv.org/abs/2501.06598), [MSRL](https://arxiv.org/abs/2508.13587), [VisCodex](https://www.arxiv.org/abs/2508.09945) |
| 网页转HTML | [Web2Code](https://arxiv.org/abs/2406.20098), [Web2M](https://arxiv.org/abs/2404.06369), [VisCodex](https://www.arxiv.org/abs/2508.09945) |
| 图像转SVG | [UniSVG](https://arxiv.org/abs/2508.07766), [StarVector](https://arxiv.org/abs/2312.11556) |
| 图像转LaTeX | [DaTikZ](https://arxiv.org/abs/2503.11509), [MathCoder-VL](https://arxiv.org/abs/2505.10557) |
| 其他领域 | [CoSyn](https://arxiv.org/abs/2502.14846) |
完整的SFT数据集可于以下位置获取:[DocTron-Hub/VinciCoder-1.6M-SFT](https://huggingface.co/datasets/DocTron-Hub/VinciCoder-1.6M-SFT)
### 强化学习数据集
本强化学习(RL)数据集包含来自5个不同领域的42000条数据样本,配合视觉强化学习(ViRL)策略使用,用于提升生成代码的视觉保真度。
完整的RL数据集可于以下位置获取:[DocTron-Hub/VinciCoder-42k-RL](https://huggingface.co/datasets/DocTron-Hub/VinciCoder-42k-RL)
## 环境安装
建议按照[ms-swift](https://github.com/modelscope/ms-swift?tab=readme-ov-file#%EF%B8%8F-installation)与[EasyR1](https://github.com/hiyouga/EasyR1?tab=readme-ov-file#installation)的说明安装所需运行环境。
亦可通过克隆VinciCoder仓库安装RL相关环境:
bash
git clone https://github.com/DocTron-hub/VinciCoder.git
cd VinciCoder
pip install -e .
## 示例用法(训练脚本)
### 监督微调阶段
SFT阶段基于`ms-swift`实现,详细训练指南请参阅其官方文档。
### 强化学习阶段

RL阶段基于`EasyR1`实现。首先修改`./examples/qwen3vl_8b_vincicder.sh`中的配置项,并审阅`./examples/reward_function/vincicoder.py`中的奖励函数配置,随后运行以下脚本:
bash
bash ./examples/qwen3vl_8b_vincicder.sh
## 引用方式
若您认为本工作对您有所帮助,请引用我们的论文:
bibtex
@misc{zhao2025vincicoderunifyingmultimodalcode,
title={VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning},
author={Xuanle Zhao and Deyang Jiang and Zhixiong Zeng and Lei Chen and Haibo Qiu and Jing Huang and Yufeng Zhong and Liming Zheng and Yilin Cao and Lin Ma},
year={2025},
eprint={2511.00391},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.00391},
}
提供机构:
DocTron-Hub



