
DocTron-Hub/VinciCoder-1.6M-SFT

Source: Hugging Face. Updated 2025-12-05; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/DocTron-Hub/VinciCoder-1.6M-SFT
Resource description:
---
task_categories:
- image-text-to-text
language:
- en
tags:
- code-generation
- multimodal
- reinforcement-learning
- vision-language-model
---

# VinciCoder: Unified Multimodal Code Generation Dataset

This repository contains the datasets used for **VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning**, a project that introduces a unified multimodal code generation model. The framework uses a two-stage training approach, comprising a large-scale Supervised Finetuning (SFT) corpus and a Visual Reinforcement Learning (ViRL) dataset. These datasets are designed for tasks involving direct code generation and visual-based code refinement.

**Paper:** [VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning](https://huggingface.co/papers/2511.00391)

**Code:** [https://github.com/DocTron-hub/VinciCoder](https://github.com/DocTron-hub/VinciCoder)

**Project Page (Hugging Face Dataset Collection):** [https://huggingface.co/collections/DocTron-Hub/vincicoder-train-data](https://huggingface.co/collections/DocTron-Hub/vincicoder-train-data)

## Dataset Structure

The VinciCoder project utilizes two main datasets: the SFT Dataset for initial training and the RL Dataset for visual reinforcement learning.

![Data Construction](https://github.com/DocTron-hub/VinciCoder/blob/main/fig/data_construct.png?raw=true)

### SFT Dataset

The Supervised Finetuning (SFT) dataset comprises 1.6 million image-code pairs. This dataset is a collection and optimization of existing data from various works, designed for direct code generation and visual-based code refinement.
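
Because the corpus holds 1.6 million pairs, it can help to inspect a few records before downloading everything. The following is a minimal sketch, not part of the VinciCoder codebase: it assumes the Hugging Face `datasets` package is installed, that a split named `train` exists, and that the repository loads without an explicit config name. The helper names `stream_split` and `column_names` are hypothetical. This card does not document the column names, so the sketch only lists whatever keys the records expose.

```python
from itertools import islice
from typing import Any, Iterable, Iterator

# Repository ID taken from this dataset card.
SFT_REPO = "DocTron-Hub/VinciCoder-1.6M-SFT"


def stream_split(repo_id: str = SFT_REPO, split: str = "train") -> Iterator[dict[str, Any]]:
    """Stream records from the Hub instead of downloading the full corpus.

    Assumptions (not confirmed by the card): a split named "train" exists
    and no config name is required. Adjust both if the Hub reports otherwise.
    """
    # Imported lazily so the pure-Python helper below works without the package.
    from datasets import load_dataset  # third-party: pip install datasets

    return iter(load_dataset(repo_id, split=split, streaming=True))


def column_names(records: Iterable[dict[str, Any]], n: int = 3) -> list[str]:
    """Return the sorted union of keys seen in the first n records."""
    seen: set[str] = set()
    for record in islice(records, n):
        seen.update(record.keys())
    return sorted(seen)


# Usage (needs network access):
#   print(column_names(stream_split()))
```

Streaming keeps the first look cheap. Once the schema is known, the same `load_dataset` call without `streaming=True` downloads the full split.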

The dataset integrates data from several multimodal code generation domains:

| Domain | Source papers |
| :------------- | :------------- |
| Chart-to-code | [ChartCoder](https://arxiv.org/abs/2501.06598), [MSRL](https://arxiv.org/abs/2508.13587), [VisCodex](https://www.arxiv.org/abs/2508.09945) |
| Web-to-HTML | [Web2Code](https://arxiv.org/abs/2406.20098), [Web2M](https://arxiv.org/abs/2404.06369), [VisCodex](https://www.arxiv.org/abs/2508.09945) |
| Image-to-SVG | [UniSVG](https://arxiv.org/abs/2508.07766), [StarVector](https://arxiv.org/abs/2312.11556) |
| Image-to-LaTeX | [DaTikZ](https://arxiv.org/abs/2503.11509), [MathCoder-VL](https://arxiv.org/abs/2505.10557) |
| Others | [CoSyn](https://arxiv.org/abs/2502.14846) |

The full SFT dataset is available at: [DocTron-Hub/VinciCoder-1.6M-SFT](https://huggingface.co/datasets/DocTron-Hub/VinciCoder-1.6M-SFT)

### RL Dataset

The Reinforcement Learning (RL) dataset consists of 42,000 data samples collected from five distinct domains. This dataset is utilized with a Visual Reinforcement Learning (ViRL) strategy to improve visual fidelity.

The full RL dataset is available at: [DocTron-Hub/VinciCoder-42k-RL](https://huggingface.co/datasets/DocTron-Hub/VinciCoder-42k-RL)

## Installation

It is recommended to follow the instructions in [ms-swift](https://github.com/modelscope/ms-swift?tab=readme-ov-file#%EF%B8%8F-installation) and [EasyR1](https://github.com/hiyouga/EasyR1?tab=readme-ov-file#installation) to install the necessary environments.

Alternatively, you can install the RL environments by cloning the VinciCoder repository:

```bash
git clone https://github.com/DocTron-hub/VinciCoder.git
cd VinciCoder
pip install -e .
```

## Sample Usage (Training Scripts)

### SFT Stage

The SFT stage utilizes `ms-swift`. Please refer to its official documentation for detailed training instructions.

### RL Stage

![ViRL Strategy](https://github.com/DocTron-hub/VinciCoder/blob/main/fig/virl.png?raw=true)

The RL stage is based on `EasyR1`. First, modify the configurations in `./examples/qwen3vl_8b_vincicder.sh` and review the configuration in `./examples/reward_function/vincicoder.py`. Then, run the following script:

```bash
bash ./examples/qwen3vl_8b_vincicder.sh
```

## Citation

If you find this work useful, please consider citing our paper:

```bibtex
@misc{zhao2025vincicoderunifyingmultimodalcode,
  title={VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning},
  author={Xuanle Zhao and Deyang Jiang and Zhixiong Zeng and Lei Chen and Haibo Qiu and Jing Huang and Yufeng Zhong and Liming Zheng and Yilin Cao and Lin Ma},
  year={2025},
  eprint={2511.00391},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.00391},
}
```

Provider: DocTron-Hub