VisCode-Multi-679K
ModelScope · Updated 2025-12-05 · Indexed 2025-11-08
Download link:
https://modelscope.cn/datasets/TIGER-Lab/VisCode-Multi-679K
Resource description:
# VisCode-Multi-679K
[🏠 Project Page](https://tiger-ai-lab.github.io/VisCoder2) | [💻 GitHub](https://github.com/TIGER-AI-Lab/VisCoder2) | [📖 Paper](https://arxiv.org/abs/2510.23642) | [🤗 VisPlotBench](https://huggingface.co/datasets/TIGER-Lab/VisPlotBench) | [🤗 VisCoder2 Models](https://huggingface.co/collections/TIGER-Lab/viscoder2)
**VisCode-Multi-679K** is a large-scale **supervised instruction-tuning dataset** for training large language models to generate and debug **executable visualization code** across **12 programming languages**.
---
## 🧠 Overview
VisCode-Multi-679K contains over **679,000** samples across **twelve programming languages**, namely:
> Python, LaTeX, HTML, SVG, Vega-Lite, LilyPond, Asymptote, Mermaid, JavaScript, TypeScript, R, and C++.
Each example pairs a natural-language instruction with executable visualization code, supporting grounded learning across **language, code, and visual semantics**.
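Because every record carries a `language` field, you can check how the samples are spread across the twelve languages. The sketch below makes several assumptions. It assumes the data is stored as JSON Lines, with one record per line. The function name `count_by_language` is hypothetical. The lowercase labels in the demo (`"python"`, `"latex"`) are illustrative guesses, not confirmed values of the field.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_language(lines: Iterable[str]) -> Counter:
    """Tally samples per `language` value from JSON Lines text (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record["language"]] += 1
    return counts


if __name__ == "__main__":
    # Assumed label strings; the real values of `language` may differ.
    sample = [
        '{"uuid": "a", "messages": [], "language": "python"}',
        '{"uuid": "b", "messages": [], "language": "latex"}',
        '{"uuid": "c", "messages": [], "language": "python"}',
    ]
    print(count_by_language(sample).most_common())  # → [('python', 2), ('latex', 1)]
```

To tally a local export, pass an open file handle, since iterating over a file yields lines. For example: `with open("viscode_multi.jsonl", encoding="utf-8") as f: count_by_language(f)`. The file name here is hypothetical.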

---
## 📁 Data Format
Each sample is a JSON object with the following three keys:
```json
{
  "uuid": "0071d21907cf4736b8960f07d1483457",
  "messages": [
    {"role": "user", "content": "Instruction..."},
    {"role": "assistant", "content": "Visualization code..."}
  ],
  "language": "programming language"
}
```
- `uuid`: A unique identifier for the sample.
- `messages`: A list of dialogue turns in the following format:
  - The **user** provides a natural language instruction describing a visualization task.
  - The **assistant** responds with executable code in one of the supported languages.
- `language`: The programming language used in the visualization code.
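Before training, it can help to screen records against the three-key format described above. The sketch below is a minimal check that is stricter than the card requires. It assumes that turns strictly alternate, starting with `user` and ending with `assistant`, which is an assumption and not a documented rule. The helper name `validate_sample` is hypothetical.

```python
from typing import Any

# The three top-level keys documented in the Data Format section.
REQUIRED_KEYS = {"uuid", "messages", "language"}
# Assumption: turns alternate user -> assistant, starting with the user.
ROLES = ("user", "assistant")


def validate_sample(sample: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the sample looks well-formed."""
    missing = REQUIRED_KEYS - set(sample)
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems: list[str] = []
    if not isinstance(sample["uuid"], str) or not sample["uuid"]:
        problems.append("uuid must be a non-empty string")
    if not isinstance(sample["language"], str):
        problems.append("language must be a string")

    messages = sample["messages"]
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
        return problems

    for i, turn in enumerate(messages):
        expected = ROLES[i % 2]
        if not isinstance(turn, dict) or turn.get("role") != expected:
            problems.append(f"turn {i}: expected role {expected!r}")
        elif not isinstance(turn.get("content"), str):
            problems.append(f"turn {i}: content must be a string")

    # With strict alternation, an even number of turns ends on the assistant.
    if len(messages) % 2 != 0:
        problems.append("conversation should end with an assistant turn")
    return problems


record = {
    "uuid": "0071d21907cf4736b8960f07d1483457",
    "messages": [
        {"role": "user", "content": "Instruction..."},
        {"role": "assistant", "content": "Visualization code..."},
    ],
    "language": "programming language",
}
print(validate_sample(record))  # → []
```

The function returns a list of messages instead of raising an exception. This lets you log every issue in a bad record at once and filter the corpus in a single pass.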
## 🧪 Use Cases
VisCode-Multi-679K is designed for:
- 📊 Instruction tuning for multi-language visualization code generation.
- 🔁 Multi-turn self-correction using execution feedback.
- 🧠 Training models to align natural language, code semantics, and rendered outputs.
This dataset supports the development of [VisCoder2](https://huggingface.co/collections/TIGER-Lab/viscoder2) models evaluated on [VisPlotBench](https://huggingface.co/datasets/TIGER-Lab/VisPlotBench).
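The multi-turn self-correction use case follows a loop: execute the assistant's code, capture any error, and return it as a new user turn. The sketch below shows one way to build such a feedback turn, under several assumptions. It handles only Python; the other eleven languages would need their own compilers or renderers. It assumes the assistant's `content` is raw source with no markdown fences. The helper names and the feedback wording are hypothetical, not the template used by VisCoder2 or VisPlotBench.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 30.0) -> tuple[bool, str]:
    """Execute Python source in a fresh interpreter; return (succeeded, stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"Execution timed out after {timeout} s"
    return proc.returncode == 0, proc.stderr


def add_feedback_turn(messages: list[dict], timeout: float = 30.0) -> bool:
    """Run the latest assistant turn; on failure, append a user turn with the error.

    Returns True if the code ran cleanly, in which case nothing is appended.
    """
    code = messages[-1]["content"]  # assumes the last turn is raw Python from the assistant
    ok, stderr = run_python(code, timeout)
    if not ok:
        # Hypothetical feedback wording, not the dataset's actual template.
        messages.append({
            "role": "user",
            "content": "The code failed with this error:\n"
            + stderr.strip()
            + "\nPlease fix it and return the full corrected code.",
        })
    return ok


if __name__ == "__main__":
    conv = [
        {"role": "user", "content": "Plot y = x^2."},
        {"role": "assistant", "content": "print(undefined_name)"},
    ]
    ok = add_feedback_turn(conv)  # False: the snippet raises NameError
    print(ok, conv[-1]["role"])
```

The script runs generated code in a separate interpreter with a timeout, so a crash or hang in the snippet cannot take down the caller. In practice, untrusted model output should still run inside a proper sandbox.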
## 📖 Citation
```bibtex
@article{ni2025viscoder2,
title={VisCoder2: Building Multi-Language Visualization Coding Agents},
author={Ni, Yuansheng and Cai, Songcheng and Chen, Xiangchao and Liang, Jiarong and Lyu, Zhiheng and Deng, Jiaqi and Zou, Kai and Nie, Ping and Yuan, Fei and Yue, Xiang and others},
journal={arXiv preprint arXiv:2510.23642},
year={2025}
}
@article{ni2025viscoder,
title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
journal={arXiv preprint arXiv:2506.03930},
year={2025}
}
```
Provided by:
maas
Created:
2025-10-29



