VisCode-Multi-679K

Source: ModelScope community. Updated 2025-12-05. Indexed 2025-11-08.
Download link:
https://modelscope.cn/datasets/TIGER-Lab/VisCode-Multi-679K
Dataset description:
# VisCode-Multi-679K

[🏠 Project Page](https://tiger-ai-lab.github.io/VisCoder2) | [💻 GitHub](https://github.com/TIGER-AI-Lab/VisCoder2) | [📖 Paper](https://arxiv.org/abs/2510.23642) | [🤗 VisPlotBench](https://huggingface.co/datasets/TIGER-Lab/VisPlotBench) | [🤗 VisCoder2 Models](https://huggingface.co/collections/TIGER-Lab/viscoder2)

**VisCode-Multi-679K** is a large-scale **supervised instruction-tuning dataset** for training large language models to generate and debug **executable visualization code** across **12 programming languages**.

---

## 🧠 Overview

VisCode-Multi-679K contains over **679,000** samples across **twelve programming languages**:

> Python, LaTeX, HTML, SVG, Vega-Lite, LilyPond, Asymptote, Mermaid, JavaScript, TypeScript, R, and C++.

Each example links a natural language instruction to executable visualization code, enabling grounded learning between **language, code, and visual semantics**.

![pipeline](https://cdn-uploads.huggingface.co/production/uploads/64de37ee5e192985054be575/TQecuMISiLHf08Cc9aq0X.png)

---

## 📁 Data Format

Each sample is a JSON object with the following three keys:

```json
{
  "uuid": "0071d21907cf4736b8960f07d1483457",
  "messages": [
    {"role": "user", "content": "Instruction..."},
    {"role": "assistant", "content": "Visualization code..."}
  ],
  "language": "programming language"
}
```

- `uuid`: A unique identifier for the sample.
- `messages`: A list of dialogue turns in the following format:
  - The **user** provides a natural language instruction describing a visualization task.
  - The **assistant** responds with executable code in one of the supported languages.
- `language`: The programming language used in the visualization code.

## 🧪 Use Cases

VisCode-Multi-679K is designed for:

- 📊 Instruction tuning for multi-language visualization code generation.
- 🔁 Multi-turn self-correction using execution feedback.
- 🧠 Training models to align natural language, code semantics, and rendered outputs.

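Before using the data for any of these purposes, it can help to check that each record matches the three-key layout described under Data Format. The following is a minimal sketch using only the Python standard library. The helper names (`parse_sample`, `instruction_and_code`), the two inline records, and the `language` values `"python"` and `"mermaid"` are hypothetical illustrations; the exact strings stored in the released files may differ.

```python
import json
from collections import Counter

# Keys and roles taken from the Data Format description above.
REQUIRED_KEYS = {"uuid", "messages", "language"}
ALLOWED_ROLES = {"user", "assistant"}


def parse_sample(line: str) -> dict:
    """Parse one JSON-encoded sample and check it has the documented shape."""
    sample = json.loads(line)
    if not isinstance(sample, dict):
        raise ValueError("a sample must be a JSON object")
    missing = REQUIRED_KEYS - set(sample)
    if missing:
        raise ValueError(f"sample is missing keys: {sorted(missing)}")
    for turn in sample["messages"]:
        if turn.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {turn.get('role')!r}")
        if not isinstance(turn.get("content"), str):
            raise ValueError("each turn needs a string 'content'")
    return sample


def instruction_and_code(sample: dict) -> tuple[str, str]:
    """Return (first user instruction, last assistant reply) for a sample."""
    users = [t["content"] for t in sample["messages"] if t["role"] == "user"]
    replies = [t["content"] for t in sample["messages"] if t["role"] == "assistant"]
    if not users or not replies:
        raise ValueError("sample needs at least one user and one assistant turn")
    return users[0], replies[-1]


# Hypothetical records written for this sketch; they are not taken from the dataset.
EXAMPLE_LINES = [
    json.dumps({
        "uuid": "example-0001",
        "messages": [
            {"role": "user", "content": "Plot a bar chart of three values."},
            {"role": "assistant", "content": "import matplotlib.pyplot as plt\nplt.bar([1, 2, 3], [4, 5, 6])"},
        ],
        "language": "python",
    }),
    json.dumps({
        "uuid": "example-0002",
        "messages": [
            {"role": "user", "content": "Draw a two-node flowchart."},
            {"role": "assistant", "content": "graph TD\n  A --> B"},
        ],
        "language": "mermaid",
    }),
]

samples = [parse_sample(line) for line in EXAMPLE_LINES]
language_counts = Counter(sample["language"] for sample in samples)

for sample in samples:
    instruction, code = instruction_and_code(sample)
    print(sample["language"], "|", instruction, "|", len(code), "chars of code")
```

Taking the last assistant turn is a design choice for this sketch: if a record holds several turns (for example a self-correction dialogue), the final reply is assumed to be the version to keep.
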

This dataset supports the development of [VisCoder2](https://huggingface.co/collections/TIGER-Lab/viscoder2) models evaluated on [VisPlotBench](https://huggingface.co/datasets/TIGER-Lab/VisPlotBench).

## 📖 Citation

```bibtex
@article{ni2025viscoder2,
  title={VisCoder2: Building Multi-Language Visualization Coding Agents},
  author={Ni, Yuansheng and Cai, Songcheng and Chen, Xiangchao and Liang, Jiarong and Lyu, Zhiheng and Deng, Jiaqi and Zou, Kai and Nie, Ping and Yuan, Fei and Yue, Xiang and others},
  journal={arXiv preprint arXiv:2510.23642},
  year={2025}
}

@article{ni2025viscoder,
  title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
  author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03930},
  year={2025}
}
```

Provider:
maas
Created:
2025-10-29