VisCode-200K

Name: VisCode-200K
Creator: maas
Published: 2025-12-05 16:37:35
License: No description available

ModelScope community. Updated: 2025-12-05. Indexed: 2025-06-07.

Download link:

https://modelscope.cn/datasets/TIGER-Lab/VisCode-200K


Description:

# VisCode-200K

[🏠 Project Page](https://tiger-ai-lab.github.io/VisCoder) | [💻 GitHub](https://github.com/TIGER-AI-Lab/VisCoder) | [📖 Paper](https://arxiv.org/abs/2506.03930) | [🤗 VisCoder-3B](https://huggingface.co/TIGER-Lab/VisCoder-3B) | [🤗 VisCoder-7B](https://huggingface.co/TIGER-Lab/VisCoder-7B)

**VisCode-200K** is a large-scale instruction-tuning dataset for training language models to generate and debug **executable Python visualization code**.

## 🧠 Overview

VisCode-200K contains over **200,000** samples for executable Python visualization tasks. Each sample includes a natural language instruction and the corresponding Python code, structured as a `messages` list in ChatML format.

We construct VisCode-200K through a scalable pipeline that integrates cleaned plotting code, synthetic instruction generation, runtime validation, and multi-turn dialogue construction.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64de37ee5e192985054be575/rf3nLZvrgQr3NXkeoSTTw.png)

## 📁 Data Format

Each sample is a JSON object with the following two keys:

```json
{
  "uuid": "6473df7ef4704da0a218ea71dc2d641b",
  "messages": [
    {"role": "user", "content": "Instruction..."},
    {"role": "assistant", "content": "Visualization Python code..."}
  ]
}
```

- `uuid`: A unique identifier for the sample.
- `messages`: A list of dialogue turns in the following format:
  - The **user** provides a natural language instruction describing a visualization task.
  - The **assistant** responds with Python code that generates the corresponding plot using a variety of libraries.

## 🧪 Use Cases

VisCode-200K is designed for:

- 📊 Instruction tuning for Python visualization code generation.
- 🔁 Multi-turn self-correction via dialogue with execution feedback.
- 🧠 Training models to align natural language, code semantics, and visual outputs.
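To make the sample format described above concrete, here is a minimal sketch of reading one record and pulling out instruction/code pairs. It assumes each record is available as a JSON string with the documented `uuid` and `messages` keys. The helper names `parse_sample` and `instruction_code_pairs`, and the sample instruction and code strings, are hypothetical and are not part of the dataset or the VisCoder codebase.

```python
import json

# Hypothetical record following the documented schema. The instruction and
# code strings are illustrative placeholders, not real dataset content.
SAMPLE_LINE = json.dumps({
    "uuid": "6473df7ef4704da0a218ea71dc2d641b",
    "messages": [
        {"role": "user", "content": "Plot a bar chart of monthly sales."},
        {
            "role": "assistant",
            "content": "import matplotlib.pyplot as plt\n"
                       "plt.bar(['Jan', 'Feb'], [3, 5])\n"
                       "plt.show()",
        },
    ],
})


def parse_sample(line):
    """Parse one JSON record and check it has the documented structure."""
    record = json.loads(line)
    if not isinstance(record, dict) or "uuid" not in record or "messages" not in record:
        raise ValueError("sample must be an object with 'uuid' and 'messages'")
    if not isinstance(record["messages"], list):
        raise ValueError("'messages' must be a list of dialogue turns")
    for turn in record["messages"]:
        if (
            not isinstance(turn, dict)
            or not isinstance(turn.get("role"), str)
            or not isinstance(turn.get("content"), str)
        ):
            raise ValueError("each turn needs string 'role' and 'content' fields")
    return record


def instruction_code_pairs(record):
    """Return (instruction, code) for each user turn directly followed by an assistant turn."""
    turns = record["messages"]
    pairs = []
    for prev, curr in zip(turns, turns[1:]):
        if prev["role"] == "user" and curr["role"] == "assistant":
            pairs.append((prev["content"], curr["content"]))
    return pairs


if __name__ == "__main__":
    sample = parse_sample(SAMPLE_LINE)
    for instruction, code in instruction_code_pairs(sample):
        print("Instruction:", instruction)
        print("Code:")
        print(code)
```

Pairing adjacent turns rather than assuming exactly two messages keeps the sketch usable for multi-turn samples, where a single record may hold several user/assistant exchanges.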
This dataset supports the development of [VisCoder](https://huggingface.co/collections/TIGER-Lab/viscoder-6840333efe87c4888bc93046) models, including [VisCoder-3B](https://huggingface.co/TIGER-Lab/VisCoder-3B) and [VisCoder-7B](https://huggingface.co/TIGER-Lab/VisCoder-7B), evaluated on [PandasPlotBench](https://github.com/TIGER-AI-Lab/VisCoder/tree/main/eval).

## 📖 Citation

```bibtex
@article{ni2025viscoder,
  title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
  author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03930},
  year={2025}
}
```


Provider:

maas

Created:

2025-06-06
