
VisCode-200K

ModelScope community. Updated 2025-12-05; indexed 2025-06-07.
Download link:
https://modelscope.cn/datasets/TIGER-Lab/VisCode-200K
# VisCode-200K

[🏠 Project Page](https://tiger-ai-lab.github.io/VisCoder) | [💻 GitHub](https://github.com/TIGER-AI-Lab/VisCoder) | [📖 Paper](https://arxiv.org/abs/2506.03930) | [🤗 VisCoder-3B](https://huggingface.co/TIGER-Lab/VisCoder-3B) | [🤗 VisCoder-7B](https://huggingface.co/TIGER-Lab/VisCoder-7B)

**VisCode-200K** is a large-scale instruction-tuning dataset for training language models to generate and debug **executable Python visualization code**.

## 🧠 Overview

VisCode-200K contains over **200,000** samples for executable Python visualization tasks. Each sample pairs a natural language instruction with the corresponding Python code, structured as a `messages` list in ChatML format.

We construct VisCode-200K through a scalable pipeline that integrates cleaned plotting code, synthetic instruction generation, runtime validation, and multi-turn dialogue construction.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64de37ee5e192985054be575/rf3nLZvrgQr3NXkeoSTTw.png)

## 📁 Data Format

Each sample is a JSON object with the following two keys:

```json
{
  "uuid": "6473df7ef4704da0a218ea71dc2d641b",
  "messages": [
    {"role": "user", "content": "Instruction..."},
    {"role": "assistant", "content": "Visualization Python code..."}
  ]
}
```

- `uuid`: A unique identifier for the sample.
- `messages`: A list of dialogue turns in the following format:
  - The **user** provides a natural language instruction describing a visualization task.
  - The **assistant** responds with Python code that generates the corresponding plot using a variety of libraries.

## 🧪 Use Cases

VisCode-200K is designed for:

- 📊 Instruction tuning for Python visualization code generation.
- 🔁 Multi-turn self-correction via dialogue with execution feedback.
- 🧠 Training models to align natural language, code semantics, and visual outputs.
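The sample layout described under Data Format can be checked and unpacked with a few lines of standard-library Python. The sketch below is illustrative only: `parse_sample`, `to_chatml`, and `ALLOWED_ROLES` are hypothetical names introduced here, the restriction to `user` and `assistant` roles is an assumption drawn from the example record, and the `<|im_start|>` / `<|im_end|>` markers are the commonly used ChatML delimiters, which may differ from the chat template a given tokenizer actually applies.

```python
import json

# Example record copied from the Data Format section; the content strings are placeholders.
RAW_SAMPLE = """
{
  "uuid": "6473df7ef4704da0a218ea71dc2d641b",
  "messages": [
    {"role": "user", "content": "Instruction..."},
    {"role": "assistant", "content": "Visualization Python code..."}
  ]
}
"""

# Assumption: only these two roles occur, as in the example record.
ALLOWED_ROLES = {"user", "assistant"}


def parse_sample(raw: str) -> dict:
    """Parse one JSON record and check that it has the two documented keys."""
    sample = json.loads(raw)
    if not isinstance(sample, dict):
        raise ValueError("sample must be a JSON object")
    if not isinstance(sample.get("uuid"), str):
        raise ValueError("sample is missing a string 'uuid'")
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("sample is missing a non-empty 'messages' list")
    for turn in messages:
        if not isinstance(turn, dict) or turn.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected turn: {turn!r}")
        if not isinstance(turn.get("content"), str):
            raise ValueError("each turn needs a string 'content'")
    return sample


def to_chatml(messages: list) -> str:
    """Render the turns as ChatML-style text (delimiters are an assumption)."""
    return "".join(
        f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n"
        for turn in messages
    )


sample = parse_sample(RAW_SAMPLE)
instruction = sample["messages"][0]["content"]  # first turn: the user's request
code = sample["messages"][-1]["content"]        # last turn: the assistant's code
print(to_chatml(sample["messages"]))
```

Validating before rendering keeps malformed records out of a training file; in practice the model's own chat template would usually replace `to_chatml`.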
This dataset supports the development of [VisCoder](https://huggingface.co/collections/TIGER-Lab/viscoder-6840333efe87c4888bc93046) models, including [VisCoder-3B](https://huggingface.co/TIGER-Lab/VisCoder-3B) and [VisCoder-7B](https://huggingface.co/TIGER-Lab/VisCoder-7B), evaluated on [PandasPlotBench](https://github.com/TIGER-AI-Lab/VisCoder/tree/main/eval).

## 📖 Citation

```bibtex
@article{ni2025viscoder,
  title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
  author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03930},
  year={2025}
}
```

Provider:
maas
Created:
2025-06-06