VisPlotBench

Name: VisPlotBench
Creator: maas
Published: 2025-12-05 16:55:37
License: No description available

ModelScope community. Updated: 2025-12-05. Indexed: 2025-11-03.

Download link:

https://modelscope.cn/datasets/TIGER-Lab/VisPlotBench

Resource description:

# VisPlotBench (A Multi-Language Benchmark for Visualization Coding Agents)

[**🌐 Homepage**](https://tiger-ai-lab.github.io/VisCoder2) | [**💻 GitHub**](https://github.com/TIGER-AI-Lab/VisCoder2) | [**📖 Paper**](https://arxiv.org/abs/2510.23642) | [**🤗 VisCoder2**](https://hf.co/collections/TIGER-Lab/viscoder2)

---

## 🔔 News

- **🔥 [2025-10-25]** VisPlotBench is released as part of the **VisCoder2** project, providing the first systematic benchmark for multi-language visualization coding agents.
- **📦 [2025-10-25]** Evaluation scripts are now available on the [GitHub repository](https://github.com/TIGER-AI-Lab/VisCoder2/tree/main/VisPlotBench).

---

## Dataset Description

**VisPlotBench** is a benchmark for evaluating visualization coding agents across **eight programming languages**. Unlike prior efforts that target a single language or chart type, VisPlotBench features **888 executable tasks**, **rendered outputs**, and a standardized **execute–render–score** protocol for both initial generation and multi-round self-debug evaluation.

Each task provides:

- a **natural-language instruction** describing the visualization goal,
- corresponding **reference code** in one of eight supported languages, and
- the **rendered reference image** for visual alignment evaluation.

![visplotbench_overview](https://cdn-uploads.huggingface.co/production/uploads/64de37ee5e192985054be575/YK9kZkI5Z38IHVW9P6MiG.png)

---

## Data Construction

VisPlotBench combines curated examples from library documentation, high-quality open-source code, and programmatic rendering pipelines. All code snippets are executed in isolated environments to ensure **valid rendering and executability**, and visually trivial outputs are removed. Each task is annotated with a **Visual Category** and **Subtype**, covering **13 categories** such as Bars, Lines, Areas, 3D, Scatter, Hierarchies, Networks & Flows, and Music.
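As a rough illustration of the executability filter described above, the sketch below runs a Python snippet in a scratch directory and accepts it only if it exits cleanly and leaves an output file behind. It is not the benchmark's actual pipeline: the function name `runs_and_renders`, the `out.png` file name, and the 30-second timeout are all assumptions, and a temporary directory plus a subprocess is much weaker isolation than a container or sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_and_renders(code: str, output_name: str = "out.png",
                     timeout_s: float = 30.0) -> bool:
    """Return True if `code` exits with status 0 and writes `output_name`.

    Hypothetical helper: the output file name and timeout are illustrative
    assumptions, not values taken from VisPlotBench.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,          # relative paths resolve inside the scratch dir
                capture_output=True,  # keep stdout/stderr out of the caller's console
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        # Check for the file before the scratch directory is cleaned up.
        rendered = (Path(workdir) / output_name).exists()
        return result.returncode == 0 and rendered
```

A snippet that crashes, hangs past the timeout, or finishes without producing the expected file is rejected, which mirrors the two stated goals of executability and valid rendering.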
Tasks are then extended with a five-component instruction schema:

> **Setup → Plot Instruction → Data Instruction → Task Description → Style Description**

This ensures consistent structure across languages while preserving language-specific syntax and conventions.

---

## Evaluation Protocol

VisPlotBench defines a unified **execute–render–score** evaluation pipeline:

1. **Execution Pass Rate (Exec Pass)**: checks whether generated code runs successfully and produces a valid visualization.
2. **Task Score**: assesses instruction compliance using an LLM-based semantic rubric.
3. **Visual Score**: measures perceptual similarity between generated and reference images.

The benchmark also supports **multi-round self-debugging**, where models can refine code up to three times using feedback from execution logs, simulating real-world visualization correction loops.

---

## Language Configurations

VisPlotBench provides eight separate configurations, each corresponding to a supported visualization language:

| Language | #Test Samples |
|-----------|---------------|
| Python | 196 |
| Vega-Lite | 129 |
| LilyPond | 55 |
| Mermaid | 131 |
| SVG | 65 |
| LaTeX | 112 |
| Asymptote | 92 |
| HTML | 108 |

Each configuration includes verified executable examples with paired natural-language descriptions and rendered outputs.

---

## Citation

```bibtex
@article{ni2025viscoder2,
  title={VisCoder2: Building Multi-Language Visualization Coding Agents},
  author={Ni, Yuansheng and Cai, Songcheng and Chen, Xiangchao and Liang, Jiarong and Lyu, Zhiheng and Deng, Jiaqi and Zou, Kai and Nie, Ping and Yuan, Fei and Yue, Xiang and others},
  journal={arXiv preprint arXiv:2510.23642},
  year={2025}
}

@article{ni2025viscoder,
  title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
  author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03930},
  year={2025}
}
```
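The multi-round self-debug protocol described under Evaluation Protocol can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not the official evaluation script: `generate` and `execute` are hypothetical callables standing in for a model call and an execute-and-render step, and the only detail taken from the description is the cap of three refinement rounds driven by execution logs.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not the benchmark's API):
#   generate(instruction, previous_code, error_log) -> new code
#   execute(code) -> (passed, execution_log)
Generate = Callable[[str, Optional[str], Optional[str]], str]
Execute = Callable[[str], Tuple[bool, str]]

MAX_DEBUG_ROUNDS = 3  # "up to three times" in the protocol description


def self_debug(instruction: str, generate: Generate, execute: Execute,
               max_rounds: int = MAX_DEBUG_ROUNDS) -> Tuple[str, bool, int]:
    """Return (final_code, passed, debug_rounds_used)."""
    # Initial generation: no previous code or log is available yet.
    code = generate(instruction, None, None)
    passed, log = execute(code)

    rounds = 0
    while not passed and rounds < max_rounds:
        # Refinement: feed the failing code and its execution log back in.
        code = generate(instruction, code, log)
        passed, log = execute(code)
        rounds += 1
    return code, passed, rounds
```

Passing the model and the executor in as callables keeps the loop independent of any particular language backend, so the same control flow could in principle wrap a Python runner, a LaTeX compiler, or a Mermaid renderer.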


Provider:

maas

Created:

2025-10-29
