Name: JetBrains-Research/PandasPlotBench
Creator: JetBrains-Research
Published: 2024-12-09 14:23:19
License: apache-2.0

Download link:

https://hf-mirror.com/datasets/JetBrains-Research/PandasPlotBench

Description:

---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: code_plot
    dtype: string
  - name: code_data
    dtype: string
  - name: data_csv
    dtype: string
  - name: task__plot_description
    dtype: string
  - name: task__plot_style
    dtype: string
  - name: plots_gt
    sequence: string
  - name: _task__plot_description_short
    dtype: string
  - name: _task__plot_description_short_single
    dtype: string
  splits:
  - name: test
    num_bytes: 36342660
    num_examples: 175
  download_size: 25693018
  dataset_size: 36342660
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: apache-2.0
---

# PandasPlotBench

PandasPlotBench is a benchmark that assesses how well models write visualization code given a description of a Pandas DataFrame.

🛠️ **Task**. Given a plotting task and a description of a Pandas DataFrame, write the code to build the plot.

The dataset is based on the [MatPlotLib gallery](https://matplotlib.org/stable/gallery/index.html).

The paper is available on arXiv: https://arxiv.org/abs/2412.02764v1.

To score your model on this dataset, you can use [our GitHub repository](https://github.com/JetBrains-Research/PandasPlotBench).

📩 If you have any questions or requests concerning this dataset, please contact the author at [timur.galimzyanov@jetbrains.com](mailto:timur.galimzyanov@jetbrains.com).

## How-to

### Loading the data

Load the data via [`load_dataset`](https://huggingface.co/docs/datasets/v3.1.0/en/package_reference/loading_methods#datasets.load_dataset):

```
from datasets import load_dataset

dataset = load_dataset("JetBrains-Research/PandasPlotBench", split="test")
```

Note that all of our data is considered to be in the test split.

### Usage

You can find the benchmark code in [our GitHub repository](https://github.com/JetBrains-Research/PandasPlotBench).

### Scoring

We use the *LLM as a Judge* approach to score the results in two ways:

- **Visual scoring**. The Judge model is asked to compare two images on a scale of 0 to 100, focusing on the main idea of the images while ignoring styling details.
- **Task-based scoring**. The Judge model is asked to score the adherence of the resulting plot to the task description.

### Datapoint Schema

Each example has the following fields:

| Field | Description |
|-------|-------------|
| `id` | Unique ID of the datapoint. |
| `code_plot` | Ground truth code that plots the `df` DataFrame. Our basic data points use `matplotlib`. |
| `code_data` | Code for loading the data. For the majority of the data points, it is `import pandas as pd; df = pd.read_csv("data.csv")`. |
| `data_csv` | CSV data to be plotted. Its content is saved to the dataset folder during benchmarking. |
| `task__plot_description` | Main description of the plot (*i.e.*, the task). |
| `task__plot_style` | Description of the style of the plot. |
| `_task__plot_description_short` | Synthetically shortened plot description (2-3 sentences). |
| `_task__plot_description_short_single` | Synthetically shortened plot description (1 sentence). |
| `plots_gt` | List of encoded (`base64.b64encode(image_file.read()).decode("utf-8")`) ground truth plots. Usually, this is a single plot. |
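The schema above implies a simple way to unpack a datapoint locally: write `data_csv` to `data.csv` (the file name that `code_data` usually reads) and reverse the Base64 encoding of each entry in `plots_gt`. The following is a minimal sketch of that idea, not the benchmark's own code. The helper name `materialize_datapoint`, the output file names, and the `.png` extension are assumptions made for illustration; the schema does not state the image format.

```python
import base64
from pathlib import Path


def materialize_datapoint(record: dict, out_dir: str) -> list[Path]:
    """Write one datapoint's CSV and ground-truth plots into out_dir.

    Hypothetical helper for illustration; not part of the benchmark code.
    Returns the paths of the decoded plot files.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # code_data usually calls pd.read_csv("data.csv"), so save the CSV under that name.
    (out / "data.csv").write_text(record["data_csv"], encoding="utf-8")

    paths = []
    for i, encoded in enumerate(record["plots_gt"]):
        # Inverse of base64.b64encode(image_file.read()).decode("utf-8").
        image_bytes = base64.b64decode(encoded)
        # Assumption: the ".png" extension is a guess; the schema names no image format.
        path = out / f"plot_gt_{record['id']}_{i}.png"
        path.write_bytes(image_bytes)
        paths.append(path)
    return paths
```

With `dataset` loaded as shown in the how-to, a call such as `materialize_datapoint(dataset[0], "out")` would leave `out/data.csv` and the decoded plot files on disk, ready to compare against a model's output.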

