JetBrains-Research/PandasPlotBench
Source: Hugging Face. Updated: 2024-12-09. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/JetBrains-Research/PandasPlotBench
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: code_plot
    dtype: string
  - name: code_data
    dtype: string
  - name: data_csv
    dtype: string
  - name: task__plot_description
    dtype: string
  - name: task__plot_style
    dtype: string
  - name: plots_gt
    sequence: string
  - name: _task__plot_description_short
    dtype: string
  - name: _task__plot_description_short_single
    dtype: string
  splits:
  - name: test
    num_bytes: 36342660
    num_examples: 175
  download_size: 25693018
  dataset_size: 36342660
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: apache-2.0
---
# PandasPlotBench
PandasPlotBench is a benchmark for assessing how well models write visualization code given a description of a Pandas DataFrame.
🛠️ **Task**. Given the plotting task and the description of a Pandas DataFrame, write the code to build a plot.
The dataset is based on the [Matplotlib gallery](https://matplotlib.org/stable/gallery/index.html).
The paper is available on arXiv: https://arxiv.org/abs/2412.02764v1.
To score your model on this dataset, you can use [our GitHub repository](https://github.com/JetBrains-Research/PandasPlotBench).
📩 If you have any questions or requests concerning this dataset, please contact the author at [timur.galimzyanov@jetbrains.com](mailto:timur.galimzyanov@jetbrains.com).
## How-to
### Loading the data
Load the data via [`load_dataset`](https://huggingface.co/docs/datasets/v3.1.0/en/package_reference/loading_methods#datasets.load_dataset):
```python
from datasets import load_dataset

# The whole benchmark lives in the single "test" split.
dataset = load_dataset("JetBrains-Research/PandasPlotBench", split="test")
```
Note that all of our data is considered to be in the test split.
### Usage
You can find the benchmark code in [our GitHub repository](https://github.com/JetBrains-Research/PandasPlotBench).
### Scoring
We use the *LLM as a Judge* approach to score the results in two ways:
- **Visual scoring**. The Judge model is asked to compare two images on a scale of 0 to 100, focusing on the main idea of the images while ignoring styling details.
- **Task-based scoring**. The Judge model is asked to score the adherence of the resulting plot to the task description.
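As a rough illustration of how judge replies might be turned into a single benchmark number, the sketch below extracts one integer score per reply and averages the valid ones. The reply format, the `parse_score` and `mean_score` helper names, the 0 to 100 range for both scoring modes, and the choice to skip unparseable replies are all assumptions made for illustration; the actual prompts and aggregation are defined in the GitHub repository.

```python
import re
from statistics import mean
from typing import Optional

# Hypothetical helpers: the real judge prompts and reply format are defined
# in the benchmark repository, not here.
_SCORE_RE = re.compile(r"\b(\d{1,3})\b")


def parse_score(reply: str) -> Optional[int]:
    """Return the first integer in [0, 100] found in a judge reply, else None."""
    for match in _SCORE_RE.finditer(reply):
        value = int(match.group(1))
        if 0 <= value <= 100:
            return value
    return None


def mean_score(replies: list[str]) -> Optional[float]:
    """Average the parseable scores; None if no reply contained a valid score."""
    scores = [s for s in (parse_score(r) for r in replies) if s is not None]
    return mean(scores) if scores else None
```

Skipping unparseable replies keeps one malformed judge answer from distorting the average, at the cost of hiding failures; counting them as zero would be the stricter alternative.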
### Datapoint Schema
Each example has the following fields:
| Field | Description |
|----------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| `id` | Unique ID of the datapoint. |
| `code_plot` | Ground truth code that plots the `df` DataFrame. Our basic data points use `matplotlib`. |
| `code_data`                            | Code for loading the data. For the majority of the data points, it is `import pandas as pd; df = pd.read_csv("data.csv")`.    |
| `data_csv` | CSV data to be plotted. Its content is saved to the dataset folder during benchmarking. |
| `task__plot_description` | Main description of the plot (*i.e.*, the task). |
| `task__plot_style` | Description of the style of the plot. |
| `_task__plot_description_short` | Synthetically shortened plot description (2-3 sentences). |
| `_task__plot_description_short_single` | Synthetically shortened plot description (1 sentence). |
| `plots_gt`                             | List of Base64-encoded ground truth plots (`base64.b64encode(image_file.read()).decode("utf-8")`). Usually, this is a single plot. |
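
To make the schema concrete, the sketch below shows one way to materialize a datapoint on disk: it writes `data_csv` to `data.csv` (the file name that the typical `code_data` snippet reads) and decodes each `plots_gt` entry back into image bytes. The helper names, the output file names, and the `.png` extension are assumptions for illustration; the benchmark's own handling lives in the GitHub repository.

```python
import base64
from pathlib import Path


def write_data_csv(datapoint: dict, folder: Path) -> Path:
    """Save the datapoint's CSV text as data.csv, which code_data usually reads."""
    folder.mkdir(parents=True, exist_ok=True)
    csv_path = folder / "data.csv"
    csv_path.write_text(datapoint["data_csv"], encoding="utf-8")
    return csv_path


def decode_plots(datapoint: dict) -> list[bytes]:
    """Invert base64.b64encode(...).decode("utf-8") for every ground truth plot."""
    return [base64.b64decode(encoded) for encoded in datapoint["plots_gt"]]


def save_plots(datapoint: dict, folder: Path) -> list[Path]:
    """Write decoded plots to disk; the .png extension is an assumption."""
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image_bytes in enumerate(decode_plots(datapoint)):
        path = folder / f"plot_gt_{index}.png"
        path.write_bytes(image_bytes)
        paths.append(path)
    return paths
```

With a datapoint taken from the loaded dataset, calling `write_data_csv(datapoint, Path("work"))` and then running `code_data` from inside that folder should produce the `df` that `code_plot` expects.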
Provider: JetBrains-Research



