
JetBrains-Research/PandasPlotBench

Hugging Face | Updated 2024-12-09 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/JetBrains-Research/PandasPlotBench
Description:
---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: code_plot
    dtype: string
  - name: code_data
    dtype: string
  - name: data_csv
    dtype: string
  - name: task__plot_description
    dtype: string
  - name: task__plot_style
    dtype: string
  - name: plots_gt
    sequence: string
  - name: _task__plot_description_short
    dtype: string
  - name: _task__plot_description_short_single
    dtype: string
  splits:
  - name: test
    num_bytes: 36342660
    num_examples: 175
  download_size: 25693018
  dataset_size: 36342660
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: apache-2.0
---

# PandasPlotBench

PandasPlotBench is a benchmark for assessing how well models write visualization code given a description of a Pandas DataFrame.

🛠️ **Task**. Given a plotting task and the description of a Pandas DataFrame, write the code that builds the plot.

The dataset is based on the [MatPlotLib gallery](https://matplotlib.org/stable/gallery/index.html).

The paper is available on arXiv: https://arxiv.org/abs/2412.02764v1.

To score your model on this dataset, use [our GitHub repository](https://github.com/JetBrains-Research/PandasPlotBench).

📩 If you have any questions or requests concerning this dataset, please contact the author at [timur.galimzyanov@jetbrains.com](mailto:timur.galimzyanov@jetbrains.com).

## How-to

### Loading the data

Load the data via [`load_dataset`](https://huggingface.co/docs/datasets/v3.1.0/en/package_reference/loading_methods#datasets.load_dataset):

```
from datasets import load_dataset

dataset = load_dataset("JetBrains-Research/PandasPlotBench", split="test")
```

Note that all of the data is in the `test` split.

### Usage

You can find the benchmark code in [our GitHub repository](https://github.com/JetBrains-Research/PandasPlotBench).

### Scoring

We use the *LLM as a Judge* approach to score the results in two ways:

- **Visual scoring**. The Judge model is asked to compare two images (the generated plot and the ground-truth plot) on a scale of 0 to 100, focusing on the main idea of the images while ignoring styling details.
- **Task-based scoring**. The Judge model is asked to score how closely the resulting plot adheres to the task description.

### Datapoint Schema

Each example has the following fields:

| Field | Description |
|---|---|
| `id` | Unique ID of the datapoint. |
| `code_plot` | Ground truth code that plots the `df` DataFrame. Our basic data points use `matplotlib`. |
| `code_data` | Code for loading the data. For the majority of the data points, it is `import pandas as pd; df = pd.read_csv("data.csv")`. |
| `data_csv` | CSV data to be plotted. Its content is saved to the dataset folder during benchmarking. |
| `task__plot_description` | Main description of the plot (*i.e.*, the task). |
| `task__plot_style` | Description of the style of the plot. |
| `_task__plot_description_short` | Synthetically shortened plot description (2-3 sentences). |
| `_task__plot_description_short_single` | Synthetically shortened plot description (1 sentence). |
| `plots_gt` | List of ground truth plots, each encoded with `base64.b64encode(image_file.read()).decode("utf-8")`. Usually, this is a single plot. |
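Because `plots_gt` stores each ground-truth image as a base64 string, the encoding in the schema can simply be reversed to inspect the reference plots. The sketch below uses only the standard library; the helper name `save_ground_truth_plots` and the output file naming are hypothetical, and since the card does not state the image format, the sketch only uses a `.png` extension when the PNG signature is actually present.

```python
import base64
from pathlib import Path

# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_ground_truth_plots(datapoint: dict, out_dir: str) -> list[Path]:
    """Decode every base64 string in datapoint["plots_gt"] and write it to out_dir.

    Hypothetical helper: file names are built from the datapoint id and the
    image index; the extension is ".png" only when the PNG signature is found,
    otherwise ".bin".
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, encoded in enumerate(datapoint["plots_gt"]):
        # Reverses base64.b64encode(image_file.read()).decode("utf-8").
        raw = base64.b64decode(encoded)
        suffix = ".png" if raw.startswith(PNG_SIGNATURE) else ".bin"
        path = target / f"{datapoint['id']}_gt_{index}{suffix}"
        path.write_bytes(raw)
        paths.append(path)
    return paths
```

For example, after loading the split with `load_dataset`, `save_ground_truth_plots(dataset[0], "gt_plots")` would write the reference image(s) of the first datapoint into a `gt_plots` folder.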

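The `data_csv` and `code_data` fields are meant to work together: the CSV content is saved to disk, and `code_data` usually reads it back as `df` with `pd.read_csv("data.csv")`, after which `code_plot` draws the figure. The sketch below is one assumed way to reproduce a ground-truth plot locally, not the benchmark's own runner: it writes `data.csv` and a script that runs `code_data` followed by `code_plot`. The helper name `materialize_datapoint` and the script name `plot_gt.py` are hypothetical.

```python
from pathlib import Path


def materialize_datapoint(datapoint: dict, work_dir: str) -> Path:
    """Write data.csv and a plotting script for one datapoint; return the script path.

    Hypothetical helper: the script is code_data followed by code_plot, so the
    DataFrame `df` created by code_data is in scope when code_plot runs.
    """
    folder = Path(work_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # code_data usually reads "data.csv" relative to the working directory.
    (folder / "data.csv").write_text(datapoint["data_csv"], encoding="utf-8")
    script = folder / "plot_gt.py"
    script.write_text(
        datapoint["code_data"].rstrip() + "\n\n" + datapoint["code_plot"].rstrip() + "\n",
        encoding="utf-8",
    )
    return script
```

Running `python plot_gt.py` from inside `work_dir` (with `pandas` and `matplotlib` installed) should then execute the ground-truth code against the saved data, provided `code_data` uses the relative path `data.csv` as in the common case described in the schema.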
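Each task combines a plot description with a description of the DataFrame, and the dataset ships the description at three lengths (full, 2-3 sentences, and 1 sentence) plus separate style notes. The sketch below shows one assumed way to assemble a prompt from these fields; the prompt wording, the `level` switch, and the `describe_columns` summary (column names and the first few rows, read with the standard `csv` module instead of pandas) are illustrative choices, not the prompt used by the benchmark code in the GitHub repository.

```python
import csv
import io

# Hypothetical verbosity levels mapped to the dataset's description fields.
DESCRIPTION_FIELDS = {
    "full": "task__plot_description",
    "short": "_task__plot_description_short",
    "single": "_task__plot_description_short_single",
}


def describe_columns(data_csv: str, n_rows: int = 3) -> str:
    """Summarize the CSV as its header plus the first n_rows data rows."""
    rows = list(csv.reader(io.StringIO(data_csv)))
    if not rows:
        return "Columns: (none)"
    header, body = rows[0], rows[1 : 1 + n_rows]
    lines = ["Columns: " + ", ".join(header)]
    lines += ["Row: " + ", ".join(row) for row in body]
    return "\n".join(lines)


def build_prompt(datapoint: dict, level: str = "full", with_style: bool = True) -> str:
    """Combine a data summary, the chosen plot description, and optional style notes."""
    parts = [
        "Write Python code that plots the DataFrame `df`.",
        "Data:\n" + describe_columns(datapoint["data_csv"]),
        "Task:\n" + datapoint[DESCRIPTION_FIELDS[level]],
    ]
    if with_style and datapoint.get("task__plot_style"):
        parts.append("Style:\n" + datapoint["task__plot_style"])
    return "\n\n".join(parts)
```

Keeping `level` and `with_style` as parameters makes it easy to compare how a model performs on the full versus shortened descriptions, which is the apparent purpose of shipping all three variants.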
Provided by:
JetBrains-Research