ckchaos/ChartDiff
Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ckchaos/ChartDiff
Description:
---
dataset: ChartDiff
license: cc-by-4.0
task_categories:
- summarization
- image-text-to-text
- image-to-text
- tabular-to-text
pretty_name: ChartDiff
configs:
- config_name: default
  data_files:
  - split: train
    path: train/metadata.json
  - split: validation
    path: validation/metadata.json
  - split: test
    path: test/metadata.json
---
# ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts
[Project page](https://ckchaos.github.io/ChartDiff)
[arXiv:2603.28902](https://arxiv.org/abs/2603.28902)
## Overview
**ChartDiff** is a large-scale benchmark for **cross-chart comparative summarization**, designed to evaluate whether vision-language models can identify differences and generate coherent comparative descriptions across pairs of charts.
Unlike existing chart understanding datasets that emphasize single-chart interpretation, ChartDiff requires models to compare **two charts jointly** and generate a **concise, structured summary of their differences**, including:
- Overall trends
- Local fluctuations
- Notable anomalies
## Dataset Structure
The dataset is organized into three splits:
```
ChartDiff/
├── train/
├── validation/
└── test/
```
Each split contains:
- `metadata.json`: one entry per chart pair, listing its file paths, reference annotation, chart type, and plotting library
- `{PAIR_ID}/`: a directory per pair, containing the associated chart images and their underlying CSV data
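The layout above can be walked with standard-library Python. This is a minimal sketch, assuming the dataset has already been downloaded to a local directory; the `root` argument and the helper name `list_pair_dirs` are hypothetical and not part of the dataset.

```python
from pathlib import Path

# Split names as documented in the directory tree above.
SPLITS = ("train", "validation", "test")


def list_pair_dirs(root, split):
    """Return the sorted per-pair directories found under one split.

    Assumes the documented layout: <root>/<split>/{PAIR_ID}/ sitting next to
    <root>/<split>/metadata.json. `root` is a hypothetical local download path.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    split_dir = Path(root) / split
    # Keep directories only, so metadata.json itself is skipped.
    return sorted(p for p in split_dir.iterdir() if p.is_dir())
```

For example, `list_pair_dirs("ChartDiff", "train")` would return one `Path` per pair directory in the training split.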
## Data Format
Each entry in `metadata.json` follows:
```json
{
  "id": "00000",
  "chart_A": "00000/00000_A.png",
  "chart_B": "00000/00000_B.png",
  "csv_A": "00000/00000_A.csv",
  "csv_B": "00000/00000_B.csv",
  "annotation": "......",
  "chart_type": "pie",
  "plotting_lib": "plotly"
}
```
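An entry in this format can be read into a typed record with resolved file paths. The sketch below rests on two assumptions that the card does not state: that `metadata.json` holds a JSON array of such entries, and that the paths inside each entry are relative to the split directory. The `ChartPair` class and `load_pairs` function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ChartPair:
    """One chart pair, mirroring the fields of a metadata.json entry."""
    id: str
    chart_a: Path
    chart_b: Path
    csv_a: Path
    csv_b: Path
    annotation: str
    chart_type: str
    plotting_lib: str


def load_pairs(split_dir):
    """Load metadata.json from a split directory and resolve its paths.

    Assumption: the file is a JSON array, and entry paths are relative to
    `split_dir`. Adjust if the actual file uses another layout.
    """
    split_dir = Path(split_dir)
    with open(split_dir / "metadata.json", encoding="utf-8") as f:
        entries = json.load(f)
    return [
        ChartPair(
            id=e["id"],
            chart_a=split_dir / e["chart_A"],
            chart_b=split_dir / e["chart_B"],
            csv_a=split_dir / e["csv_A"],
            csv_b=split_dir / e["csv_B"],
            annotation=e["annotation"],
            chart_type=e["chart_type"],
            plotting_lib=e["plotting_lib"],
        )
        for e in entries
    ]
```

Resolving paths at load time keeps later code (image loading, CSV parsing) independent of the current working directory.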
### Field Description
| Field | Description |
| ------- | ------------------------------------- |
| id | Unique identifier for each chart pair |
| chart_A | Path to the chart A image |
| chart_B | Path to the chart B image |
| csv_A | Path to the CSV file holding chart A's underlying data |
| csv_B | Path to the CSV file holding chart B's underlying data |
| annotation | Reference comparison summary |
| chart_type | Chart type shared by chart A and chart B (e.g., `pie`) |
| plotting_lib | Library used to render chart A and chart B (e.g., `plotly`) |
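The field table can double as a lightweight schema check before training or evaluation. The following is a sketch only: it treats every documented field as required, which the card does not state explicitly, and the helper names `missing_fields` and `count_by` are hypothetical.

```python
from collections import Counter

# Field names taken from the table above; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "id", "chart_A", "chart_B", "csv_A", "csv_B",
    "annotation", "chart_type", "plotting_lib",
)


def missing_fields(entry):
    """Return the documented field names that are absent from one entry."""
    return [name for name in REQUIRED_FIELDS if name not in entry]


def count_by(entries, field):
    """Count entries per value of `field`, e.g. 'chart_type'."""
    return Counter(entry[field] for entry in entries)
```

Calling `count_by(entries, "chart_type")` on the parsed metadata gives a quick view of how chart types are distributed within a split.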
## Citation
If you use ChartDiff, please cite:
```bibtex
@misc{ye2026chartdiff,
title={ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts},
author={Rongtian Ye},
year={2026},
eprint={2603.28902},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2603.28902},
}
```
Provider: ckchaos



