
ckchaos/ChartDiff

Source: Hugging Face. Updated 2026-04-01. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ckchaos/ChartDiff
---
dataset: ChartDiff
license: cc-by-4.0
task_categories:
- summarization
- image-text-to-text
- image-to-text
- tabular-to-text
pretty_name: ChartDiff
configs:
- config_name: default
  data_files:
  - split: train
    path: train/metadata.json
  - split: validation
    path: validation/metadata.json
  - split: test
    path: test/metadata.json
---

# ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://ckchaos.github.io/ChartDiff) [![arXiv](https://img.shields.io/badge/arXiv-2603.28902-brightgreen)](https://arxiv.org/abs/2603.28902)

## Overview

**ChartDiff** is a large-scale benchmark for **cross-chart comparative summarization**, designed to evaluate whether vision-language models can identify differences and generate coherent comparative descriptions across pairs of charts.

Unlike existing chart understanding datasets that emphasize single-chart interpretation, ChartDiff requires models to compare **two charts jointly** and generate a **concise, structured summary of their differences**, including:

- Overall trends
- Local fluctuations
- Notable anomalies

## Dataset Structure

The dataset is organized into three splits:

```
ChartDiff/
├── train/
├── validation/
└── test/
```

Each split contains:

- `metadata.json`: data information
- `{PAIR_ID}/`: a directory per pair, containing the associated chart images and their underlying CSV data

## Data Format

Each entry in `metadata.json` follows:

```json
{
  "id": "00000",
  "chart_A": "00000/00000_A.png",
  "chart_B": "00000/00000_B.png",
  "csv_A": "00000/00000_A.csv",
  "csv_B": "00000/00000_B.csv",
  "annotation": "......",
  "chart_type": "pie",
  "plotting_lib": "plotly"
}
```

### Field Description

| Field        | Description                               |
| ------------ | ----------------------------------------- |
| id           | Unique identifier for each chart pair     |
| chart_A      | Path to chart A image                     |
| chart_B      | Path to chart B image                     |
| csv_A        | Underlying data for chart A               |
| csv_B        | Underlying data for chart B               |
| annotation   | Reference comparison summary              |
| chart_type   | Type of both chart A and chart B          |
| plotting_lib | Library for rendering chart A and chart B |

## Citation

If you use ChartDiff, please cite:

```bibtex
@misc{ye2026chartdiff,
  title={ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts},
  author={Rongtian Ye},
  year={2026},
  eprint={2603.28902},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2603.28902},
}
```
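
## Example: Reading a Split (Unofficial Sketch)

The `metadata.json` layout described in the Data Format section can be read with the Python standard library alone. The sketch below is not part of the official dataset tooling. It assumes that `metadata.json` holds a single JSON array of entry objects and that the `chart_*` and `csv_*` paths are relative to the split directory; neither point is stated explicitly in the card. The names `ChartPair` and `load_split` are hypothetical and exist only for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ChartPair:
    """One chart pair, with file paths resolved against the split directory."""
    pair_id: str
    chart_a: Path
    chart_b: Path
    csv_a: Path
    csv_b: Path
    annotation: str
    chart_type: str
    plotting_lib: str


def load_split(split_dir):
    """Read metadata.json from one split directory (e.g. 'ChartDiff/train')."""
    split_dir = Path(split_dir)
    with open(split_dir / "metadata.json", encoding="utf-8") as f:
        # Assumption: the file is one JSON array of entry objects.
        entries = json.load(f)
    return [
        ChartPair(
            pair_id=e["id"],
            # Assumption: paths in the metadata are relative to the split directory.
            chart_a=split_dir / e["chart_A"],
            chart_b=split_dir / e["chart_B"],
            csv_a=split_dir / e["csv_A"],
            csv_b=split_dir / e["csv_B"],
            annotation=e["annotation"],
            chart_type=e["chart_type"],
            plotting_lib=e["plotting_lib"],
        )
        for e in entries
    ]
```

Given a local copy of the dataset, `load_split("ChartDiff/test")` would return one `ChartPair` per entry, after which the images and CSV files can be opened with whatever imaging or tabular library is preferred. If the file turns out to use a different container (for example, one JSON object per line), the `json.load` call is the only part that needs to change.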

Provided by:
ckchaos