khhuang/CHOCOLATE
Source: Hugging Face. Updated 2024-01-22; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/khhuang/CHOCOLATE
Resource description:
---
annotations_creators:
- expert-generated
- found
language_creators:
- expert-generated
- found
language:
- en
license: apache-2.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
paperswithcode_id: chocolate
pretty_name: CHOCOLATE
tags:
- chart
- plot
- chart-to-text
- vistext
- statista
- pew
- chart-understanding
- chart-captioning
- chart-summarization
- document-image
configs:
- config_name: default
  data_files:
  - split: test
    path: chocolate.json
---
# Dataset Card for CHOCOLATE
- [Dataset Description](https://huggingface.co/datasets/khhuang/CHOCOLATE/blob/main/README.md#dataset-description)
- [Paper Information](https://huggingface.co/datasets/khhuang/CHOCOLATE/blob/main/README.md#paper-information)
- [Citation](https://huggingface.co/datasets/khhuang/CHOCOLATE/blob/main/README.md#citation)
## Dataset Description
**CHOCOLATE** is a benchmark for detecting and correcting factual inconsistencies in generated chart captions. It consists of captions produced by six of the most advanced models, which are grouped into three subsets:
- **LVLM**: GPT-4V, Bard (before Gemini)
- **LLM-based Pipeline**: DePlot + GPT-4
- **Fine-tuned Model**: ChartT5, MatCha, UniChart
The charts come from two datasets: VisText and the Pew split of Chart-to-Text. In total, **CHOCOLATE** contains **1,187 examples**. Each instance consists of a caption generated by one of the models and factual-error annotations for each sentence of that caption.
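The following is a minimal sketch of how the data might be loaded and tallied by subset, not an official loader. It assumes the Hugging Face `datasets` package is installed and relies on the single `test` split declared in the YAML header. The field name `model` and the exact model strings stored in each record are assumptions made for illustration; check a loaded record before relying on them.

```python
from collections import Counter

# Model-to-subset grouping, copied from the list in this card.
# The exact strings used inside the dataset may differ (assumption).
SUBSETS = {
    "LVLM": ["GPT-4V", "Bard"],
    "LLM-based Pipeline": ["DePlot + GPT-4"],
    "Fine-tuned Model": ["ChartT5", "MatCha", "UniChart"],
}
MODEL_TO_SUBSET = {m: s for s, models in SUBSETS.items() for m in models}


def load_chocolate():
    """Load the `test` split declared in the card's YAML header."""
    # Imported lazily so the helpers below work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("khhuang/CHOCOLATE", split="test")


def count_by_subset(records, model_key="model"):
    """Count records per subset.

    `model_key` is a hypothetical field name; adjust it to the real schema.
    Models not listed in SUBSETS are counted under "unknown".
    """
    return Counter(
        MODEL_TO_SUBSET.get(record[model_key], "unknown") for record in records
    )
```

Calling `len(load_chocolate())` is a quick sanity check against the 1,187 examples stated above, and `count_by_subset` accepts any iterable of dict-like records, so it can be tried on a few hand-written rows before it is pointed at the real data.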
## Paper Information
- Paper: https://arxiv.org/abs/2312.10160
- Code: https://github.com/khuangaf/CHOCOLATE/
- Project: https://khuangaf.github.io/CHOCOLATE
## Citation
If you use the **CHOCOLATE** dataset in your work, please cite the paper using this BibTeX entry:
```
@misc{huang-etal-2023-do,
  title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
  author = "Huang, Kung-Hsiang and
    Zhou, Mingyang and
    Chan, Hou Pong and
    Fung, Yi R. and
    Wang, Zhenhailong and
    Zhang, Lingyu and
    Chang, Shih-Fu and
    Ji, Heng",
  year={2023},
  eprint={2312.10160},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Provider:
khhuang



