
khhuang/CHOCOLATE

Source: Hugging Face. Updated: 2024-01-22. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/khhuang/CHOCOLATE
Resource overview:

---
annotations_creators:
- expert-generated
- found
language_creators:
- expert-generated
- found
language:
- en
license: apache-2.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
paperswithcode_id: chocolate
pretty_name: CHOCOLATE
tags:
- chart
- plot
- chart-to-text
- vistext
- statista
- pew
- chart-understanding
- chart-captioning
- chart-summarization
- document-image
configs:
- config_name: default
  data_files:
  - split: test
    path: chocolate.json
---

# Dataset Card for CHOCOLATE

- [Dataset Description](https://huggingface.co/datasets/khhuang/CHOCOLATE/blob/main/README.md#dataset-description)
- [Paper Information](https://huggingface.co/datasets/khhuang/CHOCOLATE/blob/main/README.md#paper-information)
- [Citation](https://huggingface.co/datasets/khhuang/CHOCOLATE/blob/main/README.md#citation)

## Dataset Description

**CHOCOLATE** is a benchmark for detecting and correcting factual inconsistencies in generated chart captions. It consists of captions produced by six state-of-the-art models, grouped into three subsets:

- **LVLM**: GPT-4V, Bard (before Gemini)
- **LLM-based Pipeline**: DePlot + GPT-4
- **Fine-tuned Model**: ChartT5, MatCha, UniChart

The charts come from two datasets: VisText and the Pew split of Chart-to-Text. In total, **CHOCOLATE** consists of **1,187 examples**. Each instance consists of a caption generated by one of the models, together with annotations of the factual errors in each caption sentence.

## Paper Information

- Paper: https://arxiv.org/abs/2312.10160
- Code: https://github.com/khuangaf/CHOCOLATE/
- Project: https://khuangaf.github.io/CHOCOLATE

## Citation

If you use the **CHOCOLATE** dataset in your work, please cite the paper using this BibTeX:

```
@misc{huang-etal-2023-do,
    title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
    author = "Huang, Kung-Hsiang and Zhou, Mingyang and Chan, Hou Pong and Fung, Yi R. and Wang, Zhenhailong and Zhang, Lingyu and Chang, Shih-Fu and Ji, Heng",
    year={2023},
    eprint={2312.10160},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
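The `configs` entry in the card's front matter declares a single `default` config whose `test` split is backed by `chocolate.json`. A minimal sketch of loading that split with the Hugging Face `datasets` library follows. It assumes the library is installed and the Hub is reachable; the helper name `load_chocolate` is hypothetical. The card does not document the record fields, so the sketch makes no assumption about them.

```python
REPO_ID = "khhuang/CHOCOLATE"  # dataset ID as given in the card
SPLIT = "test"                 # the only split declared under `configs`


def load_chocolate(repo_id: str = REPO_ID, split: str = SPLIT):
    """Load one split of CHOCOLATE from the Hugging Face Hub.

    Hypothetical helper: needs `pip install datasets` and network access.
    """
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)
```

Calling `ds = load_chocolate()` should return a `Dataset` object; `len(ds)` can then be compared against the 1,187 examples the card reports, and `ds.column_names` lists the available fields.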
Provider:
khhuang
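Summary of Original Information

Dataset Card for CHOCOLATE

Dataset Description

CHOCOLATE is a benchmark dataset for detecting and correcting factual inconsistencies in generated chart captions. It contains captions produced by six state-of-the-art models, grouped into three subsets: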

  • LVLM: GPT-4V, Bard (before Gemini)
  • LLM-based Pipeline: DePlot + GPT-4
  • Fine-tuned Model: ChartT5, MatCha, UniChart

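The charts come from two datasets: VisText and the Pew split of Chart-to-Text. CHOCOLATE contains 1,187 examples in total. Each instance consists of a caption generated by one of the models, together with annotations of the factual errors in each caption sentence.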
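The three subsets listed above can be written as a lookup table. The sketch below is illustrative only: it assumes each record stores the generating model's name under a hypothetical `model` key (the card does not specify field names) and tallies captions per subset.

```python
from collections import Counter

# Subset membership as listed in the dataset card.
SUBSETS = {
    "LVLM": ["GPT-4V", "Bard"],
    "LLM-based Pipeline": ["DePlot + GPT-4"],
    "Fine-tuned Model": ["ChartT5", "MatCha", "UniChart"],
}

# Invert the table: model name -> subset name.
SUBSET_BY_MODEL = {
    model: subset for subset, models in SUBSETS.items() for model in models
}


def count_by_subset(records, model_key="model"):
    """Count records per subset; `model_key` is an assumed field name."""
    counts = Counter()
    for record in records:
        # Models not in the table are grouped under "unknown".
        subset = SUBSET_BY_MODEL.get(record.get(model_key), "unknown")
        counts[subset] += 1
    return counts


# Toy records for illustration only; not taken from the dataset.
toy = [{"model": "GPT-4V"}, {"model": "MatCha"}, {"model": "UniChart"}]
print(count_by_subset(toy))
```

If the real records use a different key or different model spellings, only `model_key` and the `SUBSETS` table need to change.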

Paper Information

  • Paper: https://arxiv.org/abs/2312.10160
  • Code: https://github.com/khuangaf/CHOCOLATE/
  • Project: https://khuangaf.github.io/CHOCOLATE

Citation

If you use the CHOCOLATE dataset in your work, please cite the paper using the following BibTeX:

@misc{huang-etal-2023-do,
    title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
    author = "Huang, Kung-Hsiang and Zhou, Mingyang and Chan, Hou Pong and Fung, Yi R. and Wang, Zhenhailong and Zhang, Lingyu and Chang, Shih-Fu and Ji, Heng",
    year={2023},
    eprint={2312.10160},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
