khhuang/chartve_dataset

Name: khhuang/chartve_dataset
Creator: khhuang
Published: 2024-02-18 00:10:46
License: apache-2.0

Source: Hugging Face. Updated: 2024-02-18. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/khhuang/chartve_dataset

Resource overview:

---
language:
- en
license: apache-2.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
tags:
- chart
- plot
- chart-to-text
- vistext
- statista
- pew
- chart-visual-entailment
- chart-understanding
- chart-captioning
- chart-summarization
- document-image
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: dev
    path: data/dev-*
dataset_info:
  features:
  - name: image
    dtype: string
  - name: sentence
    dtype: string
  - name: label
    dtype: string
  - name: manipulation_type
    dtype: string
  - name: dataset
    dtype: string
  splits:
  - name: train
    num_bytes: 118229163.0
    num_examples: 522531
  - name: dev
    num_bytes: 9400046.0
    num_examples: 36002
  download_size: 51634467
  dataset_size: 127629209.0
---

# Dataset Card for ChartVE's Training Data

- [Dataset Description](https://huggingface.co/datasets/khhuang/ChartVE/blob/main/README.md#dataset-description)
- [Paper Information](https://huggingface.co/datasets/khhuang/ChartVE/blob/main/README.md#paper-information)
- [Citation](https://huggingface.co/datasets/khhuang/ChartVE/blob/main/README.md#citation)

## Dataset Description

[ChartVE](https://huggingface.co/khhuang/chartve) (Chart Visual Entailment) is a visual entailment model introduced in the paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" for evaluating the factuality of a generated caption sentence with regard to the input chart. The model takes a chart figure and a caption sentence as input and outputs an entailment probability. This repository hosts the training and validation data for ChartVE.

### Fields

The fields in each instance are:

- `image`: The path to the chart image. Images can be found in [images.zip](https://huggingface.co/datasets/khhuang/chartve_dataset/blob/main/images.zip).
- `sentence`: The sentence used as the _hypothesis_.
- `label`: An indicator of whether the chart entails the given `sentence`.
- `manipulation_type`: The type of perturbation that alters the original sentence (only applicable to non-entailment instances).
- `dataset`: The source dataset of the chart `image`.

## Paper Information

- Paper: https://arxiv.org/abs/2312.10160
- Code: https://github.com/khuangaf/CHOCOLATE/
- Project: https://khuangaf.github.io/CHOCOLATE

## Citation

If you use the **ChartVE** dataset/model in your work, please cite the paper using this BibTeX:

```
@misc{huang-etal-2023-do,
    title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
    author = "Huang, Kung-Hsiang and Zhou, Mingyang and Chan, Hou Pong and Fung, Yi R. and Wang, Zhenhailong and Zhang, Lingyu and Chang, Shih-Fu and Ji, Heng",
    year={2023},
    eprint={2312.10160},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

Provider:

khhuang

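Summary of original information

Dataset Card for ChartVE's Training Data

Dataset description

ChartVE (Chart Visual Entailment) is a visual entailment model for evaluating the factual consistency of a generated chart caption sentence with the input chart. The model takes a chart image and a caption sentence as input and outputs an entailment probability. This repository hosts the training and validation data for ChartVE.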

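Fields

The fields in each instance are:

image: The path to the chart image. Images can be found in images.zip.
sentence: The sentence used as the hypothesis.
label: Indicates whether the chart entails the given sentence.
manipulation_type: The type of perturbation that alters the original sentence (only applicable to non-entailment instances).
dataset: The source dataset of the chart image.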
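
To make the field list concrete, the sketch below models one instance as a typed record and resolves its `image` path against a local directory. It assumes that `images.zip` has been extracted locally and that `image` is a path relative to that directory; neither is confirmed by the card. The names `ChartVEInstance` and `resolve_image_path` are hypothetical, and the sample values are placeholders rather than real dataset rows.

```python
from pathlib import Path
from typing import TypedDict


class ChartVEInstance(TypedDict):
    """Field names follow the dataset card; every field is a string."""
    image: str
    sentence: str
    label: str
    manipulation_type: str
    dataset: str


def resolve_image_path(instance: ChartVEInstance, images_root: str) -> Path:
    """Join the instance's `image` value onto the extracted images directory.

    Assumption: `image` is relative to the directory where images.zip
    was extracted. Adjust if the archive layout differs.
    """
    return Path(images_root) / instance["image"]


# Hypothetical record: placeholder values, not taken from the dataset.
example: ChartVEInstance = {
    "image": "example_chart.png",
    "sentence": "A placeholder caption sentence.",
    "label": "placeholder_label",
    "manipulation_type": "",
    "dataset": "placeholder_source",
}

print(resolve_image_path(example, "images"))
```

The actual string values used for `label`, `manipulation_type`, and `dataset` are not listed on this page, so inspect a few rows before writing any filtering logic.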

Dataset information

Language: English
License: Apache 2.0
Multilinguality: monolingual
Size category: 100K<n<1M
Tags:
- chart
- plot
- chart-to-text
- vistext
- statista
- pew
- chart-visual-entailment
- chart-understanding
- chart-captioning
- chart-summarization
- document-image

Configuration

Config name: default
- Data files:
  - Split: train
    - Path: data/train-*
  - Split: dev
    - Path: data/dev-*
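
Given the single `default` config with `train` and `dev` splits, loading a split might look like the sketch below. It assumes the third-party `datasets` package is installed and that the Hub is reachable; the wrapper name `load_chartve_split` is hypothetical and only adds a check on the split name.

```python
VALID_SPLITS = ("train", "dev")  # split names taken from the card's configs block


def load_chartve_split(split: str):
    """Load one split of khhuang/chartve_dataset from the Hugging Face Hub.

    Requires the `datasets` package and network access.
    """
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {VALID_SPLITS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("khhuang/chartve_dataset", split=split)
```

Calling `load_chartve_split("dev")` should return the validation split, whose rows expose the five string fields described on the card. Note that the split is named `dev`, not `validation` or `test`.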

Dataset features

Features:
- Name: image
  - Data type: string
- Name: sentence
  - Data type: string
- Name: label
  - Data type: string
- Name: manipulation_type
  - Data type: string
- Name: dataset
  - Data type: string

Splits

Name: train
- Bytes: 118229163.0
- Examples: 522531
Name: dev
- Bytes: 9400046.0
- Examples: 36002

Download size

Download size: 51634467

Dataset size

Dataset size: 127629209.0
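
As a quick consistency check on the figures above, the following sketch recomputes the totals from the per-split numbers. All values are copied from the card metadata; the derived quantities (total examples, dev share, average bytes per example) are simple arithmetic and are not stated on the card.

```python
# Per-split figures copied from the card metadata.
train_bytes, train_examples = 118_229_163, 522_531
dev_bytes, dev_examples = 9_400_046, 36_002
dataset_size = 127_629_209
download_size = 51_634_467

total_examples = train_examples + dev_examples

# The reported dataset size equals the sum of the two split sizes.
assert train_bytes + dev_bytes == dataset_size

print(total_examples)  # 558533
print(round(dev_examples / total_examples, 4))   # share of examples in dev
print(round(dataset_size / total_examples, 1))   # average bytes per example
print(round(download_size / dataset_size, 3))    # download size relative to dataset size
```

The total of 558,533 examples is consistent with the `100K<n<1M` size category.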
