
khhuang/chartve_dataset

Source: Hugging Face. Updated 2024-02-18; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/khhuang/chartve_dataset
Resource description:
---
language:
- en
license: apache-2.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
tags:
- chart
- plot
- chart-to-text
- vistext
- statista
- pew
- chart-visual-entailment
- chart-understanding
- chart-captioning
- chart-summarization
- document-image
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: dev
    path: data/dev-*
dataset_info:
  features:
  - name: image
    dtype: string
  - name: sentence
    dtype: string
  - name: label
    dtype: string
  - name: manipulation_type
    dtype: string
  - name: dataset
    dtype: string
  splits:
  - name: train
    num_bytes: 118229163.0
    num_examples: 522531
  - name: dev
    num_bytes: 9400046.0
    num_examples: 36002
  download_size: 51634467
  dataset_size: 127629209.0
---

# Dataset Card for ChartVE's Training Data

- [Dataset Description](https://huggingface.co/datasets/khhuang/ChartVE/blob/main/README.md#dataset-description)
- [Paper Information](https://huggingface.co/datasets/khhuang/ChartVE/blob/main/README.md#paper-information)
- [Citation](https://huggingface.co/datasets/khhuang/ChartVE/blob/main/README.md#citation)

## Dataset Description

[ChartVE](https://huggingface.co/khhuang/chartve) (Chart Visual Entailment) is a visual entailment model introduced in the paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning". It evaluates the factuality of a generated caption sentence with regard to the input chart. The model takes a chart figure and a caption sentence as input and outputs an entailment probability. This repository hosts the training and validation data for ChartVE.

### Fields

The fields in each instance are:

- `image`: The path to the chart image. Images can be found in [images.zip](https://huggingface.co/datasets/khhuang/chartve_dataset/blob/main/images.zip).
- `sentence`: The sentence used as the _hypothesis_.
- `label`: An indicator of whether the chart entails the given `sentence`.
- `manipulation_type`: The type of perturbation that alters the original sentence (only applicable to non-entailment instances).
- `dataset`: The source dataset of the chart `image`.

## Paper Information

- Paper: https://arxiv.org/abs/2312.10160
- Code: https://github.com/khuangaf/CHOCOLATE/
- Project: https://khuangaf.github.io/CHOCOLATE

## Citation

If you use the **ChartVE** dataset/model in your work, please cite the paper using this BibTeX:

```
@misc{huang-etal-2023-do,
    title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
    author = "Huang, Kung-Hsiang and Zhou, Mingyang and Chan, Hou Pong and Fung, Yi R. and Wang, Zhenhailong and Zhang, Lingyu and Chang, Shih-Fu and Ji, Heng",
    year={2023},
    eprint={2312.10160},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
Provider:
khhuang
Summary of original information

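Dataset Card for ChartVE's Training Data

Dataset Description

ChartVE (Chart Visual Entailment) is a visual entailment model that evaluates the factuality of a generated caption sentence with respect to the input chart. The model takes a chart image and a caption sentence as input and outputs an entailment probability. This repository hosts the training and validation data for ChartVE.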

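Fields

The fields in each instance are:

  • image: The path to the chart image. Images can be found in images.zip.
  • sentence: The sentence used as the hypothesis.
  • label: Indicates whether the chart entails the given sentence.
  • manipulation_type: The type of perturbation that alters the original sentence (only applicable to non-entailment instances).
  • dataset: The source dataset of the chart image.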
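
The five fields map onto a small record type. The following is a minimal sketch, not part of the original card: the `ChartVEInstance` name, the `from_row` helper, and the `images_root` argument are hypothetical, and it assumes that images.zip has been extracted locally and that `image` is a path relative to the extracted archive.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ChartVEInstance:
    """One row of the dataset; every field has dtype string in the card."""

    image: str
    sentence: str
    label: str
    # Only meaningful for non-entailment rows, so it may be empty or None.
    manipulation_type: Optional[str]
    dataset: str

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "ChartVEInstance":
        # Keep only the documented fields; any extra keys are ignored.
        return cls(**{f.name: row[f.name] for f in fields(cls)})

    def image_path(self, images_root: Path) -> Path:
        # Assumption: `image` is relative to the extracted images archive.
        return images_root / self.image
```

Given a row loaded from the dataset, `ChartVEInstance.from_row(row).image_path(Path("images"))` would return the location of the chart on disk, provided the relative-path assumption holds.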

Dataset Information

  • Language: English
  • License: Apache 2.0
  • Multilinguality: monolingual
  • Size category: 100K<n<1M
  • Tags:
    • chart
    • plot
    • chart-to-text
    • vistext
    • statista
    • pew
    • chart-visual-entailment
    • chart-understanding
    • chart-captioning
    • chart-summarization
    • document-image

Configuration

  • Config name: default
    • Data files:
      • Split: train
        • Path: data/train-*
      • Split: dev
        • Path: data/dev-*
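
The configuration above can be used to load a split with the Hugging Face `datasets` library. The sketch below is illustrative: `REPO_ID` is taken from the download link, the `load_chartve_split` helper is a hypothetical name, and the call needs network access and `datasets` installed.

```python
from typing import Dict

REPO_ID = "khhuang/chartve_dataset"

# Split name -> file glob, copied from the `default` config.
DATA_FILES: Dict[str, str] = {
    "train": "data/train-*",
    "dev": "data/dev-*",
}


def load_chartve_split(split: str):
    """Load one split of the dataset from the Hugging Face Hub."""
    if split not in DATA_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}"
        )
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

Calling `load_chartve_split("dev")` would download the smaller validation split first, which is a convenient way to inspect the fields before fetching the training data.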

Dataset Features

  • Features:
    • Name: image
      • Data type: string
    • Name: sentence
      • Data type: string
    • Name: label
      • Data type: string
    • Name: manipulation_type
      • Data type: string
    • Name: dataset
      • Data type: string

Splits

  • Name: train
    • Bytes: 118229163.0
    • Examples: 522531
  • Name: dev
    • Bytes: 9400046.0
    • Examples: 36002

Download Size

  • Download size: 51634467 bytes

Dataset Size

  • Dataset size: 127629209.0 bytes
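
The split and size figures can be cross-checked with a little arithmetic. This sketch uses only the numbers listed above; the variable names are illustrative, and it treats the download-to-dataset ratio as a rough indicator of on-disk compression.

```python
# Figures copied from the split and size listings.
SPLITS = {
    "train": {"num_bytes": 118_229_163, "num_examples": 522_531},
    "dev": {"num_bytes": 9_400_046, "num_examples": 36_002},
}
DOWNLOAD_SIZE = 51_634_467   # bytes
DATASET_SIZE = 127_629_209   # bytes

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The per-split byte counts add up exactly to the reported dataset size.
assert total_bytes == DATASET_SIZE

dev_fraction = SPLITS["dev"]["num_examples"] / total_examples
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"total examples: {total_examples}")           # 558533
print(f"dev share of examples: {dev_fraction:.1%}")
print(f"download / dataset size: {download_ratio:.2f}")
```

The total of 558,533 examples is consistent with the declared size category of 100K<n<1M.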