HarvardVCG/MisVisBench

Name: HarvardVCG/MisVisBench
Creator: HarvardVCG
Published: 2026-03-28 14:01:45
License: CC-BY-NC-SA 4.0

Hugging Face · Updated 2026-03-28 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/HarvardVCG/MisVisBench


Resource description:

---
language:
- en
size_categories:
- 1K<n<10K
viewer: false
license: cc-by-nc-sa-4.0
paper: https://arxiv.org/abs/2603.22368
repository: https://github.com/Harsh-Lalai/Evaluating-Vision-Language-Models-on-Misleading-Data-Visualizations
point_of_contact: lalaiharsh26@gmail.com
---

## Dataset Description

- **Repository:** https://github.com/Harsh-Lalai/Evaluating-Vision-Language-Models-on-Misleading-Data-Visualizations
- **Paper:** https://arxiv.org/abs/2603.22368
- **Point of Contact:** lalaiharsh26@gmail.com

# Evaluating Vision-Language Models on Misleading Data Visualizations (Dataset)

## Overview

This dataset accompanies the paper: “[When Visuals Aren’t the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations.](https://arxiv.org/abs/2603.22368)”

MisVisBench is designed to evaluate whether Vision-Language Models (VLMs) can detect misleading information in **data visualization-caption pairs**, and whether they can correctly attribute the source of misleadingness to the appropriate error type: caption-level reasoning errors or visualization design errors.

Unlike prior benchmarks that primarily focus on chart understanding or visual distortions, MisVisBench enables **fine-grained analysis of misleadingness arising from both textual reasoning and visualization design choices**.

---

# Dataset Structure

![2x2 misleadingness grid](mislead_grid_2by2.png)

The dataset follows the **2 × 2 misleadingness decomposition** shown above.

2 × 2 mapping:

- **△** → caption-level reasoning errors, visualization is not misleading
- **○** → visualization design errors, caption is not misleading
- **■** → both caption and visualization are misleading
- **∅** → neither caption nor visualization is misleading (control)

The exact top-level keys in `data.json` are:

- `Misleading_Caption_Non_Misleading_Vis`
- `Non_Misleading_Caption_Misleading_Vis`
- `Misleading_Caption_Misleading_Vis`
- `Non_Misleading_Caption_Non_Misleading_Vis`

---

# Dataset Statistics

| Subset | Count |
|---|---:|
| **△** | 793 |
| **○** | 1110 |
| **■** | 501 |
| **∅** | 611 |
| **Total** | 3015 |

---

# Data Sources

| Subset | Source |
|---|---|
| **△** | X/Twitter |
| **○** | X/Twitter and subreddit DataIsUgly |
| **■** | X/Twitter |
| **∅** | subreddit DataIsBeautiful |

Notes:

- For all samples sourced from **X**, we use the sample IDs from Lisnic et al. [1].
- In **○**, the first **601** samples are from **X** and the remaining samples are from **Reddit**.

---

# Dataset File

The dataset is provided as a **single JSON file**:

```
data.json
```

Structure:

```json
{
  "data_type_name": {
    "sample_id": {
      "reasoning_error_names": [...],
      "visualization_error_names": [...],
      "text": "... (only present for Misleading_Caption_Misleading_Vis samples)"
    }
  }
}
```

Example:

```json
{
  "Misleading_Caption_Non_Misleading_Vis": {
    "example_id1": {
      "reasoning_error_names": ["Cherry-picking", "Causal inference"],
      "visualization_error_names": null
    }
  },
  "Misleading_Caption_Misleading_Vis": {
    "example_id2": {
      "reasoning_error_names": ["Cherry-picking"],
      "visualization_error_names": ["Dual axis"],
      "text": "Example caption written by the authors that introduces reasoning errors."
    }
  }
}
```

---

# Dataset Fields

| Field | Description |
|---|---|
| **sample_id** | Identifier corresponding to the original post (tweet or Reddit post) |
| **reasoning_error_names** | List of caption-level reasoning errors present in the example |
| **visualization_error_names** | List of visualization design errors present in the chart |
| **text** | Caption text (**only provided for ■ samples**) |

### Important Note on the `text` Field

The **`text` field is only provided for ■ samples**. For these samples:

- The captions were **written by the authors**
- The goal is to introduce specific **reasoning errors**
- The visualization is reused while the caption introduces the misleading reasoning

For the other three subsets (**△**, **○**, and **∅**), the dataset **does not include the caption text**, and therefore the `text` field is **not present** in those entries.

---

# Usage

The dataset file can be downloaded with the `huggingface_hub` library and parsed with the standard `json` module.

```python
import json

from huggingface_hub import hf_hub_download

# Download the raw JSON file from the dataset repo
json_path = hf_hub_download(
    repo_id="HarvardVCG/MisVisBench",
    repo_type="dataset",
    filename="data.json",
)

# Load the JSON
with open(json_path, "r", encoding="utf-8") as f:
    data = json.load(f)

# Iterate through the dataset
for category_name, samples in data.items():
    for sample_id, sample in samples.items():
        # Either list may be null (None) when that error type is absent
        reasoning_errors = sample["reasoning_error_names"]
        visualization_errors = sample["visualization_error_names"]

        print("Category:", category_name)
        print("Sample ID:", sample_id)
        print("Reasoning Errors:", reasoning_errors)
        print("Visualization Errors:", visualization_errors)
        print()
```

# Error Taxonomy

## Caption-Level Reasoning Errors

- Cherry-picking
- Causal inference
- Setting an arbitrary threshold
- Failure to account for statistical nuance
- Incorrect reading of chart
- Issues with data validity
- Misrepresentation of scientific studies

## Visualization Design Errors

- Truncated axis
- Dual axis
- Value encoded as area or volume
- Inverted axis
- Uneven binning
- Unclear encoding
- Inappropriate encoding

## Examples: Caption-Level Reasoning Errors

<table style="width: 100%; table-layout: fixed; border-collapse: collapse;">
  <colgroup>
    <col style="width: 60%;" />
    <col style="width: 25%;" />
    <col style="width: 15%;" />
  </colgroup>
  <thead>
    <tr>
      <th width="60%" style="text-align: center;">Visualization</th>
      <th width="25%" style="text-align: left;">Caption</th>
      <th width="15%" style="text-align: center;">Reasoning Error</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center;"><img src="Examples/Cherry-picking.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">Reminder: Just because we've hit a peak does not mean we've hit THE peak.</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Cherry-picking</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Causal Inference.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">The positive impact of the UK's vaccination efforts in one graph</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Causal inference</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Setting Arb Threshold.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">This in a country of 56 million. Lift lockdown now, the virus is just gone.</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Setting an arbitrary threshold</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Stat Nuance.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">The numbers absolutely speak for themselves. Get vaccinated!</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Failure to account for statistical nuance</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Incorr Chart Reading.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">The flu is 10 times less deadly - particularly for elderly - than Covid!</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Incorrect reading of chart</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Data Val.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">This is a test of our humanity</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Issues with data validity</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Misrep Scientific Studies.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">SARS-CoV-2 positivity rates associated with circulating 25-hydroxyvitamin D levels (https://tinyurl.com/5n9xm536)</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Misrepresentation of scientific studies</strong></td>
    </tr>
  </tbody>
</table>

## Examples: Visualization Design Errors

<table style="width: 100%; table-layout: fixed; border-collapse: collapse;">
  <colgroup>
    <col style="width: 60%;" />
    <col style="width: 25%;" />
    <col style="width: 15%;" />
  </colgroup>
  <thead>
    <tr>
      <th width="60%" style="text-align: center;">Visualization</th>
      <th width="25%" style="text-align: left;">Caption</th>
      <th width="15%" style="text-align: center;">Visualization Error</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center;"><img src="Examples/Truncated Axis.png" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">Respiratory deaths at 10 year low!</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Truncated axis</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Dual Axis.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">May 17 Update: US COVID-19 Test Results: Test-and-Trace Success for Smallpox</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Dual axis</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Area Volume.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">Corona Virus Interactive Map.</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Value encoded as area or volume</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Inv Axis.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">Propaganda: RECORD NUMBER OF COVID POSITIVE CASES. Reality:</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Inverted axis</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Uneven Binning.jpeg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">Interesting colour coding from the BBC</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Uneven binning</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Unclear Encoding.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">The Navajo Nation crushed the Covid curve. Success is possible.</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Unclear encoding</strong></td>
    </tr>
    <tr>
      <td style="text-align: center;"><img src="Examples/Inappropriate Encoding.jpg" style="width: 360px; height: auto;"/></td>
      <td style="overflow-wrap: anywhere; text-align: left;">The worst pandemic of the most contagious disease we have seen for 100 years.</td>
      <td style="overflow-wrap: anywhere; text-align: center;"><strong>Inappropriate encoding</strong></td>
    </tr>
  </tbody>
</table>

---

# Dataset Purpose

This dataset enables evaluation of whether models can:

1. Detect misleading chart-caption pairs
2. Determine whether misleadingness arises from the **caption, visualization, or both**
3. Attribute misleadingness to **specific error categories**

This allows researchers to analyze how well VLMs handle **reasoning-based misinformation versus visualization design distortions**.

---

# References

[1] Lisnic, Maxim, Cole Polychronis, Alexander Lex, and Marina Kogan. "Misleading beyond visual tricks: How people actually lie with charts." In *Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems*, pp. 1-21. 2023.

---

# License

The dataset is released under the **CC-BY-NC-SA 4.0** license.

---

# Contact

For any issues related to the dataset, feel free to reach out to lalaiharsh26@gmail.com

---

# Citation

```
@article{lalai2026visuals,
  title={When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations},
  author={Lalai, Harsh Nishant and Shah, Raj Sanjay and Pfister, Hanspeter and Varma, Sashank and Guo, Grace},
  journal={arXiv preprint arXiv:2603.22368},
  year={2026}
}
```
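
The Data Sources note says that, within the visualization-only subset (`Non_Misleading_Caption_Misleading_Vis`), the first 601 samples come from X and the rest from Reddit. A minimal sketch of how that split might be recovered is given below. It assumes the order of entries in `data.json` matches the order the note refers to, and the function name `split_vis_only_by_source` is hypothetical, not part of the dataset or its repository.

```python
from typing import Any, Dict, Tuple

# Number of leading samples sourced from X, per the dataset card's note.
X_SAMPLE_COUNT = 601


def split_vis_only_by_source(
    samples: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one subset's samples into (from_x, from_reddit) by position.

    Assumption: the dict preserves file order. json.load keeps key order
    on Python 3.7+, so this holds if the file itself is ordered that way.
    """
    items = list(samples.items())
    from_x = dict(items[:X_SAMPLE_COUNT])
    from_reddit = dict(items[X_SAMPLE_COUNT:])
    return from_x, from_reddit
```

With the loaded file in `data`, a call would look like `split_vis_only_by_source(data["Non_Misleading_Caption_Misleading_Vis"])`.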

Provided by:

HarvardVCG

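Collected Summary

Dataset Introduction

Construction

MisVisBench sits at the intersection of data visualization and natural language processing and is built around a 2 × 2 misleadingness decomposition grid. Real-world data visualizations and their accompanying text were collected from the social media platform X (formerly Twitter) and from Reddit communities such as DataIsUgly and DataIsBeautiful. Each sample is annotated to indicate whether its misleadingness stems from caption-level reasoning errors, visualization design errors, or both. For the subset in which both the caption and the visualization are misleading, the authors wrote captions that introduce specific reasoning errors, which keeps error attribution fine-grained.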
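
The 2 × 2 grid described above can be written down as a small lookup table. The sketch below uses the four top-level keys listed in the dataset card; the `GRID` constant and the `misleading_source` helper are illustrative names, not part of the dataset.

```python
# Top-level keys of data.json mapped to
# (caption_is_misleading, visualization_is_misleading).
# GRID and misleading_source are hypothetical helpers for illustration.
GRID = {
    "Misleading_Caption_Non_Misleading_Vis": (True, False),
    "Non_Misleading_Caption_Misleading_Vis": (False, True),
    "Misleading_Caption_Misleading_Vis": (True, True),
    "Non_Misleading_Caption_Non_Misleading_Vis": (False, False),
}


def misleading_source(category_name: str) -> str:
    """Return 'caption', 'visualization', 'both', or 'neither'."""
    caption, vis = GRID[category_name]
    if caption and vis:
        return "both"
    if caption:
        return "caption"
    if vis:
        return "visualization"
    return "neither"
```

Keeping the two flags separate, rather than a single four-way label, makes it straightforward to score caption detection and visualization detection independently.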

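Features

A notable feature of the dataset is its structured error taxonomy. It covers common visualization design flaws, such as truncated axes and dual axes, and defines seven types of caption-level reasoning errors, such as cherry-picking and causal inference. This dual taxonomy supports fine-grained evaluation of how well multimodal models identify compound misleading information. All samples come from real online content, which strengthens ecological validity, and the control group (non-misleading samples) provides a reference point for baseline performance.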
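
One way to make the fine-grained evaluation concrete is a per-sample, set-based F1 between the annotated error names and a model's predicted error names. This is only a sketch of one plausible scoring scheme; it is an assumption, not the metric used in the paper, and `attribution_f1` is a hypothetical name.

```python
from typing import Iterable, Optional


def attribution_f1(
    gold: Optional[Iterable[str]],
    predicted: Optional[Iterable[str]],
) -> float:
    """Set-based F1 between gold and predicted error names for one sample.

    None (JSON null in data.json) is treated as an empty set.
    Two empty sets count as a perfect match (1.0); this is a design
    choice of this sketch, not something the dataset prescribes.
    """
    gold_set = set(gold or [])
    pred_set = set(predicted or [])
    if not gold_set and not pred_set:
        return 1.0
    overlap = len(gold_set & pred_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting only `"Cherry-picking"` when the annotation lists `"Cherry-picking"` and `"Causal inference"` gives precision 1, recall 0.5, and an F1 of 2/3.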

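Usage

Researchers can load the dataset from the Hugging Face platform. A typical workflow uses the `hf_hub_download` function to fetch the structured data stored in `data.json`. The JSON file is organized by the four misleadingness categories; each entry is keyed by sample ID and contains a list of reasoning errors and a list of visualization errors. From this, users can build evaluation tasks, for example training or testing a vision-language model to detect the source of misleadingness in a given chart-caption pair and to classify the specific error types, and then analyze how the model's sensitivity differs between reasoning-based and visual-design-based misleadingness.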
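
As a sketch of how the nested JSON might be turned into flat records for an evaluation loop, the snippet below runs on an in-memory sample that mirrors the example in the dataset card, so it needs no download. The `flatten` helper and the record field names are assumptions made for illustration.

```python
from typing import Any, Dict, List


def flatten(data: Dict[str, Dict[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Turn {category: {sample_id: fields}} into a flat list of records.

    Null error lists become empty lists, and a missing `text` field
    (absent outside the both-misleading subset) becomes None.
    """
    records = []
    for category_name, samples in data.items():
        for sample_id, sample in samples.items():
            records.append({
                "category": category_name,
                "sample_id": sample_id,
                "reasoning_errors": sample.get("reasoning_error_names") or [],
                "visualization_errors": sample.get("visualization_error_names") or [],
                "text": sample.get("text"),
            })
    return records


# In-memory sample copied from the dataset card's example.
example = {
    "Misleading_Caption_Non_Misleading_Vis": {
        "example_id1": {
            "reasoning_error_names": ["Cherry-picking", "Causal inference"],
            "visualization_error_names": None,
        }
    },
    "Misleading_Caption_Misleading_Vis": {
        "example_id2": {
            "reasoning_error_names": ["Cherry-picking"],
            "visualization_error_names": ["Dual axis"],
            "text": "Example caption written by the authors that introduces reasoning errors.",
        }
    },
}

records = flatten(example)
```

Normalizing nulls to empty lists up front avoids repeated `None` checks later when comparing against model predictions.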

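Background and Challenges

Background Overview

MisVisBench was built in 2026 by Harsh Nishant Lalai and colleagues to systematically evaluate how well vision-language models identify misleading data visualization-caption pairs. The dataset accompanies the paper "When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations". Its core research question is whether models can detect misleading information and correctly attribute it to caption-level reasoning errors or visualization design errors. By introducing a fine-grained 2 × 2 misleadingness decomposition, the dataset provides a benchmark for studying information credibility assessment and the reasoning limits of models in multimodal settings.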

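Current Challenges

The domain challenge MisVisBench addresses is that existing vision-language models, when interpreting data visualizations, often struggle to tell whether misleadingness comes from a reasoning fallacy in the text or from a flaw in the visual design. Models therefore need chart-parsing ability combined with logical reasoning to identify textual errors such as cherry-picking or causal inference, as well as visual errors such as a truncated axis or a dual axis. On the construction side, the dataset faces an annotation consistency challenge: all 3015 samples must be assigned correctly to the four misleadingness categories. In addition, the data come from social media platforms such as X and Reddit, and their heterogeneity and noise make it difficult to select high-quality samples and to label error types in a standardized way.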
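
A simple sanity check against the published statistics can catch a truncated or outdated copy of `data.json`. The sketch below hard-codes the per-subset counts from the dataset card's statistics table; the `count_mismatches` helper is a hypothetical name introduced here.

```python
from typing import Any, Dict, Tuple

# Per-subset counts from the dataset card's statistics table.
EXPECTED_COUNTS = {
    "Misleading_Caption_Non_Misleading_Vis": 793,
    "Non_Misleading_Caption_Misleading_Vis": 1110,
    "Misleading_Caption_Misleading_Vis": 501,
    "Non_Misleading_Caption_Non_Misleading_Vis": 611,
}


def count_mismatches(
    data: Dict[str, Dict[str, Any]],
    expected: Dict[str, int] = EXPECTED_COUNTS,
) -> Dict[str, Tuple[int, int]]:
    """Return {category: (expected, actual)} for every category whose size differs."""
    mismatches = {}
    for category, want in expected.items():
        got = len(data.get(category, {}))
        if got != want:
            mismatches[category] = (want, got)
    return mismatches
```

An empty result means every subset has the documented size; the four expected counts sum to the documented total of 3015.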

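Common Scenarios

Typical Use Cases

MisVisBench serves as a benchmark for evaluating how well vision-language models identify misleading information. Its chart-caption pairs cover cases in which misleadingness is caused by caption-level reasoning errors, by visualization design errors, or by both together. Researchers can use the dataset to test systematically whether a model distinguishes the source of misleadingness and attributes it to specific error types, which in turn informs work on understanding and reasoning in multimodal settings.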

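Practical Applications

In practice, MisVisBench can support social media content moderation, news fact-checking, and quality assessment of science communication. Training or evaluating models to identify potentially misleading statements in charts and their accompanying text can help automated systems flag or filter content that contains statistical fallacies or visual distortions. This matters for curbing the spread of misinformation and for improving how accurately the public reads data, particularly in sensitive areas such as public health and economic data.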

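Derived Related Work

According to this summary, MisVisBench has given rise to research focused on multimodal misleadingness detection and attribution. This work extends the evaluation dimensions for vision-language models and explores new model architectures and training methods aimed at fine-grained understanding of complex misleading patterns. Related research also advances chart understanding, multimodal reasoning, and trustworthy AI, and offers a reference for building more comprehensive evaluations of multimodal information integrity.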

The above content was collected and summarized by 遇见数据集.
