CharXiv

Name: CharXiv
Creator: maas
Published: 2025-12-05 11:51:06
License: No description available

ModelScope Community, updated 2025-12-05, indexed 2025-08-16

Download link:

https://modelscope.cn/datasets/princeton-nlp/CharXiv


Resource overview:

# CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

**NeurIPS 2024**

🏠[Home (🚧Still under construction)](https://charxiv.github.io/) | 🤗[Data](https://huggingface.co/datasets/princeton-nlp/CharXiv) | 🥇[Leaderboard](https://charxiv.github.io/#leaderboard) | 🖥️[Code](https://github.com/princeton-nlp/CharXiv) | 📄[Paper](https://arxiv.org/abs/2406.18521)

This repo contains the full dataset for our paper **CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**, a diverse and challenging chart understanding benchmark **fully curated by human experts**. It includes 2,323 high-resolution charts manually sourced from arXiv preprints. Each chart is paired with 4 descriptive questions (3 answerable and 1 unanswerable) and 1 reasoning question, all of which require open-vocabulary short answers that are easily verifiable.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/_9aZS02-ItKVtfpncKKZA.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/fuHiNm3hyhCo3YdCnt0WS.png)

## Results on Validation Set

![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/8UrHszfGAv8D_7mFiDDkb.png)

## Raw Evaluation Results

You can access full evaluation results from existing models [here](https://huggingface.co/datasets/princeton-nlp/CharXiv/tree/main/existing_evaluations).

## Evaluating Your Multimodal Large Language Models

The data in this repo follows a schema based on dataset [standards](https://schema.org/). However, our evaluation pipeline has its own schema, so if you are testing models with our official codebase, you will most likely need only [this](https://huggingface.co/datasets/princeton-nlp/CharXiv/blob/main/images.zip) file (the zipped images).
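
As a rough illustration of working with the zipped images mentioned above, the sketch below lists and extracts the image entries from a locally downloaded copy of `images.zip`. It is not part of the official codebase: the helper names `list_chart_images` and `extract_chart_images` are hypothetical, and the archive's internal layout and file extensions are assumptions, since this README does not document them.

```python
import zipfile
from pathlib import Path

# Assumption: chart images use one of these extensions. The real layout of
# images.zip is not documented in this README.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_chart_images(zip_path):
    """Return the sorted names of image entries inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if Path(name).suffix.lower() in IMAGE_SUFFIXES
        )


def extract_chart_images(zip_path, out_dir):
    """Extract only the image entries into out_dir and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = list_chart_images(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # members restricts extraction to the image entries found above.
        archive.extractall(out_dir, members=names)
    return [out_dir / name for name in names]
```

With the archive saved locally, a call such as `extract_chart_images("images.zip", "charxiv_images")` would unpack the charts into a `charxiv_images` directory; both paths here are placeholders.
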
We are also planning to integrate CharXiv evaluations into [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) and [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), so stay tuned!

## Erratum

We want to be transparent about the dataset, so we provide a list of errors in QAs discovered by the community. As we develop future versions of CharXiv when models get stronger, we'll fix these errors!

* `0.jpg` contains an incorrectly annotated reasoning answer to the question (discovered by linus106 in #2)

## Dataset Usage

This dataset contains charts sourced from arXiv preprints, and it is intended to be used to evaluate models only. You are **NOT** allowed to use it to train your models.

## License

All questions are licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en). The copyright of the charts belongs to the original authors. We provide each chart's source under the `original_id` column, which is the arXiv preprint number of the paper containing the chart.

## Contact

Please submit an issue [here](https://github.com/princeton-nlp/CharXiv) or send me an email [here](mailto:zw1300@cs.princeton.edu?subject=%5BCharXiv%5D%20Inquery).

## Cite

```
@article{wang2024charxiv,
  title={CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs},
  author={Wang, Zirui and Xia, Mengzhou and He, Luxi and Chen, Howard and Liu, Yitao and Zhu, Richard and Liang, Kaiqu and Wu, Xindi and Liu, Haotian and Malladi, Sadhika and Chevalier, Alexis and Arora, Sanjeev and Chen, Danqi},
  journal={arXiv preprint arXiv:2406.18521},
  year={2024}
}
```


Provider:

maas

Created:

2025-08-15

Collection summary

Dataset introduction