CharXiv

ModelScope community · Updated 2025-12-05 · Indexed 2025-08-16
Download link:
https://modelscope.cn/datasets/princeton-nlp/CharXiv
Description:
# CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

**NeurIPS 2024**

🏠[Home (🚧Still under construction)](https://charxiv.github.io/) | 🤗[Data](https://huggingface.co/datasets/princeton-nlp/CharXiv) | 🥇[Leaderboard](https://charxiv.github.io/#leaderboard) | 🖥️[Code](https://github.com/princeton-nlp/CharXiv) | 📄[Paper](https://arxiv.org/abs/2406.18521)

This repo contains the full dataset for our paper **CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**, a diverse and challenging chart understanding benchmark **fully curated by human experts**. It includes 2,323 high-resolution charts manually sourced from arXiv preprints. Each chart is paired with 4 descriptive questions (3 answerable and 1 unanswerable) and 1 reasoning question, all of which require open-vocabulary short answers that are easily verifiable.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/_9aZS02-ItKVtfpncKKZA.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/fuHiNm3hyhCo3YdCnt0WS.png)

## Results on Validation Set

![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/8UrHszfGAv8D_7mFiDDkb.png)

## Raw Evaluation Results

You can access the full evaluation results of existing models [here](https://huggingface.co/datasets/princeton-nlp/CharXiv/tree/main/existing_evaluations).

## Evaluating Your Multimodal Large Language Models

The data in this repo follows the [schema.org](https://schema.org/) dataset standards. However, our evaluation pipeline has its own schema, so if you are testing models with our official codebase, you will most likely need only [this file](https://huggingface.co/datasets/princeton-nlp/CharXiv/blob/main/images.zip), which contains the zipped images.

We are also planning to integrate CharXiv evaluations into [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) and [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). Stay tuned!

## Erratum

We want to be transparent about the dataset, so we provide a list of errors in the QAs discovered by the community. We will fix these errors in future versions of CharXiv, which we plan to develop as models get stronger.

* `0.jpg` contains an incorrectly annotated answer to its reasoning question (discovered by linus106 in #2)

## Dataset Usage

This dataset contains charts sourced from arXiv preprints and is intended for model evaluation only. You are **NOT** allowed to use it to train your models.

## License

All questions are licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en). The copyright of the charts belongs to the original authors. We provide each chart's source in the `original_id` column, which holds the arXiv preprint number of the paper containing the chart.

## Contact

Please submit an issue [here](https://github.com/princeton-nlp/CharXiv) or send me an email [here](mailto:zw1300@cs.princeton.edu?subject=%5BCharXiv%5D%20Inquery).

## Cite

```
@article{wang2024charxiv,
  title={CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs},
  author={Wang, Zirui and Xia, Mengzhou and He, Luxi and Chen, Howard and Liu, Yitao and Zhu, Richard and Liang, Kaiqu and Wu, Xindi and Liu, Haotian and Malladi, Sadhika and Chevalier, Alexis and Arora, Sanjeev and Chen, Danqi},
  journal={arXiv preprint arXiv:2406.18521},
  year={2024}
}
```
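
## Example: Question Counts and Chart Provenance

The per-chart layout described in the overview (4 descriptive questions and 1 reasoning question for each of the 2,323 charts) and the `original_id` provenance column mentioned under License can be sketched in Python as below. This is only an illustration, not part of the official codebase: `ChartRecord`, `image_name`, and `total_questions` are hypothetical names, the sketch assumes `original_id` is available as a string, and the URL uses the standard arXiv abstract-page pattern.

```python
from dataclasses import dataclass

# Figures stated in the dataset description.
NUM_CHARTS = 2323
DESCRIPTIVE_PER_CHART = 4  # 3 answerable + 1 unanswerable
REASONING_PER_CHART = 1


@dataclass(frozen=True)
class ChartRecord:
    """Hypothetical container for one chart.

    Only `original_id` is a column named in the dataset card;
    `image_name` is an illustrative field.
    """

    image_name: str
    original_id: str  # arXiv preprint number of the source paper (assumed str)

    def source_url(self) -> str:
        # Standard arXiv abstract-page URL pattern.
        return f"https://arxiv.org/abs/{self.original_id}"


def total_questions(num_charts: int = NUM_CHARTS) -> dict:
    """Derive question totals from the per-chart counts above."""
    descriptive = num_charts * DESCRIPTIVE_PER_CHART
    reasoning = num_charts * REASONING_PER_CHART
    return {
        "descriptive": descriptive,
        "reasoning": reasoning,
        "total": descriptive + reasoning,
    }


if __name__ == "__main__":
    print(total_questions())
    # The ID below is the CharXiv paper's own arXiv number, used purely as a
    # placeholder; it is not the actual source of any chart in the dataset.
    print(ChartRecord("example.jpg", "2406.18521").source_url())
```

The totals are derived purely from the figures quoted in the description, so they should be read as a consistency check rather than as counts verified against the released files.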

Provider:
maas
Created:
2025-08-15
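Background overview
CharXiv is a high-quality multimodal chart-understanding benchmark dataset containing 2,323 charts from arXiv with accompanying questions. It is designed for model evaluation only, and using it for training is prohibited.
The summary above was collected and generated by 遇见数据集.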