VisitBench
Source: ModelScope community. Updated 2025-12-03; indexed 2024-10-12.
Download link: https://modelscope.cn/datasets/lmms-lab/VisitBench
# Dataset Card for "VisitBench"
<p align="center" width="100%">
<img src="https://i.postimg.cc/g0QRgMVv/WX20240228-113337-2x.png" width="100%" height="80%">
</p>
# Large-scale Multi-modality Models Evaluation Suite
> Accelerating the development of large-scale multi-modality models (LMMs) with `lmms-eval`
🏠 [Homepage](https://lmms-lab.github.io/) | 📚 [Documentation](docs/README.md) | 🤗 [Huggingface Datasets](https://huggingface.co/lmms-lab)
# This Dataset
This is a formatted version of [VisitBench](https://visit-bench.github.io/). It is used in our `lmms-eval` pipeline to allow one-click evaluation of large multi-modality models.
```
@article{bitton2023visit,
title={Visit-bench: A benchmark for vision-language instruction following inspired by real-world use},
author={Bitton, Yonatan and Bansal, Hritik and Hessel, Jack and Shao, Rulin and Zhu, Wanrong and Awadalla, Anas and Gardner, Josh and Taori, Rohan and Schmidt, Ludwig},
journal={arXiv preprint arXiv:2308.06595},
year={2023}
}
```
The dataset includes `visit_bench_single.csv` and `visit_bench_multi.csv`, with 1.2k items in total.
Some items include a `reference_output`, copied directly from [here](https://docs.google.com/spreadsheets/d/1hi8rGXf2WYufkFvGJ2MZ92JNChliM1QEJwZxNboUFlE/edit#gid=696111549).
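As a rough illustration of the `reference_output` note above, the sketch below separates rows that carry a reference from rows that do not. It assumes the CSV exposes a column literally named `reference_output` and that a missing reference appears as an empty cell; the sample rows and the helper name `split_by_reference` are made up for the example and are not part of the dataset or of `lmms-eval`.

```python
import csv
import io


def split_by_reference(rows):
    """Partition rows by whether `reference_output` is non-empty (assumed convention)."""
    with_ref, without_ref = [], []
    for row in rows:
        value = (row.get("reference_output") or "").strip()
        (with_ref if value else without_ref).append(row)
    return with_ref, without_ref


# Made-up sample standing in for visit_bench_single.csv; the real file may differ.
SAMPLE_CSV = """instruction,instruction_category,image,reference_output
Describe the image.,captioning,img_001.png,A dog on a beach.
What is unusual here?,reasoning,img_002.png,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
with_ref, without_ref = split_by_reference(rows)
print(len(with_ref), len(without_ref))
```

To run this against the real data, replace the in-memory sample with an opened copy of `visit_bench_single.csv` or `visit_bench_multi.csv`.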
For each split, please follow the steps below to submit to VisitBench.
## Leaderboard
Our public leaderboard is available [here](https://visit-bench.github.io/).
## How to add new models to the Leaderboard?
1. Download the single-image and multi-image datasets above.
2. Generate your model's prediction for every instance (row) in the dataset CSV.
3. Create a `predictions.csv` with 4 mandatory columns: `instruction`, `instruction_category`, `image` (single-image case) or `images` (multi-image case), and `<model name> prediction`. Here, `<model name>` should be your model's name, including the version if multiple versions are available.
4. Send your `predictions.csv` to us at `yonatanbitton1@gmail.com`.
5. We will use our internal prompting sandbox with reference-free GPT-4 as an evaluator.
6. We will add your model to the leaderboard once we receive all the pairwise judgments from the sandbox.
7. You will receive a confirmation email as soon as your model has been added to the leaderboard.
8. The estimated time for Steps 4-7 is 1-2 weeks, but we will try to process your prediction files as soon as they are sent.
Please include in your email 1) a name for your model, 2) your team name (including your affiliation), and optionally, 3) a GitHub repo or paper link.
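Steps 2 and 3 above could be sketched roughly as follows. This is a minimal illustration rather than an official submission script: the helper `write_predictions`, the dummy `predict` callable, the model name `my-model-v1`, and the sample rows are all hypothetical, and the header format `<model name> prediction` is taken literally from step 3.

```python
import csv
import io


def write_predictions(rows, predict, model_name, out, image_column="image"):
    """Write the four mandatory columns for each dataset row to `out`.

    `image_column` is "image" for the single-image file and "images" for
    the multi-image file, following step 3.
    """
    prediction_column = f"{model_name} prediction"
    fieldnames = ["instruction", "instruction_category", image_column, prediction_column]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "instruction": row["instruction"],
            "instruction_category": row["instruction_category"],
            image_column: row[image_column],
            prediction_column: predict(row),  # your model's answer for this row
        })
    return fieldnames


# Made-up rows standing in for the dataset CSV.
sample_rows = [
    {"instruction": "Describe the image.", "instruction_category": "captioning", "image": "img_001.png"},
    {"instruction": "What is unusual here?", "instruction_category": "reasoning", "image": "img_002.png"},
]


def dummy_predict(row):
    # Placeholder for a real model call.
    return "echo: " + row["instruction"]


buffer = io.StringIO()
columns = write_predictions(sample_rows, dummy_predict, "my-model-v1", buffer)
print(buffer.getvalue())
```

To produce an actual file, pass `open("predictions.csv", "w", newline="", encoding="utf-8")` as `out` instead of the in-memory buffer, and use `image_column="images"` for the multi-image split.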
Provider: maas
Created: 2024-10-06



