VisitBench

Name: VisitBench
Creator: maas
Published: 2025-12-03 17:06:38
License: No description available

ModelScope community; updated 2025-12-03; indexed 2024-10-12

Download link:

https://modelscope.cn/datasets/lmms-lab/VisitBench

Resource description:

# Dataset Card for "VisitBench"

<p align="center" width="100%">
<img src="https://i.postimg.cc/g0QRgMVv/WX20240228-113337-2x.png" width="100%" height="80%">
</p>

# Large-scale Multi-modality Models Evaluation Suite

> Accelerating the development of large-scale multi-modality models (LMMs) with `lmms-eval`

🏠 [Homepage](https://lmms-lab.github.io/) | 📚 [Documentation](docs/README.md) | 🤗 [Huggingface Datasets](https://huggingface.co/lmms-lab)

# This Dataset

This is a formatted version of [VisitBench](https://visit-bench.github.io/). It is used in our `lmms-eval` pipeline to allow for one-click evaluations of large multi-modality models.

```
@article{bitton2023visit,
  title={Visit-bench: A benchmark for vision-language instruction following inspired by real-world use},
  author={Bitton, Yonatan and Bansal, Hritik and Hessel, Jack and Shao, Rulin and Zhu, Wanrong and Awadalla, Anas and Gardner, Josh and Taori, Rohan and Schmidt, Ludwig},
  journal={arXiv preprint arXiv:2308.06595},
  year={2023}
}
```

The dataset includes `visit_bench_single.csv` and `visit_bench_multi.csv`, about 1.2k items in total. Some items carry a `reference_output`, copied directly from [here](https://docs.google.com/spreadsheets/d/1hi8rGXf2WYufkFvGJ2MZ92JNChliM1QEJwZxNboUFlE/edit#gid=696111549).

For each split, please follow the steps below to submit to VisitBench.

## Leaderboard

Our public leaderboard is available [here](https://visit-bench.github.io/).

## How to add new models to the Leaderboard?

1. Access the single-image and multiple-image datasets above.
2. For every instance (row) in the dataset CSV, generate your model's prediction.
3. Create a `predictions.csv` with 4 mandatory columns: `instruction`, `instruction_category`, `image` (single-image case) / `images` (multi-image case), and `<model name> prediction`. Here, `<model name>` should be your model name, including the version if multiple versions are available.
4. Send `predictions.csv` to us at `yonatanbitton1@gmail.com`.
5. We will use our internal prompting sandbox with reference-free GPT-4 as an evaluator.
6. We will add your model to the leaderboard once we receive all the pairwise judgments from the sandbox.
7. You will receive a confirmation email as soon as your model has been added to the leaderboard.
8. The estimated time for Steps 4-7 is 1-2 weeks; however, we will try to process your prediction files as soon as they are sent.

Please include in your email 1) a name for your model, 2) your team name (including your affiliation), and optionally, 3) a GitHub repo or paper link.

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
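Steps 2 and 3 of the submission procedure can be sketched in Python as follows. This is a minimal illustration, not the maintainers' tooling: the helper names `build_prediction_rows` and `write_predictions_csv`, the `dummy_predict` stand-in for a real model, the model name `my-model-v1`, and the sample row are all hypothetical. It also assumes the dataset CSV exposes columns named exactly `instruction`, `instruction_category`, and `image` (or `images`), and that the prediction column is spelled `<model name> prediction` with a space, as the card words it.

```python
import csv
import io


def build_prediction_rows(rows, model_name, predict_fn, image_column="image"):
    """Build one output row per dataset row with the four mandatory columns.

    image_column is "image" for the single-image file and "images" for the
    multi-image file (assumed column names, taken from the card's wording).
    """
    prediction_column = f"{model_name} prediction"
    output_rows = []
    for row in rows:
        output_rows.append({
            "instruction": row["instruction"],
            "instruction_category": row["instruction_category"],
            image_column: row[image_column],
            prediction_column: predict_fn(row),
        })
    return output_rows


def write_predictions_csv(prediction_rows, fileobj):
    """Write the rows as CSV, using the first row's keys as the header."""
    fieldnames = list(prediction_rows[0].keys())
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(prediction_rows)


def dummy_predict(row):
    """Hypothetical stand-in for a real model call."""
    return "placeholder answer for: " + row["instruction"]


# Hypothetical sample row; a real run would read visit_bench_single.csv
# with csv.DictReader instead.
sample_rows = [
    {
        "instruction": "Describe the image.",
        "instruction_category": "captioning",
        "image": "https://example.com/image.png",
    },
]

buffer = io.StringIO()
write_predictions_csv(
    build_prediction_rows(sample_rows, "my-model-v1", dummy_predict),
    buffer,
)
print(buffer.getvalue())
```

To produce a real file, replace the in-memory buffer with `open("predictions.csv", "w", newline="", encoding="utf-8")`, and pass `image_column="images"` when working from the multi-image file.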

Provider:

maas

Created:

2024-10-06
