
VisitBench

ModelScope community | Updated 2025-12-03 | Indexed 2024-10-12
Download link:
https://modelscope.cn/datasets/lmms-lab/VisitBench
Resource description:
# Dataset Card for "VisitBench"

<p align="center" width="100%">
<img src="https://i.postimg.cc/g0QRgMVv/WX20240228-113337-2x.png" width="100%" height="80%">
</p>

# Large-scale Multi-modality Models Evaluation Suite

> Accelerating the development of large-scale multi-modality models (LMMs) with `lmms-eval`

🏠 [Homepage](https://lmms-lab.github.io/) | 📚 [Documentation](docs/README.md) | 🤗 [Huggingface Datasets](https://huggingface.co/lmms-lab)

# This Dataset

This is a formatted version of [VisitBench](https://visit-bench.github.io/). It is used in our `lmms-eval` pipeline to allow for one-click evaluations of large multi-modality models.

```
@article{bitton2023visit,
  title={Visit-bench: A benchmark for vision-language instruction following inspired by real-world use},
  author={Bitton, Yonatan and Bansal, Hritik and Hessel, Jack and Shao, Rulin and Zhu, Wanrong and Awadalla, Anas and Gardner, Josh and Taori, Rohan and Schmidt, Ludwig},
  journal={arXiv preprint arXiv:2308.06595},
  year={2023}
}
```

The dataset includes `visit_bench_single.csv` and `visit_bench_multi.csv`, 1.2k items in total. Some items come with a `reference_output`, copied directly from [here](https://docs.google.com/spreadsheets/d/1hi8rGXf2WYufkFvGJ2MZ92JNChliM1QEJwZxNboUFlE/edit#gid=696111549).

For each split, please follow the steps below to submit to VisitBench.

## Leaderboard

Our public leaderboard is available [here](https://visit-bench.github.io/).

## How to add new models to the Leaderboard?

1. Access the single-image and multiple-image datasets above.
2. For every instance (row) in the dataset CSV, generate your model's prediction.
3. Create a `predictions.csv` with 4 mandatory columns: `instruction`, `instruction_category`, `image` (single-image case) / `images` (multi-image case), and `<model name> prediction`. Here, `<model name>` should be your model name, including the version if multiple versions are available.
4. Send your `predictions.csv` to us at `yonatanbitton1@gmail.com`.
5. We will use our internal prompting sandbox with reference-free GPT-4 as an evaluator.
6. We will add your model to the leaderboard once we receive all the pairwise judgments from the sandbox.
7. You will receive a confirmation email as soon as your model has been added to the leaderboard.
8. The estimated time for Steps 4-7 is 1-2 weeks; however, we will try to process your prediction files as soon as they are sent.

Please include in your email 1) a name for your model, 2) your team name (including your affiliation), and optionally, 3) a GitHub repo or paper link.
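As a rough illustration of Steps 2 and 3 above, the following sketch assembles rows with the four mandatory columns and writes them to `predictions.csv`. It is only a sketch under stated assumptions: the helper names `build_prediction_rows` and `write_predictions_csv`, the dummy predictor, and the example model name `my-model-v1` are hypothetical, and it assumes each dataset row already exposes `instruction`, `instruction_category`, and `image` (or `images`) under exactly those keys.

```python
import csv
from typing import Callable, Dict, Iterable, List


def build_prediction_rows(
    rows: Iterable[Dict[str, str]],
    model_name: str,
    predict: Callable[[Dict[str, str]], str],
    image_column: str = "image",  # use "images" for the multi-image file
) -> List[Dict[str, str]]:
    """Keep the mandatory columns and add '<model name> prediction'."""
    prediction_column = f"{model_name} prediction"
    output = []
    for row in rows:
        output.append(
            {
                "instruction": row["instruction"],
                "instruction_category": row["instruction_category"],
                image_column: row[image_column],
                # predict() stands in for your own model call (hypothetical).
                prediction_column: predict(row),
            }
        )
    return output


def write_predictions_csv(rows: List[Dict[str, str]], path: str) -> None:
    """Write the rows to a CSV file, using the first row's keys as the header."""
    if not rows:
        raise ValueError("no prediction rows to write")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up example row; real rows would come from visit_bench_single.csv.
    example_rows = [
        {
            "instruction": "Describe the image.",
            "instruction_category": "example_category",
            "image": "example.png",
        }
    ]
    predictions = build_prediction_rows(
        example_rows, "my-model-v1", lambda row: "A placeholder answer."
    )
    write_predictions_csv(predictions, "predictions.csv")
```

For the multi-image file, pass `image_column="images"` so the third column matches the multi-image case described in Step 3.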

Provider:
maas
Created:
2024-10-06