
AIFEG/BenchLMM

Source: Hugging Face. Updated 2023-12-06; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/AIFEG/BenchLMM
Resource description:
---
license: apache-2.0
task_categories:
- visual-question-answering
language:
- en
pretty_name: BenchLMM
size_categories:
- n<1K
---

# Dataset Card for BenchLMM

BenchLMM is a benchmarking dataset focusing on the cross-style visual capability of large multimodal models. It evaluates these models' performance in various visual contexts.

## Dataset Details

### Dataset Description

- **Curated by:** Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, and Alex Kot.
- **Funded by:** Supported in part by the Rapid-Rich Object Search (ROSE) Lab of Nanyang Technological University and the NTU-PKU Joint Research Institute.
- **Shared by:** AIFEG.
- **Language(s) (NLP):** English.
- **License:** Apache-2.0.

### Dataset Sources

- **Repository:** [GitHub - AIFEG/BenchLMM](https://github.com/AIFEG/BenchLMM)
- **Paper:** Cai, R., Song, Z., Guan, D., et al. (2023). BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models. arXiv:2312.02896.

## Uses

### Direct Use

The dataset can be used to benchmark large multimodal models, especially focusing on their capability to interpret and respond to different visual styles.

## Dataset Structure

- **Directory Structure:**
  - `baseline/`: Baseline code for LLaVA and InstructBLIP.
  - `evaluate/`: Python code for model evaluation.
  - `evaluate_results/`: Evaluation results of baseline models.
  - `jsonl/`: JSONL files with questions, image locations, and answers.

## Dataset Creation

### Curation Rationale

Developed to assess large multimodal models' performance in diverse visual contexts, helping to understand their capabilities and limitations.

### Source Data

#### Data Collection and Processing

The dataset consists of various visual questions and corresponding answers, structured to evaluate multimodal model performance.

## Bias, Risks, and Limitations

Users should consider the specific visual contexts and question types included in the dataset when interpreting model performance.
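As a rough illustration of how the files in `jsonl/` (described under Dataset Structure) might be inspected, the sketch below reads a JSONL file line by line and counts which keys appear. The card does not state the field names used for questions, image locations, and answers, so the code deliberately discovers the keys instead of assuming them. The helper names `read_jsonl` and `summarize_keys` are hypothetical and are not part of the BenchLMM repository.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(path: str | Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file.

    Hypothetical helper: not part of the BenchLMM code base.
    """
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err


def summarize_keys(records: Iterable[dict]) -> dict[str, int]:
    """Count how many records contain each top-level key.

    Useful for discovering the real field names before relying on them,
    since the dataset card does not document the schema.
    """
    counts: dict[str, int] = {}
    for record in records:
        for key in record:
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Calling `summarize_keys(read_jsonl(some_path))` on one of the downloaded files, where `some_path` is whatever file name the `jsonl/` directory actually contains, would show the available fields before any evaluation code depends on them.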
## Citation

**BibTeX:**

    @misc{cai2023benchlmm,
      title={BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models},
      author={Rizhao Cai and Zirui Song and Dayan Guan and Zhenhao Chen and Xing Luo and Chenyu Yi and Alex Kot},
      year={2023},
      eprint={2312.02896},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }

**APA:**

Cai, R., Song, Z., Guan, D., Chen, Z., Luo, X., Yi, C., & Kot, A. (2023). BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models. arXiv preprint arXiv:2312.02896.

## Acknowledgements

This research is supported in part by the Rapid-Rich Object Search (ROSE) Lab of Nanyang Technological University and the NTU-PKU Joint Research Institute.
Provider:
AIFEG
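Summary of original information

Dataset overview

Name: BenchLMM

Purpose: Evaluating the cross-style visual capability of large multimodal models.

Function: Assessing model performance across a variety of visual contexts.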
