AlignMMBench

Name: AlignMMBench
Creator: maas
Published: 2026-01-02 16:16:34
License: No description provided

ModelScope community. Updated 2026-01-02; indexed 2024-08-31.

Download link:

https://modelscope.cn/datasets/ZhipuAI/AlignMMBench

Resource description:

# AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

<font size=4><div align='center' > [[🍎 Project Page](https://alignmmbench.github.io/)] [[📖 arXiv Paper](https://arxiv.org/pdf/2406.09295)] [[📊 Dataset](https://huggingface.co/datasets/THUDM/AlignMMBench)] </div></font>

<p align="center"> <img src="./assets/index.png" width="96%" height="50%"> </p>

---

## 🔥 News

* **`2024.06.14`** 🌟 We released AlignMMBench, a comprehensive alignment benchmark for vision-language models!

## 👀 Introduction to AlignMMBench

AlignMMBench is a multimodal alignment benchmark that covers both single-turn and multi-turn dialogue scenarios. It spans three categories and thirteen capability tasks, with a total of 4,978 question-answer pairs.

### Features

1. **High-Quality Annotations**: A reliable benchmark built with meticulous human annotation and multi-stage quality control.
2. **Self Critic**: To improve the controllability of alignment evaluation, we introduce CritiqueVLM, a ChatGLM3-6B based evaluator that has been rule-calibrated and carefully fine-tuned. Its consistency with human judgements surpasses that of GPT-4.
3. **Diverse Data**: Three categories and thirteen capability tasks, including both single-turn and multi-turn dialogue scenarios.

<img src="./assets/image_examples.png" width="100%" height="50%">

## 📈 Results

<p align="center"> <img src="./assets/leaderboard.png" width="96%" height="50%"> </p>

## License

The use of the dataset and the original videos is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, as detailed in the [LICENSE](./LICENSE).

If you believe that any content in this dataset infringes on your rights, please contact us at **wenmeng.yu@aminer.cn** to request its removal.

## Citation

If you find our work helpful for your research, please consider citing it.

```bibtex
@misc{wu2024alignmmbench,
  title={AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models},
  author={Yuhang Wu and Wenmeng Yu and Yean Cheng and Yan Wang and Xiaohan Zhang and Jiazheng Xu and Ming Ding and Yuxiao Dong},
  year={2024},
  eprint={2406.09295},
  archivePrefix={arXiv}
}
```
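The Features list states that CritiqueVLM's scores are more consistent with human judgements than GPT-4's. As a rough illustration of how agreement between an automatic evaluator and human raters could be quantified, the sketch below computes mean absolute error and Pearson correlation over paired scores. The choice of metrics, the function names, and the sample scores are all assumptions made for illustration; they are not taken from the AlignMMBench code or paper.

```python
from math import sqrt


def mean_absolute_error(human, evaluator):
    """Average absolute gap between paired human and evaluator scores."""
    if len(human) != len(evaluator) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(h - e) for h, e in zip(human, evaluator)) / len(human)


def pearson_correlation(human, evaluator):
    """Pearson correlation between paired human and evaluator scores."""
    n = len(human)
    if n != len(evaluator) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_h = sum(human) / n
    mean_e = sum(evaluator) / n
    cov = sum((h - mean_h) * (e - mean_e) for h, e in zip(human, evaluator))
    var_h = sum((h - mean_h) ** 2 for h in human)
    var_e = sum((e - mean_e) ** 2 for e in evaluator)
    if var_h == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_h * var_e)


# Made-up scores for five responses (hypothetical data, for illustration only).
human_scores = [7, 4, 9, 6, 3]
evaluator_scores = [8, 4, 9, 5, 3]

mae = mean_absolute_error(human_scores, evaluator_scores)
r = pearson_correlation(human_scores, evaluator_scores)
print(f"MAE = {mae:.2f}, Pearson r = {r:.3f}")
```

Under this sketch, a lower error and a higher correlation would indicate an evaluator that tracks human ratings more closely; comparing two evaluators amounts to running the same functions on each evaluator's scores against the same human scores.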

Provider:

maas

Created:

2024-08-19
