AlignMMBench
Source: ModelScope community (updated 2026-01-02, indexed 2024-08-31)
Download link: https://modelscope.cn/datasets/ZhipuAI/AlignMMBench
# AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
<font size=4><div align='center' > [[🍎 Project Page](https://alignmmbench.github.io/)] [[📖 arXiv Paper](https://arxiv.org/pdf/2406.09295)] [[📊 Dataset](https://huggingface.co/datasets/THUDM/AlignMMBench)] </div></font>
<p align="center">
<img src="./assets/index.png" width="96%" height="50%">
</p>
---
## 🔥 News
* **`2024.06.14`** 🌟 We released AlignMMBench, a comprehensive alignment benchmark for vision language models!
## 👀 Introduction to AlignMMBench
AlignMMBench is a multimodal alignment benchmark that encompasses both single-turn and multi-turn dialogue scenarios. It includes three categories and thirteen capability tasks, with a total of 4,978 question-answer pairs.
### Features
1. **High-Quality Annotations**: Reliable benchmark with meticulous human annotation and multi-stage quality control processes.
2. **Self-Critic**: To improve the controllability of alignment evaluation, we introduce CritiqueVLM, a ChatGLM3-6B-based evaluator that has been rule-calibrated and carefully fine-tuned. Its evaluations agree with human judgements more consistently than GPT-4's.
3. **Diverse Data**: Three categories and thirteen capability tasks, including both single-turn and multi-turn dialogue scenarios.
<img src="./assets/image_examples.png" width="100%" height="50%">
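As a rough illustration of what "consistency with human judgements" can mean for an automatic evaluator, the sketch below compares two hypothetical evaluators against human scores. Everything in it is an assumption made for illustration: the README does not say which consistency metric is used, so Pearson correlation and mean absolute error are stand-ins, and the score lists, the 1-10 scale, and the names `evaluator_a` and `evaluator_b` are invented rather than taken from the benchmark.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def mean_abs_error(xs, ys):
    """Average absolute gap between paired scores (lower is better)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty, equal-length lists")
    return mean(abs(x - y) for x, y in zip(xs, ys))


# Hypothetical scores for five responses (scale assumed to be 1-10).
human = [8, 3, 6, 9, 2]
evaluator_a = [7, 3, 6, 9, 3]  # invented: tracks the human scores closely
evaluator_b = [5, 6, 4, 7, 5]  # invented: tracks the human scores loosely

for name, scores in [("evaluator_a", evaluator_a), ("evaluator_b", evaluator_b)]:
    r = pearson(human, scores)
    mae = mean_abs_error(human, scores)
    print(f"{name}: pearson={r:.3f}, mae={mae:.2f}")
```

Under this reading, an evaluator is "more consistent" with humans when its correlation is higher and its error is lower on the same set of responses. The actual protocol is described in the arXiv paper linked above.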
## 📈 Results
<p align="center">
<img src="./assets/leaderboard.png" width="96%" height="50%">
</p>
## License
The use of the dataset and the original videos is governed by the Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International (CC BY-NC-SA 4.0) license, as detailed in the [LICENSE](./LICENSE).
If you believe that any content in this dataset infringes on your rights, please contact us at **wenmeng.yu@aminer.cn** to request its
removal.
## Citation
If you find our work helpful for your research, please consider citing our work.
```bibtex
@misc{wu2024alignmmbench,
title={AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models},
author={Yuhang Wu and Wenmeng Yu and Yean Cheng and Yan Wang and Xiaohan Zhang and Jiazheng Xu and Ming Ding and Yuxiao Dong},
year={2024},
eprint={2406.09295},
archivePrefix={arXiv}
}
```
Provider: maas
Created: 2024-08-19



