
AlignMMBench

ModelScope community, updated 2026-01-02, indexed 2024-08-31
Download link:
https://modelscope.cn/datasets/ZhipuAI/AlignMMBench
Resource description:
# AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

<font size=4><div align='center' > [[🍎 Project Page](https://alignmmbench.github.io/)] [[📖 arXiv Paper](https://arxiv.org/pdf/2406.09295)] [[📊 Dataset](https://huggingface.co/datasets/THUDM/AlignMMBench)] </div></font>

<p align="center">
<img src="./assets/index.png" width="96%" height="50%">
</p>

---

## 🔥 News

* **`2024.06.14`** 🌟 We released AlignMMBench, a comprehensive alignment benchmark for vision-language models!

## 👀 Introduction to AlignMMBench

AlignMMBench is a multimodal alignment benchmark that covers both single-turn and multi-turn dialogue scenarios. It includes three categories and thirteen capability tasks, with a total of 4,978 question-answer pairs.

### Features

1. **High-Quality Annotations**: A reliable benchmark built with meticulous human annotation and a multi-stage quality-control process.
2. **Self Critic**: To improve the controllability of alignment evaluation, we introduce CritiqueVLM, a ChatGLM3-6B-based evaluator that has been rule-calibrated and carefully fine-tuned. Measured against human judgements, its evaluation consistency surpasses that of GPT-4.
3. **Diverse Data**: Three categories and thirteen capability tasks, covering both single-turn and multi-turn dialogue scenarios.

<img src="./assets/image_examples.png" width="100%" height="50%">

## 📈 Results

<p align="center">
<img src="./assets/leaderboard.png" width="96%" height="50%">
</p>

## License

The use of the dataset and the original videos is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, as detailed in the [LICENSE](./LICENSE). If you believe that any content in this dataset infringes on your rights, please contact us at **wenmeng.yu@aminer.cn** to request its removal.

## Citation

If you find our work helpful for your research, please consider citing it.
```bibtex
@misc{wu2024alignmmbench,
  title={AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models},
  author={Yuhang Wu and Wenmeng Yu and Yean Cheng and Yan Wang and Xiaohan Zhang and Jiazheng Xu and Ming Ding and Yuxiao Dong},
  year={2024},
  eprint={2406.09295},
  archivePrefix={arXiv}
}
```
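The Self Critic feature states that CritiqueVLM's consistency with human judgements surpasses GPT-4's, but the description does not say how consistency is measured. As a minimal sketch, assuming each question receives a numeric rating from human annotators and from every automatic evaluator, one common way to quantify agreement is the Pearson correlation and the mean absolute error against the human scores. The helpers `pearson`, `mean_abs_error`, and `agreement_with_humans`, and the score lists in the usage example, are hypothetical illustrations, not part of AlignMMBench's released tooling.

```python
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists with at least two scores")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("a constant score list has no defined correlation")
    return cov / (sx * sy)


def mean_abs_error(xs: list[float], ys: list[float]) -> float:
    """Average absolute gap between paired scores."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty paired lists")
    return mean(abs(x - y) for x, y in zip(xs, ys))


def agreement_with_humans(
    human: list[float], evaluators: dict[str, list[float]]
) -> dict[str, dict[str, float]]:
    """Hypothetical helper: score each automatic evaluator against human ratings."""
    return {
        name: {
            "pearson": pearson(scores, human),
            "mae": mean_abs_error(scores, human),
        }
        for name, scores in evaluators.items()
    }


if __name__ == "__main__":
    # Made-up ratings for five questions, for illustration only.
    human = [8, 3, 6, 9, 2]
    evaluators = {"critic": [7, 3, 5, 9, 3], "gpt4": [9, 5, 4, 8, 5]}
    print(agreement_with_humans(human, evaluators))
```

A higher correlation and a lower mean absolute error indicate closer agreement with the human raters. The exact metrics used to compare CritiqueVLM with GPT-4 are defined in the linked arXiv paper and may differ from this sketch.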

Provider: maas
Created: 2024-08-19