VHELM
Source: arXiv. Indexed 2025-09-30.
Download link:
https://crfm.stanford.edu/helm/vhelm/latest/
Description:
This dataset extends the HELM evaluation framework to evaluate Vision-Language Models (VLMs). It also provides win-rate scores for the evaluated models relative to GPT-4V.



