VLUE
arXiv · 2022-05-31 · Updated 2024-06-21
Download link:
https://vlue-benchmark.github.io
Description:
VLUE is a multi-task, multi-dimensional vision-language understanding evaluation benchmark. It is designed to assess the generalization ability and efficiency-performance trade-off of pre-trained vision-language models. The benchmark comprises five core tasks, each with its own private out-of-distribution (OOD) test set. These test sets were annotated via crowdsourcing on images from the MaRVL dataset, so their image distribution differs from that of COCO and Visual Genome (VG). Beyond absolute performance, VLUE also measures how well models generalize to a broader range of images and concepts. It further evaluates their efficiency-performance trade-off in real-world applications.
Provided by:
ByteDance AI Lab
Created:
2022-05-31



