VALSE (VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena)

Name: VALSE (VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena)
Creator: OpenDataLab
Published: 2026-05-31 07:30:28
License: Not specified

OpenDataLab, updated 2026-05-31, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/VALSE

Resource overview:

We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark for testing the visio-linguistic grounding capabilities of general-purpose pretrained vision and language (V&L) models on specific linguistic phenomena. VALSE offers a suite of six tests covering a range of linguistic constructs. Solving them requires a model to ground linguistic phenomena in the visual modality, which allows finer-grained evaluation than was previously possible. We hope VALSE will serve as an important benchmark for measuring the future progress of pretrained V&L models from a linguistic perspective, complementing the standard task-centered V&L evaluations.

Provided by:

OpenDataLab

Created:

2022-09-01

Collected summary

Dataset introduction