VLBiasBench
Source: ieee-dataport.org (indexed 2025-03-26)
Download link:
https://ieee-dataport.org/documents/vlbiasbench
Description:
The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are accompanied by concerns about biased outputs, a challenge that has yet to be thoroughly explored. Existing benchmarks are not sufficiently comprehensive in evaluating biases because of their limited data scale, single question format, and narrow sources of bias. To address this problem, we introduce VLBiasBench, a comprehensive benchmark designed to evaluate biases in LVLMs. VLBiasBench features a dataset that covers nine distinct categories of social bias (age, disability status, gender, nationality, physical appearance, race, religion, profession, and socioeconomic status) as well as two intersectional bias categories: race × gender and race × socioeconomic status. To build a large-scale dataset, we use the Stable Diffusion XL model to generate 46,848 high-quality images, which are combined with various questions to create 128,342 samples. These questions are divided into open-ended and close-ended types, ensuring thorough consideration of bias sources and a comprehensive evaluation of LVLM biases from multiple perspectives. We conduct extensive evaluations on 15 open-source models as well as two advanced closed-source models, yielding new insights into the biases present in these models.
Provider:
ieee-dataport.org



