KEBench

Name: KEBench
Creator: Institute of Automation, Chinese Academy of Sciences
Published: 2024-03-12 14:16:33
License: No description available

Source: arXiv. Updated: 2024-03-12. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/2403.07350v1

Resource description:

KEBench is a knowledge editing benchmark designed for large vision-language models, created by the Institute of Automation, Chinese Academy of Sciences. The dataset contains 8,174 editing cases and 18,434 images and is intended to evaluate knowledge editing methods comprehensively along four metrics: reliability, generality, locality, and portability. It builds on the multimodal knowledge graph MMKG so that each image is clearly associated with an entity, which makes it possible to select image pairs showing the same entity from different angles or with different appearances when measuring image generality. Questions and answers for the test examples are generated with the GPT-3.5 API, with the stated aim of ensuring data quality and accuracy. KEBench targets the main challenges of knowledge editing in large vision-language models, such as integrating multimodal content effectively and keeping edits coherent and contextually relevant.
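The four metrics named above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `EditCase` fields, the `score_case` and `aggregate` helpers, and the exact-match comparison are hypothetical and are not taken from the KEBench release, whose actual schema and scoring rules may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A model is treated as a function (image_path, question) -> answer string.
Model = Callable[[str, str], str]


@dataclass
class EditCase:
    """Hypothetical record layout; the real KEBench schema may differ."""
    edit_image: str            # image shown when the edit is applied
    edit_question: str         # question whose answer is being edited
    target_answer: str         # answer the edited model should give
    alt_image: str             # same entity, different angle or appearance
    locality_image: str        # unrelated input that should be unaffected
    locality_question: str
    portability_question: str  # question that requires using the edited fact
    portability_answer: str


def score_case(case: EditCase, pre_edit: Model, post_edit: Model) -> Dict[str, bool]:
    """Score one edit case with exact string match (an assumed criterion)."""
    return {
        # Reliability: the edited model returns the target on the edit input.
        "reliability": post_edit(case.edit_image, case.edit_question)
        == case.target_answer,
        # Generality: the edit carries over to another image of the entity.
        "generality": post_edit(case.alt_image, case.edit_question)
        == case.target_answer,
        # Locality: unrelated behaviour matches the model before editing.
        "locality": post_edit(case.locality_image, case.locality_question)
        == pre_edit(case.locality_image, case.locality_question),
        # Portability: the edited fact is usable in a follow-up question.
        "portability": post_edit(case.edit_image, case.portability_question)
        == case.portability_answer,
    }


def aggregate(scores: List[Dict[str, bool]]) -> Dict[str, float]:
    """Average each metric over all scored cases."""
    if not scores:
        return {}
    return {name: sum(s[name] for s in scores) / len(scores) for name in scores[0]}
```

In this sketch `pre_edit` and `post_edit` stand for the same model before and after an editing method is applied to a single case; a full evaluation would apply the edit, call `score_case`, and pass the collected results to `aggregate`.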

Provider:

Institute of Automation, Chinese Academy of Sciences

Created:

2024-03-12
