avhallubench

Name: avhallubench
Creator: maas
Published: 2025-07-24 16:29:34
License: No description available

ModelScope community. Updated: 2025-07-24. Indexed: 2025-05-24.

Download link:

https://modelscope.cn/datasets/scb10x/avhallubench

Description:

# Dataset Card for AVHalluBench

- The dataset is for benchmarking hallucination levels in *audio-visual* LLMs. It consists of 175 videos, and each video has hallucination-free audio and visual descriptions. The statistics are provided in the figure below, and more information can be found in our paper.
- **Paper**: [CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models](https://arxiv.org/abs/2405.13684)
- **Multimodal Hallucination Leaderboard**: https://huggingface.co/spaces/scb10x/multimodal-hallucination-leaderboard

### Dataset Summary

- The videos can be found and downloaded at https://huggingface.co/datasets/potsawee/avhallubench/tree/main/videos. Each video can be identified using `video_id`.
- Model-generated outputs can be compared against the provided audio and visual descriptions.

## Dataset Structure

Each instance consists of:

- `video_id`: ID for each video
- `source`: Data source of each video
- `audio_description`: hallucination-free manual **audio description**
- `visual_description`: hallucination-free manual **video description**

### Citation Information

```
@misc{sun2024crosscheckgpt,
  title={CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models},
  author={Guangzhi Sun and Potsawee Manakul and Adian Liusie and Kunat Pipatanakul and Chao Zhang and Phil Woodland and Mark Gales},
  year={2024},
  eprint={2405.13684},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
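As a rough illustration of the instance structure described in the card, the sketch below models one record with the four listed fields and pairs model-generated text with the reference descriptions by `video_id`. The class name `AVHalluBenchInstance`, the helper `pair_with_references`, the `model_outputs` mapping, and the sample records are all hypothetical and are not part of the dataset or of any official tooling; the scoring step itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AVHalluBenchInstance:
    """One record, mirroring the four fields listed in the dataset card."""
    video_id: str
    source: str
    audio_description: str
    visual_description: str


def pair_with_references(instances, model_outputs):
    """Pair each model output with the references for the same video_id.

    `model_outputs` is a hypothetical dict mapping video_id -> generated text.
    Videos that have no model output are skipped.
    """
    pairs = []
    for inst in instances:
        generated = model_outputs.get(inst.video_id)
        if generated is None:
            continue
        pairs.append({
            "video_id": inst.video_id,
            "generated": generated,
            "audio_reference": inst.audio_description,
            "visual_reference": inst.visual_description,
        })
    return pairs


# Made-up records for illustration only; they are not taken from the dataset.
examples = [
    AVHalluBenchInstance("vid_001", "example_source",
                         "A dog barks twice.", "A dog runs across a lawn."),
    AVHalluBenchInstance("vid_002", "example_source",
                         "Rain falls steadily.", "A street at night."),
]
outputs = {"vid_001": "A dog barks while running on grass."}
pairs = pair_with_references(examples, outputs)
```

Each resulting pair can then be handed to whichever hallucination metric is being used, with the audio and visual references kept separate so that the two modalities can be scored independently.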

Provider: maas

Created: 2025-05-23
