CaReBench

Name: CaReBench
Creator: maas
Published: 2025-12-05 16:57:54
License: No description available

ModelScope Community · Updated 2025-12-05 · Indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/MCG-NJU/CaReBench


Description:

<div align="center">
<h1 style="margin: 0">
<img src="assets/logo.png" style="width:1.5em; vertical-align: middle; display: inline-block; margin: 0" alt="Logo">
<span style="vertical-align: middle; display: inline-block; margin: 0"><b>CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval</b></span>
</h1>
<p style="margin: 0">
Yifan Xu, <a href="https://scholar.google.com/citations?user=evR3uR0AAAAJ">Xinhao Li</a>, Yichun Yang, Desen Meng, Rui Huang, <a href="https://scholar.google.com/citations?user=HEuN8PcAAAAJ">Limin Wang</a>
</p>
<p align="center">
🤗 <a href="https://huggingface.co/MCG-NJU/CaRe-7B">Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/datasets/MCG-NJU/CaReBench">Data</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://arxiv.org/pdf/2501.00513">Paper</a> &nbsp;&nbsp;
</p>
</div>

![](assets/comparison.png)

## 📝 Introduction

**🌟 CaReBench** is a fine-grained benchmark comprising **1,000 high-quality videos** with detailed human-annotated captions, including **manually separated spatial and temporal descriptions** for independent spatiotemporal bias evaluation.

![CaReBench](assets/carebench.png)

**📊 ReBias and CapST Metrics** are designed specifically for retrieval and captioning tasks, providing a comprehensive evaluation framework for spatiotemporal understanding in video-language models.

**⚡ CaRe: A Unified Baseline** for fine-grained video retrieval and captioning, achieving competitive performance through **two-stage Supervised Fine-Tuning (SFT)**. CaRe excels in both generating detailed video descriptions and extracting robust video features.

![CaRe Training Recipe](assets/care_model.png)

**🚀 State-of-the-art performance** on both detailed video captioning and fine-grained video retrieval. CaRe outperforms CLIP-based retrieval models and popular MLLMs in captioning tasks.

![alt text](assets/performance.png)
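Because the captions are split into spatial and temporal descriptions, a retrieval model can be scored on each kind of caption separately. The sketch below illustrates that idea with plain Recall@K, a commonly used retrieval metric. It is not the ReBias or CapST definition, which this card does not spell out. The function name `recall_at_k`, the score matrices, and the assumption that caption `i` matches video `i` are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of caption queries whose matching video ranks in the top k.

    similarity[i][j] is the score between caption i and video j.
    Assumption: the correct video for caption i sits at index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank video indices by descending score for this caption.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores for three caption/video pairs (made-up numbers, not results).
spatial_scores = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.7],
]
temporal_scores = [
    [0.4, 0.6, 0.1],
    [0.2, 0.7, 0.3],
    [0.5, 0.2, 0.3],
]

spatial_r1 = recall_at_k(spatial_scores, k=1)
temporal_r1 = recall_at_k(temporal_scores, k=1)
print(f"spatial R@1 = {spatial_r1:.2f}, temporal R@1 = {temporal_r1:.2f}")
```

In this toy setup, a large gap between the two scores would suggest that a model leans on one kind of cue more than the other, which is the sort of imbalance the separated annotations are meant to expose.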


Provider:

maas

Created:

2025-12-04
