
CaReBench

ModelScope community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/MCG-NJU/CaReBench
Resource description:
<div align="center">
<h1 style="margin: 0">
<img src="assets/logo.png" style="width:1.5em; vertical-align: middle; display: inline-block; margin: 0" alt="Logo">
<span style="vertical-align: middle; display: inline-block; margin: 0"><b>CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval</b></span>
</h1>
<p style="margin: 0">
Yifan Xu, <a href="https://scholar.google.com/citations?user=evR3uR0AAAAJ">Xinhao Li</a>, Yichun Yang, Desen Meng, Rui Huang, <a href="https://scholar.google.com/citations?user=HEuN8PcAAAAJ">Limin Wang</a>
</p>
<p align="center">
🤗 <a href="https://huggingface.co/MCG-NJU/CaRe-7B">Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/datasets/MCG-NJU/CaReBench">Data</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://arxiv.org/pdf/2501.00513">Paper</a> &nbsp;&nbsp;
</p>
</div>

![](assets/comparison.png)

## 📝 Introduction

**🌟 CaReBench** is a fine-grained benchmark comprising **1,000 high-quality videos** with detailed human-annotated captions, including **manually separated spatial and temporal descriptions** for independent spatiotemporal bias evaluation.

![CaReBench](assets/carebench.png)

**📊 ReBias and CapST Metrics** are designed specifically for retrieval and captioning tasks, providing a comprehensive evaluation framework for spatiotemporal understanding in video-language models.

**⚡ CaRe: A Unified Baseline** for fine-grained video retrieval and captioning, achieving competitive performance through **two-stage Supervised Fine-Tuning (SFT)**. CaRe excels in both generating detailed video descriptions and extracting robust video features.

![CaRe Training Recipe](assets/care_model.png)

**🚀 State-of-the-art performance** on both detailed video captioning and fine-grained video retrieval. CaRe outperforms CLIP-based retrieval models and popular MLLMs in captioning tasks.

![alt text](assets/performance.png)
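
The description above does not define ReBias or CapST, so the sketch below does not attempt them. It only illustrates the standard Recall@K measure that retrieval benchmarks commonly report, under the assumption that each caption has exactly one matching video at the same index. The function name `recall_at_k` and the toy similarity matrix are hypothetical and are not taken from the CaReBench code or results.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    Assumption (not from the source): similarity[i][j] scores query i against
    candidate j, and candidate i is the single correct match for query i.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(similarity) == 0:
        raise ValueError("similarity matrix must not be empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct candidate: how many candidates
        # received a strictly higher score than the correct one.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 text-to-video similarity matrix (made-up numbers, not benchmark data).
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
r1 = recall_at_k(toy_similarity, 1)
r2 = recall_at_k(toy_similarity, 2)
print(f"R@1 = {r1:.3f}, R@2 = {r2:.3f}")
```

In this toy case the second caption ranks its own video second, so it counts as a miss at K=1 and a hit at K=2. Running the same function separately on the spatial and temporal captions would give one score per caption type, though how the benchmark combines such scores is not stated here.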

Provider:
maas
Created:
2025-12-04