openinterx/UGC-VideoCap
Source: Hugging Face. Updated: 2025-08-20. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/openinterx/UGC-VideoCap
Description:
The UGC-VideoCaptioner dataset is a new benchmark, with an accompanying model framework, for detailed omnimodal captioning of short-form user-generated videos. It consists of 1,000 TikTok videos annotated through a structured three-stage human-in-the-loop pipeline covering audio-only, visual-only, and joint audio-visual semantics. It also includes 4,000 carefully crafted QA pairs that probe both unimodal and cross-modal understanding.
Provided by:
openinterx



