DiDeMoSV

Name: DiDeMoSV
Creator: OpenDataLab
Published: 2026-05-17 07:30:32
License: Not specified

OpenDataLab. Updated 2026-05-17; indexed 2024-05-09.

Download link:

https://opendatalab.org.cn/OpenDataLab/DiDeMoSV

Resource overview:

DiDeMoSV. DiDeMo is a video captioning dataset of 10,000 short video clips with more than 40,000 text descriptions, each temporally localized to a segment of its video. Each clip is randomly sampled from the Flickr-based YFCC100M dataset, so the videos cover a wide range of real-world scenes with many different settings, actions, and entities. The dataset contains 11,550, 2,707, and 3,378 samples in the training, validation, and test splits respectively, and each sample consists of three consecutive frames. Compared with existing story visualization datasets, it challenges story continuation models with more diverse inputs that cover a broader range of story elements. To handle this, a model must make full use of the initial scene input and incorporate additional general visual knowledge, whether through transfer learning or extra training data.
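As a quick sanity check on the split sizes quoted above, the following minimal sketch works out each split's share of the data and the implied frame count. It uses only the numbers stated in the description; the names `SPLITS`, `FRAMES_PER_SAMPLE`, and `summarize` are hypothetical and are not part of any official DiDeMoSV tooling.

```python
# Split sizes as stated in the dataset description (not read from the actual files).
SPLITS = {"train": 11550, "validation": 2707, "test": 3378}

# The description says each sample consists of three consecutive frames.
FRAMES_PER_SAMPLE = 3


def summarize(splits, frames_per_sample):
    """Return per-split sample count, share of all samples, and frame count."""
    total = sum(splits.values())
    return {
        name: {
            "samples": count,
            "share": count / total,
            "frames": count * frames_per_sample,
        }
        for name, count in splits.items()
    }


summary = summarize(SPLITS, FRAMES_PER_SAMPLE)
total_samples = sum(SPLITS.values())
total_frames = total_samples * FRAMES_PER_SAMPLE

for name, row in summary.items():
    print(f"{name:<10} {row['samples']:>6} samples  {row['share']:6.1%}  {row['frames']:>6} frames")
print(f"{'total':<10} {total_samples:>6} samples  {'':6}  {total_frames:>6} frames")
```

Under these figures the three splits sum to 17,635 samples, or 52,905 frames, with roughly two thirds of the samples in the training split.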

Provider:

OpenDataLab

Created:

2022-11-02

Collection summary

Dataset introduction