CMD(Condensed Movies Dataset)

Name: CMD(Condensed Movies Dataset)
Creator: OpenDataLab
Published: 2026-05-24 08:30:19
License: Not specified

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/CMD

Resource description:

A large-scale video dataset of movie clips with detailed subtitles. It covers more than 3,000 movies spanning a range of genres, countries, and decades. Key scenes from each movie are provided as clips paired with semantic subtitles, and each movie is condensed into roughly 20 minutes of footage, giving an efficient summary of its story. The dataset also includes nearly 500,000 face tracks produced by state-of-the-art face detection and tracking models, from which about 8,000 movie characters are identified using face recognition. Both the subtitles supplied with the movies and YouTube's automatically generated subtitles are included.

Provided by:

OpenDataLab

Created:

2022-08-10

Collection summary

Dataset introduction