CMD (Condensed Movies Dataset)
OpenDataLab · Updated 2026-05-24 · Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/CMD
Resource description:
A large-scale video dataset of movie clips with detailed captions. It covers over 3,000 distinct movies spanning a range of genres, countries, and decades. Key scene clips from each movie are paired with semantic captions, and each movie is condensed into roughly 20 minutes of footage, giving a compact version of its story. The dataset also provides nearly 500,000 face tracks produced by state-of-the-art face detection and tracking models, from which about 8,000 characters are identified using face recognition. In addition, it includes the subtitles supplied with the movies as well as the automatically generated captions from YouTube.
Provider:
OpenDataLab
Created:
2022-08-10
Collected summary
Dataset introduction

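Background and challenges
Background overview
CMD is a large-scale video dataset containing key scene clips from more than 3,000 movies. Each movie is condensed into roughly 20 minutes of footage and comes with detailed captions. The dataset also includes nearly 500,000 face tracks and about 8,000 identified characters, and it is suited to video story retrieval and related research.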
The content above was collected and summarized by 遇见数据集.



