
Open-Sora-Plan-v1.0.0

Source: ModelScope community (updated 2025-12-18, indexed 2024-05-15)
Download link: https://modelscope.cn/datasets/PKU-YuanLab/Open-Sora-Plan-v1.0.0
# Open-Sora-Dataset

Welcome to the Open-Sora-DataSet project! As part of the [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) project, this page describes how we collect and process datasets. We started this project to build a high-quality video dataset for the open-source world. 💪

We warmly welcome you to join us! Let's contribute to the open-source world together! Thank you for your support and contribution.

**If you like our project, please give us a star ⭐ on [GitHub](https://github.com/PKU-YuanGroup/Open-Sora-Plan) for the latest updates.**

## Data Construction for Open-Sora-Plan v1.0.0

### Data distribution

We crawled 40258 videos from open-source websites under the CC0 license. All videos are of high quality without watermarks, and about 60% of them are landscape data. The total duration is about **274h 05m 13s**. The data comes from three main sources:

1. [mixkit](https://mixkit.co/): We collected **1234** videos, with a total duration of about **6h 19m 32s** and a total of **570815** frames. The resolution and aspect ratio distribution histograms are shown below (categories that account for less than 1% are not listed):

   <img src="assets/v1.0.0_mixkit_resolution_plot.png" width="400" /> <img src="assets/v1.0.0_mixkit_aspect_ratio_plot.png" width="400" />

2. [pexels](https://www.pexels.com/zh-cn/): We collected **7408** videos, with a total duration of about **48h 49m 24s** and a total of **5038641** frames.
   The resolution and aspect ratio distribution histograms are shown below (categories that account for less than 1% are not listed):

   <img src="assets/v1.0.0_pexels_resolution_plot.png" height="300" /> <img src="assets/v1.0.0_pexels_aspect_ratio_plot.png" height="300" />

3. [pixabay](https://pixabay.com/): We collected **31616** videos, with a total duration of about **218h 56m 17s** and a total of **23508970** frames. The resolution and aspect ratio distribution histograms are shown below (categories that account for less than 1% are not listed):

   <img src="assets/v1.0.0_pixabay_resolution_plot.png" height="300" /> <img src="assets/v1.0.0_pixabay_aspect_ratio_plot.png" height="300" />

### Dense captions

It is challenging to crawl a large quantity of high-quality dense captions directly from the internet. Therefore, we use a mature Image-captioner model to obtain high-quality dense captions. We conducted ablation experiments on two multimodal large models: [ShareGPT4V-Captioner-7B](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/README.md) and [LLaVA-1.6-34B](https://github.com/haotian-liu/LLaVA). The former is specifically designed for caption generation, while the latter is a general-purpose multimodal large model. In our ablation experiments, the two were comparable in performance. However, their inference speed on the A800 GPU differs significantly: 40 s/it at a batch size of 12 for ShareGPT4V-Captioner-7B, versus 15 s/it at a batch size of 1 for LLaVA-1.6-34B.

We open-source all annotations [here](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.0.0). The table below reports caption length statistics. We set the maximum length of the model to 300, which covers almost 99% of the samples.
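The two speed figures are easier to compare once they are normalized per sample. The sketch below is only a rough reading of the reported numbers: it assumes that one iteration processes exactly one full batch, which the text does not state explicitly, and the function name `seconds_per_sample` is made up for this example.

```python
def seconds_per_sample(seconds_per_iter: float, batch_size: int) -> float:
    """Average seconds spent per sample, assuming one iteration handles one full batch."""
    return seconds_per_iter / batch_size


# Figures reported above for the A800 GPU.
sharegpt4v = seconds_per_sample(seconds_per_iter=40.0, batch_size=12)
llava = seconds_per_sample(seconds_per_iter=15.0, batch_size=1)

# Ratio of per-sample cost: how many times faster the 7B captioner is per caption.
speedup = llava / sharegpt4v

print(f"ShareGPT4V-Captioner-7B: {sharegpt4v:.2f} s/sample")
print(f"LLaVA-1.6-34B:           {llava:.2f} s/sample")
print(f"Per-sample speedup:      {speedup:.1f}x")
```

Under that assumption, the batched 7B captioner spends noticeably less time per caption than the 34B model, even though its per-iteration time is higher.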
| Model | Avg length | Max length | Std |
|---|---|---|---|
| ShareGPT4V-Captioner-7B | 170.0827524529121 | 467 | 53.689967539537776 |
| LLaVA-1.6-34B | 141.75851073472666 | 472 | 48.52492072346965 |

## Video split

### Video with transitions

Use [panda-70m](https://github.com/snap-research/Panda-70M/tree/main/splitting) to split videos that contain transitions.

### Video without transitions

1. Clone this repository and navigate to the Open-Sora-Plan folder

```
git clone https://github.com/PKU-YuanGroup/Open-Sora-Plan
cd Open-Sora-Plan
```

2. Install the required packages

```
conda create -n opensora python=3.8 -y
conda activate opensora
pip install -e .
```

3. Run the video split script

```
git clone https://github.com/PKU-YuanGroup/Open-Sora-Dataset
python split/no_transition.py --video_json_file /path/to/your_video /path/to/save
```

If you want to know more, check out [Requirements and Installation](https://github.com/PKU-YuanGroup/Open-Sora-Plan?tab=readme-ov-file#%EF%B8%8F-requirements-and-installation).

## Acknowledgement

👍 Qingdao Weiyi Network Technology Co., Ltd.: Thank you very much for providing us with valuable data.
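As a sanity check on the figures in the Data distribution section, the per-source counts can be summed and compared against the stated totals. This is a small illustrative sketch and not part of the Open-Sora-Plan tooling; the dictionary layout and the helper `format_hms` are assumptions made for this example, while the numbers are copied from the text.

```python
from datetime import timedelta

# Per-source statistics as listed in the "Data distribution" section.
SOURCES = {
    "mixkit": {"videos": 1234, "duration": timedelta(hours=6, minutes=19, seconds=32), "frames": 570815},
    "pexels": {"videos": 7408, "duration": timedelta(hours=48, minutes=49, seconds=24), "frames": 5038641},
    "pixabay": {"videos": 31616, "duration": timedelta(hours=218, minutes=56, seconds=17), "frames": 23508970},
}


def format_hms(duration: timedelta) -> str:
    """Render a duration as 'Hh MMm SSs', letting hours exceed 24."""
    total_seconds = int(duration.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


total_videos = sum(s["videos"] for s in SOURCES.values())
total_duration = sum((s["duration"] for s in SOURCES.values()), timedelta())
total_frames = sum(s["frames"] for s in SOURCES.values())

print(f"videos:   {total_videos}")
print(f"duration: {format_hms(total_duration)}")
print(f"frames:   {total_frames}")
# Mean frame rate across the whole collection, derived from the totals.
print(f"mean fps: {total_frames / total_duration.total_seconds():.2f}")
```

The summed video count and duration agree with the stated totals of 40258 videos and 274h 05m 13s.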

Provider: maas
Created: 2025-06-05