Open-Sora-Plan-v1.0.0
ModelScope community. Updated 2025-12-18; indexed 2024-05-15.
Download link: https://modelscope.cn/datasets/PKU-YuanLab/Open-Sora-Plan-v1.0.0
Description:
# Open-Sora-Dataset
Welcome to the Open-Sora-Dataset project! As part of the [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) project, this repository documents how we collect and process datasets. We started it to build a high-quality video dataset for the open-source world. 💪
We warmly welcome you to join us! Let's contribute to the open-source world together! Thank you for your support and contribution.
**If you like our project, please give us a star ⭐ on [GitHub](https://github.com/PKU-YuanGroup/Open-Sora-Plan) for the latest updates.**
## Data Construction for Open-Sora-Plan v1.0.0
### Data distribution
We crawled 40258 videos from open-source websites under the CC0 license. All videos are of high quality without watermarks, and about 60% of them are landscape data. The total duration is about **274h 05m 13s**. The data come from three main sources:
1. [Mixkit](https://mixkit.co/): We collected **1234** videos with a total duration of about **6h 19m 32s** and a total of **570815** frames. The resolution and aspect-ratio histograms are shown below (categories that account for less than 1% are not listed):
<img src="assets/v1.0.0_mixkit_resolution_plot.png" width="400" /> <img src="assets/v1.0.0_mixkit_aspect_ratio_plot.png" width="400" />
2. [Pexels](https://www.pexels.com/zh-cn/): We collected **7408** videos with a total duration of about **48h 49m 24s** and a total of **5038641** frames. The resolution and aspect-ratio histograms are shown below (categories that account for less than 1% are not listed):
<img src="assets/v1.0.0_pexels_resolution_plot.png" height="300" /> <img src="assets/v1.0.0_pexels_aspect_ratio_plot.png" height="300" />
3. [Pixabay](https://pixabay.com/): We collected **31616** videos with a total duration of about **218h 56m 17s** and a total of **23508970** frames. The resolution and aspect-ratio histograms are shown below (categories that account for less than 1% are not listed):
<img src="assets/v1.0.0_pixabay_resolution_plot.png" height="300" /> <img src="assets/v1.0.0_pixabay_aspect_ratio_plot.png" height="300" />
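The per-source figures above can be cross-checked against the stated totals. The following is a minimal sketch that only re-adds the numbers quoted in the list; the helper names (`to_seconds`, `format_hms`) are illustrative and not part of the project, and the per-source frame rate it prints is a derived average, not a figure reported by the authors.

```python
# Per-source statistics copied from the list above:
# name -> (number of videos, (hours, minutes, seconds), number of frames)
SOURCES = {
    "mixkit": (1234, (6, 19, 32), 570815),
    "pexels": (7408, (48, 49, 24), 5038641),
    "pixabay": (31616, (218, 56, 17), 23508970),
}


def to_seconds(hours, minutes, seconds):
    """Convert an (h, m, s) duration to a number of seconds."""
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds in the 'Hh MMm SSs' style used in the text."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


total_videos = sum(videos for videos, _, _ in SOURCES.values())
total_seconds = sum(to_seconds(*duration) for _, duration, _ in SOURCES.values())
total_frames = sum(frames for _, _, frames in SOURCES.values())

print(total_videos)               # 40258
print(format_hms(total_seconds))  # 274h 05m 13s
print(total_frames)

# Derived values: each source's share of the videos and its average frame rate.
for name, (videos, duration, frames) in SOURCES.items():
    share = videos / total_videos
    fps = frames / to_seconds(*duration)
    print(f"{name}: {share:.1%} of videos, about {fps:.1f} frames per second")
```

The three sources add up exactly to the 40258 videos and 274h 05m 13s reported at the top of this section.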
### Dense captions
It is challenging to crawl a large quantity of high-quality dense captions directly from the internet. Therefore, we use a mature image-captioning model to obtain them. We ran ablation experiments on two large multimodal models: [ShareGPT4V-Captioner-7B](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/README.md) and [LLaVA-1.6-34B](https://github.com/haotian-liu/LLaVA). The former is designed specifically for caption generation, while the latter is a general-purpose large multimodal model. The ablation experiments showed that the two are comparable in performance, but their inference speed on an A800 GPU differs significantly: 40 s/it at a batch size of 12 for ShareGPT4V-Captioner-7B versus 15 s/it at a batch size of 1 for LLaVA-1.6-34B. We open-source all annotations [here](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.0.0). We set the maximum length of the model to 300, which covers almost 99% of the samples. Caption length statistics are shown below.
| Model | Avg length | Max length | Std |
|---|---|---|---|
| ShareGPT4V-Captioner-7B | 170.08 | 467 | 53.69 |
| LLaVA-1.6-34B | 141.76 | 472 | 48.52 |
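To make the speed comparison concrete, the sketch below converts the quoted s/it figures into seconds per caption and shows how statistics like those in the table could be computed. It rests on several assumptions that the text does not confirm: that "it" means one batch iteration, that the standard deviation is the population one, and that length is measured in whatever unit the `lengths` list uses (the source does not state the unit). The sample lengths are made up for illustration.

```python
import statistics


def seconds_per_sample(seconds_per_iteration, batch_size):
    """Per-caption latency, assuming one iteration processes one full batch."""
    return seconds_per_iteration / batch_size


# Figures quoted in the text for an A800 GPU.
sharegpt4v = seconds_per_sample(40, 12)  # ShareGPT4V-Captioner-7B
llava = seconds_per_sample(15, 1)        # LLaVA-1.6-34B
speedup = llava / sharegpt4v

print(f"ShareGPT4V-Captioner-7B: {sharegpt4v:.2f} s per caption")
print(f"LLaVA-1.6-34B: {llava:.2f} s per caption")
print(f"Ratio: {speedup:.1f}x")


def caption_length_stats(lengths, max_length=300):
    """Summarize caption lengths and the share that fits within max_length."""
    covered = sum(1 for n in lengths if n <= max_length)
    return {
        "avg": statistics.mean(lengths),
        "max": max(lengths),
        "std": statistics.pstdev(lengths),  # population std (an assumption)
        "coverage": covered / len(lengths),
    }


# Hypothetical caption lengths, not taken from the released annotations.
example_lengths = [120, 150, 170, 210, 320]
print(caption_length_stats(example_lengths))
```

Under the batch-iteration assumption, ShareGPT4V-Captioner-7B produces a caption roughly 4.5 times faster per sample than LLaVA-1.6-34B.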
## Video split
### Video with transitions
Use the [Panda-70M](https://github.com/snap-research/Panda-70M/tree/main/splitting) splitting tool to split videos that contain transitions.
### Video without transitions
1. Clone the Open-Sora-Plan repository and navigate to its folder.
```
git clone https://github.com/PKU-YuanGroup/Open-Sora-Plan
cd Open-Sora-Plan
```
2. Install the required packages.
```
conda create -n opensora python=3.8 -y
conda activate opensora
pip install -e .
```
3. Clone the Open-Sora-Dataset repository and run the video splitting script.
```
git clone https://github.com/PKU-YuanGroup/Open-Sora-Dataset
python split/no_transition.py --video_json_file /path/to/your_video /path/to/save
```
For more details, see [Requirements and Installation](https://github.com/PKU-YuanGroup/Open-Sora-Plan?tab=readme-ov-file#%EF%B8%8F-requirements-and-installation).
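The text does not describe how `split/no_transition.py` chooses its cut points. As an illustration only, here is one common way to split a transition-free video into fixed-length clips. This is a sketch under stated assumptions, not the repository's script: the function names, the clip length, and the minimum-length filter are all hypothetical. The `ffmpeg` options used (`-ss`, `-i`, `-t`, `-c copy`) are standard, and the command is only built here, not executed.

```python
def clip_boundaries(duration_s, clip_len_s, min_len_s=1.0):
    """Return (start, end) pairs that tile [0, duration_s] in clip_len_s steps.

    A trailing clip shorter than min_len_s is dropped (hypothetical policy).
    """
    if clip_len_s <= 0:
        raise ValueError("clip_len_s must be positive")
    clips = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_len_s, duration_s)
        if end - start >= min_len_s:
            clips.append((start, end))
        start = end
    return clips


def ffmpeg_command(src, dst, start, end):
    """Build an ffmpeg command that copies the span [start, end) without re-encoding."""
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",      # seek to the clip start
        "-i", src,
        "-t", f"{end - start:.3f}",  # clip duration
        "-c", "copy",               # stream copy, no re-encoding
        dst,
    ]


# Hypothetical 25-second video cut into 10-second clips.
for index, (start, end) in enumerate(clip_boundaries(25.0, 10.0)):
    print(ffmpeg_command("input.mp4", f"clip_{index:04d}.mp4", start, end))
```

With stream copy, cuts land on keyframes, so clip boundaries are approximate; re-encoding gives exact boundaries at a higher compute cost.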
## Acknowledgement 👍
We thank Qingdao Weiyi Network Technology Co., Ltd. for providing us with valuable data.
Provider: maas
Created: 2025-06-05



