SushantGautam/VideoUFO
Hugging Face · Updated 2026-03-20 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/SushantGautam/VideoUFO
Description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1M<n<10M
task_categories:
- text-to-video
- text-to-image
- image-to-video
- image-to-image
dataset_info:
  features:
  - name: ID
    dtype: string
  - name: Topic
    dtype: string
  - name: Detailed_Caption
    dtype: string
  - name: Brief_Caption
    dtype: string
  - name: Start_Time
    dtype: string
  - name: End_Time
    dtype: string
  - name: Aesthetic_Quality
    dtype: float32
  - name: Background_Consistency
    dtype: float32
  - name: Dynamic_Degree
    dtype: float32
  - name: Imaging_Quality
    dtype: float32
  - name: Motion_Smoothness
    dtype: float32
  - name: Subject_Consistency
    dtype: float32
  splits:
  - name: Full
    num_bytes: 1169704006
    num_examples: 1091712
  download_size: 543323065
  dataset_size: 1169704006
configs:
- config_name: default
  data_files:
  - split: Full
    path: data/Full-*
tags:
- video-generation
- text-to-video-dataset
---
# VideoUFO (Lightweight Version)
This dataset is a **lightweight version** of the original **VideoUFO** dataset.
Original dataset:
[https://huggingface.co/datasets/WenhaoWang/VideoUFO](https://huggingface.co/datasets/WenhaoWang/VideoUFO)
This fork removes the **`Middle_Frame` image column**, which significantly reduces memory usage and makes the dataset easier to load into dataframes and analysis pipelines.
The original videos can be downloaded from https://huggingface.co/datasets/WenhaoWang/VideoUFO/tree/main/VideoUFO_tar.
Once the archives are extracted, each `.mp4` file name (without the extension) corresponds to a value in the `ID` column above.
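The archives have to be extracted before the videos can be matched to `ID`. Below is a minimal sketch of that step, assuming the downloaded archives are `.tar` files sitting together in one local folder. The helper name `extract_archives` and both directory arguments are hypothetical; only the Python standard library is used.

```python
import tarfile
from pathlib import Path


def extract_archives(tar_dir, out_dir):
    """Extract every .tar archive found in tar_dir into out_dir.

    Returns a dict mapping clip ID (file name without ".mp4") to the
    path of the extracted video file.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter when this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for archive in sorted(Path(tar_dir).glob("*.tar")):
        with tarfile.open(archive) as tar:
            tar.extractall(out, **kwargs)
    # Index the extracted videos by ID, wherever they landed in the tree.
    return {p.stem: str(p) for p in out.rglob("*.mp4")}
```

The returned dictionary has the same shape as the `mp4s` mapping used in the loading example, so it can be used in its place.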
# Loading the Dataset and Mapping with Videos
```python
import os
from glob import glob

from datasets import load_dataset

ds_ = load_dataset("SushantGautam/VideoUFO")["Full"]

# Map rows to the .mp4 files extracted from the archives in
# https://huggingface.co/datasets/WenhaoWang/VideoUFO/tree/main/VideoUFO_tar
videos_base = "/Users/sushantgautam/Downloads/VideoUFO"  # CHANGE: parent folder that contains the extracted .mp4 videos

# Build {ID: path}; the file name without ".mp4" is the clip ID
mp4s = {os.path.splitext(os.path.basename(p))[0]: p for p in glob(f"{videos_base}/**/*.mp4", recursive=True)}

# Keep only rows whose video exists locally, then attach the local path
ds = ds_.filter(lambda x: x["ID"] in mp4s)
ds = ds.map(lambda x: {"video": mp4s[x["ID"]]})

ds[0]
# gives
# {'ID': '--2nxiwGZ4k.13', 'Topic': 'music', 'Detailed_Caption': 'The ... individuals.', 'Brief_Caption': 'A grou.. camera.', 'Start_Time': '0:01:58.750', 'End_Time': '0:02:03.916', 'Aesthetic_Quality': 0.490, 'Background_Consistency': 0.90, 'Dynamic_Degree': 1.0, 'Imaging_Quality': 0.32, 'Motion_Smoothness': 0.99, 'Subject_Consistency': 0.81,
#  'video': '/Users/sushantgautam/Downloads/VideoUFO/--2nxiwGZ4k.13.mp4'}  # <-- your local path

# Convert an "H:MM:SS.mmm" time string to seconds
to_sec = lambda t: sum(float(x) * 60**i for i, x in enumerate(reversed(t.split(":"))))

# (Optional) keep only clips that are at least 15 seconds long
ds = ds.filter(lambda x: to_sec(x["End_Time"]) - to_sec(x["Start_Time"]) >= 15)
```
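The time conversion can be checked by hand on the timestamps of the sample row shown above. The sketch below restates `to_sec` as a named function and adds a hypothetical helper, `clip_duration`, that is not part of the dataset or of any library.

```python
def to_sec(t):
    """Convert an "H:MM:SS.mmm" (or "MM:SS", or "SS") string to seconds."""
    parts = reversed(t.split(":"))
    return sum(float(x) * 60**i for i, x in enumerate(parts))


def clip_duration(row):
    """Length of a clip in seconds, from its Start_Time and End_Time fields."""
    return to_sec(row["End_Time"]) - to_sec(row["Start_Time"])


# Timestamps copied from the sample row shown above.
row = {"Start_Time": "0:01:58.750", "End_Time": "0:02:03.916"}
print(f"{clip_duration(row):.3f}")  # 5.166
```

This sample clip lasts about 5.2 seconds, so the optional 15-second filter would drop it.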
# Motivation
In the original dataset, each entry includes a **`Middle_Frame` image** representing the middle frame of the video clip.
While useful for some tasks, this column can:
* dramatically increase dataset size
* slow down dataframe loading
* consume unnecessary memory when working with **text-only metadata**
Many workflows such as:
* prompt analysis
* topic distribution analysis
* caption modeling
* dataset filtering
* metadata preprocessing
do **not require image data**.
Therefore this fork provides a **clean metadata-only version** for efficient use in such pipelines.
---
# What Changed
The following column was **removed**:
| Column | Type | Reason |
| -------------- | ----- | ------------------------------------------------------- |
| `Middle_Frame` | image | Very large and unnecessary for metadata-based workflows |
All other columns remain unchanged.
---
# Dataset Structure
Each entry contains the following fields:
| Column | Type | Description |
| ------------------------ | ------- | ------------------------------------ |
| `ID` | string | Unique identifier for the video clip |
| `Topic` | string | Topic category of the video |
| `Detailed_Caption` | string | Detailed caption describing the clip |
| `Brief_Caption` | string | Short caption |
| `Start_Time` | string | Start timestamp of the clip, e.g. `0:01:58.750` |
| `End_Time` | string | End timestamp of the clip, e.g. `0:02:03.916` |
| `Aesthetic_Quality` | float32 | VBench aesthetic score |
| `Background_Consistency` | float32 | Background stability score |
| `Dynamic_Degree` | float32 | Motion intensity score |
| `Imaging_Quality` | float32 | Visual quality score |
| `Motion_Smoothness` | float32 | Motion smoothness score |
| `Subject_Consistency` | float32 | Subject consistency score |
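The score columns lend themselves to threshold-based filtering. Here is a minimal sketch, assuming cut-off values that are arbitrary illustrations rather than recommendations; `MIN_SCORES` and `passes_quality` are hypothetical names introduced here.

```python
# Hypothetical minimum scores; tune these for your own use case.
MIN_SCORES = {
    "Aesthetic_Quality": 0.4,
    "Imaging_Quality": 0.3,
    "Motion_Smoothness": 0.95,
}


def passes_quality(row, min_scores=MIN_SCORES):
    """Return True if every listed score in the row meets its minimum."""
    return all(row[name] >= low for name, low in min_scores.items())


# Scores copied from the sample row in the loading example.
sample = {
    "Aesthetic_Quality": 0.490,
    "Imaging_Quality": 0.32,
    "Motion_Smoothness": 0.99,
}
print(passes_quality(sample))  # True
print(passes_quality(sample, {"Imaging_Quality": 0.5}))  # False
```

Because the predicate takes a plain row dictionary, it can be passed directly to the loaded dataset as `ds.filter(passes_quality)`.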
---
# Dataset Size
| Split | Examples |
| ----- | --------- |
| Full | 1,091,712 |
This fork contains the **same number of samples as the original dataset**, but without image data.
# When to Use This Version
Use this dataset if you want to:
* analyze captions or prompts
* build text-to-video training pipelines
* run topic statistics
* perform dataset filtering
* load the dataset into pandas or Spark efficiently
If you need **image frames**, please use the original dataset.
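As an illustration of the topic-statistics use case, the sketch below counts topic labels with the standard library. The function name `topic_share` and the example labels are made up; in practice the input would be the `Topic` column.

```python
from collections import Counter


def topic_share(topics):
    """Return (topic, count, fraction) tuples, most frequent first."""
    counts = Counter(topics)
    total = sum(counts.values())
    return [(t, n, n / total) for t, n in counts.most_common()]


# Made-up topic labels standing in for the `Topic` column.
example_topics = ["music", "sports", "music", "cooking", "music"]
for topic, n, frac in topic_share(example_topics):
    print(f"{topic:10s} {n:3d} {frac:.0%}")
```

With the dataset loaded as `ds`, `topic_share(ds["Topic"])` computes the same table over the full split, and `ds.to_pandas()` converts the split to a pandas DataFrame for further analysis.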
# Original Dataset
This dataset is derived from:
**VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation**
Authors:
* Wenhao Wang
* Yi Yang
Paper:
[https://huggingface.co/papers/2503.01739](https://huggingface.co/papers/2503.01739)
Original dataset:
[https://huggingface.co/datasets/WenhaoWang/VideoUFO](https://huggingface.co/datasets/WenhaoWang/VideoUFO)
---
# License
The dataset follows the same license as the original dataset:
**CC BY 4.0**
[https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/)
---
# Citation
If you use this dataset, please cite the original paper:
```bibtex
@inproceedings{wang2025videoufo,
title={VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation},
author={Wenhao Wang and Yi Yang},
booktitle={NeurIPS Datasets and Benchmarks Track},
year={2025}
}
```
Provided by:
SushantGautam



