
SushantGautam/VideoUFO

Hugging Face · updated 2026-03-20 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/SushantGautam/VideoUFO
Resource description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1M<n<10M
task_categories:
- text-to-video
- text-to-image
- image-to-video
- image-to-image
dataset_info:
  features:
  - name: ID
    dtype: string
  - name: Topic
    dtype: string
  - name: Detailed_Caption
    dtype: string
  - name: Brief_Caption
    dtype: string
  - name: Start_Time
    dtype: string
  - name: End_Time
    dtype: string
  - name: Aesthetic_Quality
    dtype: float32
  - name: Background_Consistency
    dtype: float32
  - name: Dynamic_Degree
    dtype: float32
  - name: Imaging_Quality
    dtype: float32
  - name: Motion_Smoothness
    dtype: float32
  - name: Subject_Consistency
    dtype: float32
  splits:
  - name: Full
    num_bytes: 1169704006
    num_examples: 1091712
  download_size: 543323065
  dataset_size: 1169704006
configs:
- config_name: default
  data_files:
  - split: Full
    path: data/Full-*
tags:
- video-generation
- text-to-video-dataset
---

# VideoUFO (Lightweight Version)

This dataset is a **lightweight version** of the original **VideoUFO** dataset.

Original dataset: [https://huggingface.co/datasets/WenhaoWang/VideoUFO](https://huggingface.co/datasets/WenhaoWang/VideoUFO)

This fork removes the **`Middle_Frame` image column**, which significantly reduces memory usage and makes the dataset easier to load into dataframes and analysis pipelines.

The original videos can be downloaded from https://huggingface.co/datasets/WenhaoWang/VideoUFO/tree/main/VideoUFO_tar. Once the archives are extracted, each video file is named `<ID>.mp4`, so it can be matched to its row through the `ID` column.
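If you start from the archives in `VideoUFO_tar`, the extraction step and the ID-to-path index can be combined. The sketch below assumes the archives are `.tar` files (compressed or not) collected in one folder and that each clip is stored somewhere inside them as `<ID>.mp4`; `extract_and_index` is a hypothetical helper, not part of the dataset card or of any library. It writes each clip to a flat output folder by basename instead of calling `extractall`, so paths stored inside an archive cannot place files outside `out_dir`.

```python
import os
import shutil
import tarfile
from pathlib import Path


def extract_and_index(archive_dir: str, out_dir: str) -> dict[str, str]:
    """Hypothetical helper: extract every .mp4 from the .tar archives in
    archive_dir into out_dir and return {clip ID: local path}.

    Assumes each clip is stored as <ID>.mp4 somewhere inside an archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index: dict[str, str] = {}
    for archive in sorted(Path(archive_dir).glob("*.tar")):
        # "r:*" lets tarfile detect gzip/bz2/xz compression transparently
        with tarfile.open(archive, "r:*") as tar:
            for member in tar:
                if not (member.isfile() and member.name.endswith(".mp4")):
                    continue
                src = tar.extractfile(member)
                if src is None:
                    continue
                # Keep only the basename so archive paths cannot escape out_dir
                name = os.path.basename(member.name)
                target = out / name
                with src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                # "--2nxiwGZ4k.13.mp4" -> "--2nxiwGZ4k.13" (matches the ID column)
                index[name[: -len(".mp4")]] = str(target)
    return index
```

The returned dictionary has the same shape as the `mp4s` dictionary built with `glob` in the loading code, so either one can be used to filter and map the dataset rows.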
# Loading the Dataset and Mapping It to the Videos

```python
import os
from glob import glob

from datasets import load_dataset

ds_ = load_dataset("SushantGautam/VideoUFO")["Full"]

# Map rows to the .mp4 files extracted from the archives in
# https://huggingface.co/datasets/WenhaoWang/VideoUFO/tree/main/VideoUFO_tar
videos_base = "/Users/sushantgautam/Downloads/VideoUFO"  # CHANGE: parent folder containing the extracted .mp4 videos

# Index every video by its filename stem, which equals the dataset "ID"
mp4s = {os.path.splitext(os.path.basename(p))[0]: p
        for p in glob(f"{videos_base}/**/*.mp4", recursive=True)}

# Keep only rows whose video is available locally, then attach its path
ds = ds_.filter(lambda x: x["ID"] in mp4s)
ds = ds.map(lambda x: {"video": mp4s[x["ID"]]})

ds[0]
# {'ID': '--2nxiwGZ4k.13', 'Topic': 'music', 'Detailed_Caption': 'The ... individuals.', 'Brief_Caption': 'A grou.. camera.',
#  'Start_Time': '0:01:58.750', 'End_Time': '0:02:03.916', 'Aesthetic_Quality': 0.490, 'Background_Consistency': 0.90,
#  'Dynamic_Degree': 1.0, 'Imaging_Quality': 0.32, 'Motion_Smoothness': 0.99, 'Subject_Consistency': 0.81,
#  'video': '/Users/sushantgautam/Downloads/VideoUFO/--2nxiwGZ4k.13.mp4'}  # <-- your local path

def to_sec(t):
    """Convert a timestamp string such as "0:01:58.750" to seconds."""
    return sum(float(x) * 60**i for i, x in enumerate(reversed(t.split(":"))))

# (optional) filter out clips shorter than 15 seconds
ds = ds.filter(lambda x: to_sec(x["End_Time"]) - to_sec(x["Start_Time"]) >= 15)
```

# Motivation

In the original dataset, each entry includes a **`Middle_Frame` image** showing the middle frame of the video clip. While useful for some tasks, this column can:

* dramatically increase dataset size
* slow down dataframe loading
* consume unnecessary memory when working with **text-only metadata**

Many workflows, such as

* prompt analysis
* topic distribution analysis
* caption modeling
* dataset filtering
* metadata preprocessing

do **not require image data**. This fork therefore provides a **clean, metadata-only version** for efficient use in such pipelines.
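For metadata-only filtering, the per-row `to_sec` helper from the loading code can also be replaced by a column-wise computation once the metadata is in a pandas DataFrame. This is a minimal sketch: `clip_durations` is a hypothetical helper, and it assumes every timestamp has the three-part `H:MM:SS.mmm` shape seen in the example row (the original `to_sec` also accepts shorter strings).

```python
import pandas as pd


def clip_durations(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: clip length in seconds from Start_Time/End_Time.

    Assumes every timestamp looks like "0:01:58.750" (hours:minutes:seconds).
    """

    def to_seconds(col: pd.Series) -> pd.Series:
        # Split into hour, minute and second columns, then convert to float
        parts = col.str.split(":", expand=True).astype(float)
        return parts[0] * 3600 + parts[1] * 60 + parts[2]

    return to_seconds(df["End_Time"]) - to_seconds(df["Start_Time"])
```

With `df = load_dataset("SushantGautam/VideoUFO", split="Full").to_pandas()`, the expression `df[clip_durations(df) >= 15]` applies the same 15-second cutoff as the optional filter in the loading code, without a Python function call per row.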
---

# What Changed

The following column was **removed**:

| Column         | Type  | Reason                                                  |
| -------------- | ----- | ------------------------------------------------------- |
| `Middle_Frame` | image | Very large and unnecessary for metadata-based workflows |

All other columns remain unchanged.

---

# Dataset Structure

Each entry contains the following fields:

| Column                   | Type    | Description                                      |
| ------------------------ | ------- | ------------------------------------------------ |
| `ID`                     | string  | Unique identifier for the video clip             |
| `Topic`                  | string  | Topic category of the video                      |
| `Detailed_Caption`       | string  | Detailed caption describing the clip             |
| `Brief_Caption`          | string  | Short caption                                    |
| `Start_Time`             | string  | Start timestamp of the clip (e.g. `0:01:58.750`) |
| `End_Time`               | string  | End timestamp of the clip (e.g. `0:02:03.916`)   |
| `Aesthetic_Quality`      | float32 | VBench aesthetic score                           |
| `Background_Consistency` | float32 | Background stability score                       |
| `Dynamic_Degree`         | float32 | Motion intensity score                           |
| `Imaging_Quality`        | float32 | Visual quality score                             |
| `Motion_Smoothness`      | float32 | Motion smoothness score                          |
| `Subject_Consistency`    | float32 | Subject consistency score                        |

---

# Dataset Size

| Split | Examples  |
| ----- | --------- |
| Full  | 1,091,712 |

This fork contains the **same number of samples as the original dataset**, but without image data.

# When to Use This Version

Use this dataset if you want to:

* analyze captions or prompts
* build text-to-video training pipelines (together with the original videos)
* run topic statistics
* perform dataset filtering
* load the dataset into pandas or Spark efficiently

If you need **image frames**, please use the original dataset.
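The topic-statistics and filtering use cases listed above can be sketched in a few lines of pandas. This assumes the metadata has already been loaded into a DataFrame, for example with `load_dataset("SushantGautam/VideoUFO", split="Full").to_pandas()`; `topic_stats`, `filter_by_quality` and any threshold values you pass are hypothetical illustrations, not recommendations from the dataset authors.

```python
import pandas as pd

# The six score columns listed in the Dataset Structure table
QUALITY_COLUMNS = [
    "Aesthetic_Quality", "Background_Consistency", "Dynamic_Degree",
    "Imaging_Quality", "Motion_Smoothness", "Subject_Consistency",
]


def topic_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clip count and mean score per Topic, largest topics first."""
    grouped = df.groupby("Topic")
    stats = grouped[QUALITY_COLUMNS].mean()
    stats.insert(0, "clips", grouped.size())
    return stats.sort_values("clips", ascending=False)


def filter_by_quality(df: pd.DataFrame, min_scores: dict[str, float]) -> pd.DataFrame:
    """Hypothetical helper: keep rows that meet every minimum score in min_scores."""
    mask = pd.Series(True, index=df.index)
    for column, threshold in min_scores.items():
        mask &= df[column] >= threshold
    return df[mask]
```

For example, `filter_by_quality(df, {"Aesthetic_Quality": 0.5, "Dynamic_Degree": 0.5})` keeps only clips that reach both (arbitrary, illustrative) cutoffs; because the image column is gone, these operations touch only small numeric and string columns.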
# Original Dataset

This dataset is derived from:

**VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation**

Authors:

* Wenhao Wang
* Yi Yang

Paper: [https://huggingface.co/papers/2503.01739](https://huggingface.co/papers/2503.01739)

Original dataset: [https://huggingface.co/datasets/WenhaoWang/VideoUFO](https://huggingface.co/datasets/WenhaoWang/VideoUFO)

---

# License

This dataset uses the same license as the original dataset: **CC BY 4.0**

[https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/)

---

# Citation

If you use this dataset, please cite the original paper:

```
@inproceedings{wang2025videoufo,
  title={VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation},
  author={Wenhao Wang and Yi Yang},
  booktitle={NeurIPS Datasets and Benchmarks Track},
  year={2025}
}
```
Provided by:
SushantGautam