maxwelljones14/refVFX_dataset

Name: maxwelljones14/refVFX_dataset
Creator: maxwelljones14
Published: 2026-04-16 21:23:31
License: cc-by-4.0

Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/maxwelljones14/refVFX_dataset

Resource description:

---
viewer: false
license: cc-by-4.0
task_categories:
- video-to-video
- image-to-video
tags:
- video
- video-editing
- visual-effects
- vfx
- temporal-transitions
- lora
pretty_name: "RefVFX Video Edits"
size_categories:
- 100K<n<1M
configs:
- config_name: code_based_edits
  data_files:
  - split: train
    path: "data/code_based_edits/shard-*.tar"
- config_name: neural_v2v_data
  data_files:
  - split: train
    path: "data/neural_v2v_data/shard-*.tar"
- config_name: I2V_LoRA
  data_files:
  - split: train
    path: "data/I2V_LoRA/shard-*.tar"
---

# UNOFFICIAL reimplementation of dataset for: RefVFX

This dataset is an unofficial reimplementation of the [RefVFX](https://snap-research.github.io/RefVFX/) project for tuning-free visual effect transfer across videos. The data and code were generated with the help of the arXiv paper and AI coding software.

**Original Paper:** [Tuning-free Visual Effect Transfer across Videos](https://arxiv.org/abs/2601.07833)

## Dataset Statistics

| Subset | Effect Types | Total Pairs | Avg Pairs per Effect |
|--------|-------------|-------------|---------------------|
| Code-Based Edits | 2736 | 136,800 | 50.0 |
| Neural V2V | 114 | 22,922 | 201.1 |
| I2V LoRA | 48 | 6,995 | 145.7 |
| **Total** | **2898** | **166,717** | **57.5** |

## Examples

### Code-Based Edits

<table cellspacing="0" cellpadding="4">
<tr>
<th>Input Video</th>
<th>Mask</th>
<th>Output Video</th>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_1_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_1_mask.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_1_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3">Prompt: Transition between
the input video and an edited version of the input video with the following editing instruction: Solarize with threshold 128. Transition between frames 1 and 7 using the following temporal effect: checkerboard reveal</td>
</tr>
<tr>
<td colspan="3">Effect: effect_solarize_threshold_128_temporal_checkerboard_center_0p5_0p5_end_frame_7_num_segments_6_softness_0p0196_start_frame_1_mask_full_orientation_horizontal  |  Mask: full</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_2_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_2_mask.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_2_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3">Prompt: Transition between the input person and an edited version of the input person with the following editing instruction: Glow size 30, colour light Violet, brightness 1.5.
Transition between frames 16 and 23 using the following temporal effect: alpha blend</td>
</tr>
<tr>
<td colspan="3">Effect: effect_glow_effect_glow_size_30_glow_color_243_49_230_glow_brightness_1p5_object_brightness_2_temporal_alpha_center_0p5_0p5_end_frame_23_num_segments_8_softness_0p0238_start_frame_16_mask_foreground_orientation_horizontal  |  Mask: foreground</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_3_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_3_mask.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_3_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3">Prompt: Keep the man unchanged, and transition the rest between the input and an edited version of the input with the following editing instruction: Photocopy effect with contrast 1.7 and strength 0.77.
Transition between frames 0 and 21 using the following temporal effect: circle out with centre (0.50, ...</td>
</tr>
<tr>
<td colspan="3">Effect: effect_photocopy_contrast_1p7_strength_0p77_temporal_circle_out_center_0p5_0p5_end_frame_21_num_segments_6_softness_0p0455_start_frame_0_mask_background_orientation_horizontal  |  Mask: background</td>
</tr>
</table>

### Neural V2V Edits

<table cellspacing="0" cellpadding="4">
<tr>
<th>Input Video (no effect)</th>
<th>Conditioning Video for Output Video with effect</th>
<th>Output Video (with effect)</th>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_1_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_1_conditioning.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_1_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3">Prompt: middle-aged Middle Eastern man, one leg forward, frowning, side profile transitioning to the middle-aged Middle Eastern man is bending forward, expression is confused, and the camera angle is now medium shot from waist up.
Simultaneously, throughout the video, golden coins rain down</td>
</tr>
<tr>
<td colspan="3">Effect category: golden coins rain down</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_2_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_2_conditioning.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_2_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3">Prompt: middle-aged European man, stretching, intense stare, birds-eye view transitioning to the middle-aged European man is walking forward, expression is angry, and the camera angle is now extreme close-up on face. Simultaneously, throughout the video, ground becomes still mirror-water</td>
</tr>
<tr>
<td colspan="3">Effect category: ground becomes still mirror-water</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_3_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_3_conditioning.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_3_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3">Prompt: elderly Middle Eastern man, sitting on a chair, frowning, three-quarter view transitioning to the elderly Middle Eastern man is lying on side, expression is frowning, and the camera angle is now extreme close-up on face.
Simultaneously, throughout the video, ethereal mid-ground mist drifts through</td>
</tr>
<tr>
<td colspan="3">Effect category: ethereal mid-ground mist drifts through</td>
</tr>
</table>

### I2V LoRA Edits

<table cellspacing="0" cellpadding="4">
<tr>
<th>Input Image</th>
<th>Output Video</th>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_1_input.png" width="256"></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_1_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="2">Prompt: The video starts with A elderly Latin American man. The effect warr10r warrior it transforms A elderly Latin American man into a warrior with a mountain range in the background. A elderly Latin American man now appears as a muscular warrior with tattoos, holding an axe with a golden head, with a ...</td>
</tr>
<tr>
<td colspan="2">Effect (LoRA trigger): warr10r warrior it</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_2_input.png" width="256"></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_2_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="2">Prompt: The video begins with a close-up of A young European woman. Their eyes begin to glow blue, and a bright e13c7r1c electricity effect starts emanating from their body. The e13c7r1c electricity effect grows more intense, covering the entire body with crackling lightning.
The background is dark and d...</td>
</tr>
<tr>
<td colspan="2">Effect (LoRA trigger): e13c7r1c electricity effect</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_3_input.png" width="256"></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_3_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="2">Prompt: The video begins with A middle-aged Middle Eastern man. 5en3m venom transformation. A middle-aged Middle Eastern man transforms into Venom, depicted with the iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation is smooth and se...</td>
</tr>
<tr>
<td colspan="2">Effect (LoRA trigger): 5en3m venom transformation.</td>
</tr>
</table>

## Dataset Subsets

This dataset uses **WebDataset** format (tar shards) for efficient streaming of large video files. Each subset can be loaded independently:

```python
import json

from datasets import load_dataset

# Stream a subset (recommended for large datasets)
ds = load_dataset(
    "maxwelljones14/refVFX_dataset",
    "code_based_edits",
    split="train",
    streaming=True,
)

for sample in ds:
    # Media files are returned as raw bytes:
    #   sample["input_image_or_video.mp4"]        -> bytes
    #   sample["output_video.mp4"]                -> bytes
    #   sample["mask_or_output_conditioning.mp4"] -> bytes (if present)
    # Text metadata is in the JSON entry:
    #   sample["json"] -> '{"prompt": "...", "effect_type": "...", ...}'
    meta = sample["json"]
    if isinstance(meta, (str, bytes)):
        # Parse the entry if it arrives as a JSON string rather than a dict
        meta = json.loads(meta)
    print(meta["prompt"], meta["effect_type"])
    break

# Or load a specific subset fully into memory (if you have enough RAM)
ds = load_dataset("maxwelljones14/refVFX_dataset", "neural_v2v_data")

# I2V LoRA subset (input is a .png image instead of .mp4)
ds = load_dataset("maxwelljones14/refVFX_dataset", "I2V_LoRA")
```

### 1. `code_based_edits` -- Programmatic Temporal Effects

Deterministic, code-based video editing triplets: **(input video, output video, mask)** paired with text prompts. Each sample applies a **spatial visual effect** (e.g., posterize, pixelate, glitch, emboss) combined with a **temporal transition** (e.g., wipe, circle reveal, diagonal fade) to an input video, producing an output video where the effect appears progressively over time.

The effects are code-based and deterministic: given the same parameters and input, the output is always reproducible. Parameters are randomized across samples to provide diversity. The effects are not identical to those in the original paper; they are based on the paper's effects, with slight augmentations.

The `effect_type` for this subset is the full folder name encoding the specific combination of spatial effect, temporal transition, mask type, and orientation (e.g., `"effect_posterize_frames_20_temporal_left_to_right_linear_wipe_mask_full_orientation_horizontal"`). All samples with the same `effect_type` share the same effect configuration.

**Source videos:** [Senorita](https://huggingface.co/datasets/SENORITADATASET/Senorita) grounding dataset (person-centric videos with segmentation masks, 33 frames at 8 fps).

**Spatial Effects (30 types):** `posterize_frames`, `pixelate_frames`, `invert_frames`, `wave_warp`, `update_saturation_brightness`, `gaussian_blur`, `add_grain`, `black_and_white`, `color_overlay`, `cc_ball_action`, `sticker_effect`, `glow_effect`, `radial_blur`, `rotate_pixels`, `glitch_effect`, `dither`, `photocopy`, `motion_blur`, `stutter`, `ghosting`, `strobe`, `emboss`, `edge_detect`, `vignette`, `solarize`, `kaleidoscope`, `halftone`, `thermal`, `fisheye`, `scanlines`

**Temporal Transitions (20+ types):** Linear wipes, diagonal wipes, circle/rectangle/diamond in/out reveals, clock wipe, blinds, checkerboard, noise dissolve, spiral wipe, cross in/out, stripe patterns, alpha cross-dissolve.

### 2. `neural_v2v_data` -- Neural Video-to-Video Edits

Neural video-to-video edits generated by diffusion models using the algorithm in Section 3.2.2 of the [original paper](https://arxiv.org/pdf/2601.07833). Each sample consists of a base video (v1, no effect) and an effect video (v2, same motion + a visual effect applied). These are quality-filtered, and around 50% are retained.

The `effect_type` for this subset is the specific effect description (e.g., `"infrared look with white foliage and dark sky"`, `"confetti shower from above"`). These fall into 6 broader categories: `object_addition`, `weather_atmospheric`, `artistic_stylistic`, `particle_element`, `color_palette_tonal`, `surreal_fantasy`.

### 3. `I2V_LoRA` -- LoRA-based Image-to-Video Effects

Image-to-video effects generated using LoRA adapters applied to a video diffusion model, mostly from [this collection](https://huggingface.co/collections/Remade-AI/wan21-14b-480p-i2v-loras). Each sample consists of an input image and a generated video with the LoRA effect applied. These are quality-filtered by a multi-score evaluation with a final `verdict` field (only `"accepted"` samples are included).

The `effect_type` for this subset is the LoRA trigger phrase (e.g., `"cr4n3 crane down camera motion"`). `mask_or_output_conditioning` and `mask_type` are `None` for this subset.

## Dataset Structure

The dataset is stored as **WebDataset tar shards** (~22 GB each).
Each sample in a shard consists of the following entries, keyed by a 6-digit sample index:

| Tar Entry | Type | Description |
|-----------|------|-------------|
| `{key}.input_image_or_video.mp4` (or `.png`) | bytes | Input video (code-based, V2V) or input image (I2V) |
| `{key}.output_video.mp4` | bytes | Output video with effect applied |
| `{key}.mask_or_output_conditioning.mp4` | bytes | Binary mask (code-based) or conditioning video (V2V); absent for I2V |
| `{key}.json` | JSON string | Text metadata (see below) |

**JSON metadata fields:**

| Field | Type | Description |
|-------|------|-------------|
| `prompt` | string | Text description of the edit |
| `effect_type` | string | Full effect folder name (code-based), specific effect description (V2V), or LoRA trigger (I2V) |
| `mask_type` | string or null | `full`, `foreground`, `background`, or null (I2V) |
| `orientation` | string | `horizontal` or `vertical` |
| `data_subset` | string | `code_based_edits`, `neural_v2v_data`, or `I2V_LoRA` |

### Mask Types (code-based edits only, taken from Senorita Dataset masks)

- **full**: The effect is applied to the entire frame.
- **foreground**: The effect is applied only to the detected person/object.
- **background**: The effect is applied to the background; the person/object is unchanged.

## Citation For Original Paper

```bibtex
@article{jones2026tuning,
  title={Tuning-free Visual Effect Transfer across Videos},
  author={Jones, Maxwell and Abdal, Rameen and Patashnik, Or and Salakhutdinov, Ruslan and Tulyakov, Sergey and Zhu, Jun-Yan and Wang, Kuan-Chieh Jackson},
  journal={arXiv preprint arXiv:2601.07833},
  year={2026}
}
```

## License

This dataset is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.

NOTE: the dataset was produced at CMU, with all code and video generation created from scratch, using the publicly available arXiv paper and Claude Code as the only resources for code generation.
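
The shard layout described under Dataset Structure (entries named `{key}.{field}`, with one `.json` metadata entry per sample) can be illustrated with a small standard-library sketch. It does not touch the real shards: it builds a toy in-memory tar with placeholder bytes instead of videos, then groups the entries by sample key. The helper names `add_member`, `build_toy_shard`, and `group_samples`, the key `000000`, and the metadata values are all hypothetical and chosen only for illustration.

```python
import io
import json
import tarfile
from collections import defaultdict


def add_member(tar, name, payload):
    """Add one in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def build_toy_shard():
    """Build an in-memory tar mimicking the documented entry naming.

    The payloads are placeholder bytes, not real media files.
    """
    meta = {
        "prompt": "toy prompt",
        "effect_type": "toy_effect",
        "mask_type": "full",
        "orientation": "horizontal",
        "data_subset": "code_based_edits",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        add_member(tar, "000000.input_image_or_video.mp4", b"in")
        add_member(tar, "000000.output_video.mp4", b"out")
        add_member(tar, "000000.mask_or_output_conditioning.mp4", b"mask")
        add_member(tar, "000000.json", json.dumps(meta).encode("utf-8"))
    buf.seek(0)
    return buf


def group_samples(fileobj):
    """Group tar entries by sample key (the text before the first dot)."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, field = member.name.partition(".")
            samples[key][field] = tar.extractfile(member).read()
    return dict(samples)


samples = group_samples(build_toy_shard())
meta = json.loads(samples["000000"]["json"])
print(sorted(samples["000000"]), meta["mask_type"])
```

Splitting each entry name at the first dot yields the same field names used in the loading example (`input_image_or_video.mp4`, `output_video.mp4`, `json`), which is why those strings appear as dictionary keys there. For I2V samples the mask entry would simply be missing from the per-sample dictionary.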

Provider:

maxwelljones14
