maxwelljones14/refVFX_dataset
Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/maxwelljones14/refVFX_dataset
Resource description:
---
viewer: false
license: cc-by-4.0
task_categories:
- video-to-video
- image-to-video
tags:
- video
- video-editing
- visual-effects
- vfx
- temporal-transitions
- lora
pretty_name: "RefVFX Video Edits"
size_categories:
- 100K<n<1M
configs:
- config_name: code_based_edits
  data_files:
  - split: train
    path: "data/code_based_edits/shard-*.tar"
- config_name: neural_v2v_data
  data_files:
  - split: train
    path: "data/neural_v2v_data/shard-*.tar"
- config_name: I2V_LoRA
  data_files:
  - split: train
    path: "data/I2V_LoRA/shard-*.tar"
---
# UNOFFICIAL reimplementation of dataset for: RefVFX
This dataset is an unofficial reimplementation of the dataset from the [RefVFX](https://snap-research.github.io/RefVFX/) project for tuning-free visual effect transfer across videos. The data and code were generated with the help of the arXiv paper and AI coding software.
**Original Paper:** [Tuning-free Visual Effect Transfer across Videos](https://arxiv.org/abs/2601.07833)
## Dataset Statistics
<!-- STATS_START -->
| Subset | Effect Types | Total Pairs | Avg Pairs per Effect |
|--------|-------------|-------------|---------------------|
| Code-Based Edits | 2736 | 136,800 | 50.0 |
| Neural V2V | 114 | 22,922 | 201.1 |
| I2V LoRA | 48 | 6,995 | 145.7 |
| **Total** | **2898** | **166,717** | **57.5** |
<!-- STATS_END -->
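
The averages in the table follow directly from the effect and pair counts. A quick check, using only the numbers listed above:

```python
# (effect types, total pairs) per subset, copied from the statistics table
subsets = {
    "Code-Based Edits": (2736, 136_800),
    "Neural V2V": (114, 22_922),
    "I2V LoRA": (48, 6_995),
}

total_effects = sum(effects for effects, _ in subsets.values())
total_pairs = sum(pairs for _, pairs in subsets.values())

# Average pairs per effect is simply pairs / effect types
averages = {name: pairs / effects for name, (effects, pairs) in subsets.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.1f} pairs per effect")
print(f"Total: {total_effects} effects, {total_pairs:,} pairs, "
      f"{total_pairs / total_effects:.1f} pairs per effect")
```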
## Examples
### Code-Based Edits
<table cellspacing="0" cellpadding="4">
<tr>
<th>Input Video</th>
<th>Mask</th>
<th>Output Video</th>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_1_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_1_mask.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_1_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3"><b>Prompt:</b> <i>Transition between the input video and an edited version of the input video with the following editing instruction: Solarize with threshold<br>128. Transition between frames 1 and 7 using the following temporal effect: checkerboard reveal</i></td>
</tr>
<tr>
<td colspan="3"><b>Effect:</b> effect_solarize_threshold_128_temporal_checkerboard_center_0p5_0p5_end_frame_7_num_segments_6_softness_0p0196_start_frame_1_mask_full_<br>orientation_horizontal | <b>Mask:</b> full</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_2_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_2_mask.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_2_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3"><b>Prompt:</b> <i>Transition between the input person and an edited version of the input person with the following editing instruction: Glow size 30, colour<br>light Violet, brightness 1.5. Transition between frames 16 and 23 using the following temporal effect: alpha blend</i></td>
</tr>
<tr>
<td colspan="3"><b>Effect:</b> effect_glow_effect_glow_size_30_glow_color_243_49_230_glow_brightness_1p5_object_brightness_2_temporal_alpha_center_0p5_0p5_end_frame_23_num_<br>segments_8_softness_0p0238_start_frame_16_mask_foreground_orientation_horizontal | <b>Mask:</b> foreground</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_3_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_3_mask.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/code_based_3_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3"><b>Prompt:</b> <i>Keep the man unchanged, and transition the rest between the input and an edited version of the input with the following editing instruction:<br>Photocopy effect with contrast 1.7 and strength 0.77. Transition between frames 0 and 21 using the following temporal effect: circle out<br>with centre (0.50, ...</i></td>
</tr>
<tr>
<td colspan="3"><b>Effect:</b> effect_photocopy_contrast_1p7_strength_0p77_temporal_circle_out_center_0p5_0p5_end_frame_21_num_segments_6_softness_0p0455_start_frame_0_<br>mask_background_orientation_horizontal | <b>Mask:</b> background</td>
</tr>
</table>
### Neural V2V Edits
<table cellspacing="0" cellpadding="4">
<tr>
<th>Input Video (no effect)</th>
<th>Conditioning Video for Output Video with effect</th>
<th>Output Video (with effect)</th>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_1_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_1_conditioning.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_1_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3"><b>Prompt:</b> <i>middle-aged Middle Eastern man, one leg forward, frowning, side profile transitioning to the middle-aged Middle Eastern man is bending<br>forward, expression is confused, and the camera angle is now medium shot from waist up. Simultaneously, throughout the video, golden coins<br>rain down</i></td>
</tr>
<tr>
<td colspan="3"><b>Effect category:</b> golden coins rain down</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_2_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_2_conditioning.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_2_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3"><b>Prompt:</b> <i>middle-aged European man, stretching, intense stare, birds-eye view transitioning to the middle-aged European man is walking forward,<br>expression is angry, and the camera angle is now extreme close-up on face. Simultaneously, throughout the video, ground becomes still<br>mirror-water</i></td>
</tr>
<tr>
<td colspan="3"><b>Effect category:</b> ground becomes still mirror-water</td>
</tr>
<tr>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_3_input.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_3_conditioning.mp4" width="256" controls autoplay loop muted></video></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/v2v_3_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="3"><b>Prompt:</b> <i>elderly Middle Eastern man, sitting on a chair, frowning, three-quarter view transitioning to the elderly Middle Eastern man is lying on<br>side, expression is frowning, and the camera angle is now extreme close-up on face. Simultaneously, throughout the video, ethereal<br>mid-ground mist drifts through</i></td>
</tr>
<tr>
<td colspan="3"><b>Effect category:</b> ethereal mid-ground mist drifts through</td>
</tr>
</table>
### I2V LoRA Edits
<table cellspacing="0" cellpadding="4">
<tr>
<th>Input Image</th>
<th>Output Video</th>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_1_input.png" width="256"></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_1_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="2"><b>Prompt:</b> <i>The video starts with A elderly Latin American man. The effect warr10r warrior it transforms A elderly Latin American man into a warrior<br>with a mountain range in the background. A elderly Latin American man now appears as a muscular warrior with tattoos, holding an axe with a<br>golden head, with a ...</i></td>
</tr>
<tr>
<td colspan="2"><b>Effect (LoRA trigger):</b> warr10r warrior it</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_2_input.png" width="256"></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_2_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="2"><b>Prompt:</b> <i>The video begins with a close-up of A young European woman. Their eyes begin to glow blue, and a bright e13c7r1c electricity effect starts<br>emanating from their body. The e13c7r1c electricity effect grows more intense, covering the entire body with crackling lightning. The<br>background is dark and d...</i></td>
</tr>
<tr>
<td colspan="2"><b>Effect (LoRA trigger):</b> e13c7r1c electricity effect</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_3_input.png" width="256"></td>
<td><video src="https://huggingface.co/datasets/maxwelljones14/refVFX_dataset/resolve/main/examples/i2v_3_output.mp4" width="256" controls autoplay loop muted></video></td>
</tr>
<tr>
<td colspan="2"><b>Prompt:</b> <i>The video begins with A middle-aged Middle Eastern man. 5en3m venom transformation. A middle-aged Middle Eastern man transforms into Venom,<br>depicted with the iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation<br>is smooth and se...</i></td>
</tr>
<tr>
<td colspan="2"><b>Effect (LoRA trigger):</b> 5en3m venom transformation.</td>
</tr>
</table>
## Dataset Subsets
This dataset uses **WebDataset** format (tar shards) for efficient streaming of large video files. Each subset can be loaded independently:
```python
import json

from datasets import load_dataset

# Stream a subset (recommended for large datasets)
ds = load_dataset(
    "maxwelljones14/refVFX_dataset", "code_based_edits", split="train", streaming=True
)
for sample in ds:
    # Media files are returned as raw bytes:
    #   sample["input_image_or_video.mp4"]        -> bytes
    #   sample["output_video.mp4"]                -> bytes
    #   sample["mask_or_output_conditioning.mp4"] -> bytes (if present)
    # Text metadata is in the JSON entry:
    #   sample["json"] -> '{"prompt": "...", "effect_type": "...", ...}'
    meta = sample["json"]
    # Depending on the datasets version, the entry may already be decoded to a dict
    if isinstance(meta, (str, bytes)):
        meta = json.loads(meta)
    print(meta["prompt"], meta["effect_type"])
    break

# Or load a specific subset in full (non-streaming; downloads every shard first)
ds = load_dataset("maxwelljones14/refVFX_dataset", "neural_v2v_data", split="train")

# I2V LoRA subset (input is a .png image instead of .mp4)
ds = load_dataset("maxwelljones14/refVFX_dataset", "I2V_LoRA", split="train")
```
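
Because media entries arrive as raw bytes, the simplest way to inspect a sample is to write them to disk and open them in a video player. The helper below is a minimal sketch: `save_sample` is a hypothetical name, and it assumes the loader exposes the sample index under `__key__` (it falls back to `"sample"` if that field is missing).

```python
from pathlib import Path


def save_sample(sample: dict, out_dir: str) -> list:
    """Write every bytes-valued entry of a sample to out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumption: the sample index is stored under "__key__"
    key = str(sample.get("__key__", "sample")).replace("/", "_")
    written = []
    for name, value in sample.items():
        if isinstance(value, (bytes, bytearray)):
            path = out / f"{key}.{name}"  # e.g. 000001.output_video.mp4
            path.write_bytes(value)
            written.append(path)
    return written
```

Calling `save_sample(sample, "preview")` inside the streaming loop would leave files such as `preview/<key>.output_video.mp4` that any standard player can open.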
### 1. `code_based_edits` -- Programmatic Temporal Effects
Deterministic, code-based video editing triplets: **(input video, output video, mask)** paired with text prompts. Each sample applies a **spatial visual effect** (e.g., posterize, pixelate, glitch, emboss) combined with a **temporal transition** (e.g., wipe, circle reveal, diagonal fade) to an input video, producing an output video where the effect appears progressively over time.
The effects are code-based and deterministic: given the same parameters and input, the output is always reproducible. Parameters are randomized across samples to provide diversity. The effects are not identical to those in the original paper; they start from the paper's effects and augment them slightly.
The `effect_type` for this subset is the full folder name encoding the specific combination of spatial effect, temporal transition, mask type, and orientation (e.g., `"effect_posterize_frames_20_temporal_left_to_right_linear_wipe_mask_full_orientation_horizontal"`). All samples with the same `effect_type` share the same effect configuration.
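
Since the folder name encodes the whole configuration, it can be split back into its parts. The sketch below is not the dataset's own tooling: `parse_effect_type` and `decode_number` are hypothetical helpers, the pattern assumes the order `effect_..._temporal_..._mask_..._orientation_...` seen in the examples, and it assumes that `p` in tokens such as `0p5` stands for a decimal point.

```python
import re

# Assumed layout: effect_<spatial>_temporal_<transition>_mask_<type>_orientation_<dir>
_EFFECT_RE = re.compile(
    r"^effect_(?P<spatial>.+?)_temporal_(?P<temporal>.+)"
    r"_mask_(?P<mask_type>full|foreground|background)"
    r"_orientation_(?P<orientation>horizontal|vertical)$"
)


def parse_effect_type(name: str) -> dict:
    """Split a code-based effect_type into spatial, temporal, mask_type, orientation."""
    match = _EFFECT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized effect_type: {name!r}")
    return match.groupdict()


def decode_number(token: str) -> float:
    """Turn a token such as '0p0196' into 0.0196 (assumes 'p' replaces '.')."""
    return float(token.replace("p", "."))


parts = parse_effect_type(
    "effect_posterize_frames_20_temporal_left_to_right_linear_wipe"
    "_mask_full_orientation_horizontal"
)
print(parts)
```

The spatial and temporal parts are returned as raw strings (name plus parameters); splitting them further would require knowing each effect's parameter names.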
**Source videos:** [Senorita](https://huggingface.co/datasets/SENORITADATASET/Senorita) grounding dataset (person-centric videos with segmentation masks, 33 frames at 8 fps).
**Spatial Effects (30 types):** `posterize_frames`, `pixelate_frames`, `invert_frames`, `wave_warp`, `update_saturation_brightness`, `gaussian_blur`, `add_grain`, `black_and_white`, `color_overlay`, `cc_ball_action`, `sticker_effect`, `glow_effect`, `radial_blur`, `rotate_pixels`, `glitch_effect`, `dither`, `photocopy`, `motion_blur`, `stutter`, `ghosting`, `strobe`, `emboss`, `edge_detect`, `vignette`, `solarize`, `kaleidoscope`, `halftone`, `thermal`, `fisheye`, `scanlines`
**Temporal Transitions (20+ types):** Linear wipes, diagonal wipes, circle/rectangle/diamond in/out reveals, clock wipe, blinds, checkerboard, noise dissolve, spiral wipe, cross in/out, stripe patterns, alpha cross-dissolve.
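
To make the combination of a spatial effect and a temporal transition concrete, here is a minimal NumPy sketch of a posterize effect revealed by a left-to-right linear wipe. It is an illustration only, not the code used to build the dataset: the function names, the `softness` handling, and the treatment of `start_frame` and `end_frame` as inclusive bounds are all assumptions.

```python
import numpy as np


def posterize(frames: np.ndarray, levels: int = 4) -> np.ndarray:
    """Quantize uint8 frames (T, H, W, 3) to `levels` values per channel."""
    step = 256 // levels
    return ((frames // step) * step).astype(np.uint8)


def linear_wipe_weights(width: int, progress: float, softness: float) -> np.ndarray:
    """Blend weight per column: 1 where the effect is shown, 0 where it is not."""
    x = (np.arange(width) + 0.5) / width      # pixel centres in (0, 1)
    edge = progress * (1.0 + softness)        # wipe front; reaches past 1 at progress=1
    return np.clip((edge - x) / max(softness, 1e-6), 0.0, 1.0)


def wipe_transition(frames, edited, start_frame, end_frame, softness=0.02):
    """Show `frames` before start_frame, `edited` after end_frame, and wipe in between."""
    out = np.empty_like(frames)
    span = max(end_frame - start_frame, 1)
    for t in range(frames.shape[0]):
        progress = min(max((t - start_frame) / span, 0.0), 1.0)
        alpha = linear_wipe_weights(frames.shape[2], progress, softness)[None, :, None]
        blended = alpha * edited[t] + (1.0 - alpha) * frames[t]
        out[t] = np.round(blended).astype(np.uint8)
    return out


# Toy clip with the dataset's 33-frame length (random pixels stand in for a video)
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(33, 4, 8, 3), dtype=np.uint8)
edited = posterize(frames)
output = wipe_transition(frames, edited, start_frame=1, end_frame=7)
```

Other transitions (circle reveals, checkerboard, blinds) would differ only in how the per-pixel weight map is computed for a given `progress`.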
### 2. `neural_v2v_data` -- Neural Video-to-Video Edits
Neural video-to-video edits generated by diffusion models using the algorithm in Section 3.2.2 of the [original paper](https://arxiv.org/pdf/2601.07833). Each sample consists of a base video (v1, no effect) and an effect video (v2, same motion plus a visual effect). The generated pairs are quality-filtered; around 50% are retained.
<!-- - `effect_score > 3` (effect is visible and meaningful)
- `gradual_score > 3` (effect appears gradually, not abruptly)
- `artifact_score > 4` (no significant visual artifacts)
- `flow_spike_ratio <= 5.0` (no sudden motion discontinuities) -->
The `effect_type` for this subset is the specific effect description (e.g., `"infrared look with white foliage and dark sky"`, `"confetti shower from above"`). These fall into 6 broader categories: `object_addition`, `weather_atmospheric`, `artistic_stylistic`, `particle_element`, `color_palette_tonal`, `surreal_fantasy`.
### 3. `I2V_LoRA` -- LoRA-based Image-to-Video Effects
Image-to-video effects generated using LoRA adapters applied to a video diffusion model, mostly from the [Remade-AI Wan2.1 14B 480p I2V LoRAs collection](https://huggingface.co/collections/Remade-AI/wan21-14b-480p-i2v-loras). Each sample consists of an input image and a generated video with the LoRA effect applied. Samples are quality-filtered by a multi-score evaluation that produces a final `verdict` field; only `"accepted"` samples are included.
The `effect_type` for this subset is the LoRA trigger phrase (e.g., `"cr4n3 crane down camera motion"`).
For this subset, `mask_type` is `None` (null in the JSON) and there is no `mask_or_output_conditioning` entry.
## Dataset Structure
The dataset is stored as **WebDataset tar shards** (~22 GB each). Each sample in a shard consists of the following entries, keyed by a 6-digit sample index:
| Tar Entry | Type | Description |
|-----------|------|-------------|
| `{key}.input_image_or_video.mp4` (or `.png`) | bytes | Input video (code-based, V2V) or input image (I2V) |
| `{key}.output_video.mp4` | bytes | Output video with effect applied |
| `{key}.mask_or_output_conditioning.mp4` | bytes | Binary mask (code-based), conditioning video (V2V); absent for I2V |
| `{key}.json` | JSON string | Text metadata (see below) |
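
The layout above can also be read without any dataset library. The following sketch groups tar members into samples using only the standard library; `iter_samples` is a hypothetical helper, it assumes each sample's files are stored contiguously (the usual WebDataset convention), and it ignores any directory prefix in member names.

```python
import tarfile
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_samples(fileobj: BinaryIO) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Yield (key, entries) per sample; entries maps e.g. 'json' to raw bytes."""
    current_key, entries = None, {}
    # "r|*" reads the archive as a forward-only stream, which suits large shards
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            basename = member.name.rsplit("/", 1)[-1]
            # "000001.output_video.mp4" -> key "000001", entry "output_video.mp4"
            key, _, entry_name = basename.partition(".")
            if current_key is not None and key != current_key:
                yield current_key, entries
                entries = {}
            current_key = key
            entries[entry_name] = tar.extractfile(member).read()
    if current_key is not None:
        yield current_key, entries


# Usage (the shard path is a placeholder, not a confirmed file name):
# import json
# with open("path/to/shard.tar", "rb") as f:
#     for key, entries in iter_samples(f):
#         meta = json.loads(entries["json"])
```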
**JSON metadata fields:**
| Field | Type | Description |
|-------|------|-------------|
| `prompt` | string | Text description of the edit |
| `effect_type` | string | Full effect folder name (code-based), specific effect description (V2V), or LoRA trigger (I2V) |
| `mask_type` | string or null | `full`, `foreground`, `background`, or null (I2V) |
| `orientation` | string | `horizontal` or `vertical` |
| `data_subset` | string | `code_based_edits`, `neural_v2v_data`, or `I2V_LoRA` |
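
A small validator can catch malformed records early when filtering or re-sharding. This is a sketch based only on the field table above; `SampleMetadata` and `check_metadata` are hypothetical names, and real records may carry additional fields that are simply ignored here.

```python
from typing import List, Optional, TypedDict


class SampleMetadata(TypedDict):
    prompt: str
    effect_type: str
    mask_type: Optional[str]
    orientation: str
    data_subset: str


# Allowed values, taken from the field table
MASK_TYPES = ("full", "foreground", "background", None)
ORIENTATIONS = ("horizontal", "vertical")
DATA_SUBSETS = ("code_based_edits", "neural_v2v_data", "I2V_LoRA")


def check_metadata(meta: dict) -> List[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for field in ("prompt", "effect_type"):
        if not isinstance(meta.get(field), str):
            problems.append(f"{field} should be a string")
    if meta.get("mask_type") not in MASK_TYPES:
        problems.append(f"unexpected mask_type: {meta.get('mask_type')!r}")
    if meta.get("orientation") not in ORIENTATIONS:
        problems.append(f"unexpected orientation: {meta.get('orientation')!r}")
    if meta.get("data_subset") not in DATA_SUBSETS:
        problems.append(f"unexpected data_subset: {meta.get('data_subset')!r}")
    return problems
```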
### Mask Types (code-based edits only, taken from Senorita Dataset masks)
- **full**: The effect is applied to the entire frame.
- **foreground**: The effect is applied only to the detected person/object.
- **background**: The effect is applied to the background; the person/object is unchanged.
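
The three mask types amount to choosing which pixels come from the edited video. A minimal NumPy sketch, assuming a boolean per-frame person mask (`True` on the person/object) and hard, unfeathered compositing; the function name `composite` is hypothetical and this is not the dataset's generation code:

```python
import numpy as np


def composite(input_frames, edited_frames, person_mask, mask_type):
    """Pick edited pixels inside the region selected by mask_type.

    input_frames, edited_frames: uint8 arrays of shape (T, H, W, 3)
    person_mask: array of shape (T, H, W), truthy on the person/object
    """
    person_mask = np.asarray(person_mask, dtype=bool)
    if mask_type == "full":
        region = np.ones_like(person_mask)   # effect everywhere
    elif mask_type == "foreground":
        region = person_mask                 # effect on the person/object only
    elif mask_type == "background":
        region = ~person_mask                # person/object left unchanged
    else:
        raise ValueError(f"unknown mask_type: {mask_type!r}")
    return np.where(region[..., None], edited_frames, input_frames)


# Toy example: a 2-frame, 2x2 clip with the person in the left column
inp = np.zeros((2, 2, 2, 3), dtype=np.uint8)
edt = np.full((2, 2, 2, 3), 255, dtype=np.uint8)
mask = np.zeros((2, 2, 2), dtype=bool)
mask[:, :, 0] = True
fg = composite(inp, edt, mask, "foreground")
bg = composite(inp, edt, mask, "background")
```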
## Citation For Original Paper
```bibtex
@article{jones2026tuning,
title={Tuning-free Visual Effect Transfer across Videos},
author={Jones, Maxwell and Abdal, Rameen and Patashnik, Or and Salakhutdinov, Ruslan and Tulyakov, Sergey and Zhu, Jun-Yan and Wang, Kuan-Chieh Jackson},
journal={arXiv preprint arXiv:2601.07833},
year={2026}
}
```
## License
This dataset is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. Note: the dataset was produced at CMU. All code and videos were created from scratch, with the publicly available arXiv paper and Claude Code as the only resources used for code generation.
Provider: maxwelljones14



