harvardairobotics/FiVE-Fine-Grained-Video-Editing-Benchmark

Name: harvardairobotics/FiVE-Fine-Grained-Video-Editing-Benchmark
Creator: harvardairobotics
Published: 2026-04-09 20:36:58
License: No description provided

Hugging Face · Updated 2026-04-09 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/harvardairobotics/FiVE-Fine-Grained-Video-Editing-Benchmark


Description:

---
language:
- en
license: cc-by-nc-4.0
size_categories:
- 100M<n<1B
task_categories:
- text-to-video
pretty_name: FiVE Benchmark
tags:
- Video-Editing
library_name: datasets
configs:
- config_name: edit1
  data_files: edit_prompt/edit1_FiVE.json
- config_name: edit2
  data_files: edit_prompt/edit2_FiVE.json
- config_name: edit3
  data_files: edit_prompt/edit3_FiVE.json
- config_name: edit4
  data_files: edit_prompt/edit4_FiVE.json
- config_name: edit5
  data_files: edit_prompt/edit5_FiVE.json
- config_name: edit6
  data_files: edit_prompt/edit6_FiVE.json
---

# [FiVE-Bench](https://arxiv.org/abs/2503.13684) (ICCV 2025)

[FiVE-Bench: A Fine-Grained Video Editing Benchmark for Evaluating Diffusion and Rectified Flow Models](https://arxiv.org/abs/2503.13684)

> [Minghan Li](https://scholar.google.com/citations?user=LhdBgMAAAAAJ&hl=en)<sup>1*</sup>, [Chenxi Xie](https://openreview.net/profile?id=%7EChenxi_Xie1)<sup>2*</sup>, [Yichen Wu](https://scholar.google.com/citations?hl=zh-CN&user=p53r6j0AAAAJ&hl=en)<sup>1,3</sup>, [Lei Zhang](https://scholar.google.com/citations?user=tAK5l1IAAAAJ&hl=en)<sup>2</sup>, [Mengyu Wang](https://scholar.google.com/citations?user=i9B02k4AAAAJ&hl=en)<sup>1†</sup>
>
> <sup>1</sup>Harvard University <sup>2</sup>The Hong Kong Polytechnic University <sup>3</sup>City University of Hong Kong
>
> <sup>*</sup>Equal contribution <sup>†</sup>Corresponding Author

💜 Leaderboard (coming soon) | 💻 [GitHub](https://github.com/MinghanLi/FiVE-Bench) | 🤗 [Hugging Face](https://huggingface.co/datasets/LIMinghan/FiVE-Fine-Grained-Video-Editing-Benchmark)

📝 [Project Page](https://sites.google.com/view/five-benchmark) | 📰 [Paper](https://arxiv.org/abs/2503.13684) | 🎥 [Video Demo](https://sites.google.com/view/five-benchmark)

FiVE is a benchmark of **100 videos** for evaluating fine-grained video editing. It includes **74 real-world videos** curated from the DAVIS dataset (sampled at 8-frame intervals) and **26 highly realistic synthetic videos** generated with the Wan2.1 text-to-video model.
Together, these videos cover a diverse range of editing challenges in both real and synthetic content.

<img src="assets/five_pipeline.png" alt="Dataset Pipeline" width="800"/>

---

## Updates

**2025-08-26**: Added all evaluation results for Wan-Edit in `./results/8-Wan-Edit-Eval`.

**2025-08-26**: Fixed typos in the `save_dir` field of the `edit_prompt` JSON files.

---

## Benchmark Overview

<img src="assets/five.png" alt="Dataset Overview" width="800"/>

Basic information:

- **Structured Captions**: Generated by GPT-4o, capturing object category, action, background, and camera movement.
- **Object Deformation Records**: Annotations of limb movements and other non-rigid transformations.
- **Six Editing Tasks**: Six fine-grained editing tasks with **420 high-quality prompt pairs**:
  1. **Object replacement (rigid)**
  2. **Object replacement (non-rigid deformation)**
  3. **Color alteration**
  4. **Material modification**
  5. **Object addition**
  6. **Object removal**

Data structure:

```text
📁 FiVE-Fine-Grained-Video-Editing-Benchmark
├── 📁 assets/
├── 📁 edit_prompt/
│   ├── 📄 edit1_FiVE.json
│   ├── 📄 edit2_FiVE.json
│   ├── 📄 edit3_FiVE.json
│   ├── 📄 edit4_FiVE.json
│   ├── 📄 edit5_FiVE.json
│   └── 📄 edit6_FiVE.json
├── 📄 README.md
├── 📦 bmasks.zip
├── 📁 bmasks
│   ├── 📁 0001_bus
│   │   ├── 🖼️ 00001.jpg
│   │   ├── 🖼️ 00002.jpg
│   │   ├── 🖼️ ...
│   ├── 📁 ...
├── 📦 images.zip
├── 📁 images
│   ├── 📁 0001_bus
│   │   ├── 🖼️ 00001.jpg
│   │   ├── 🖼️ 00002.jpg
│   │   ├── 🖼️ ...
│   ├── 📁 ...
├── 📦 videos.zip
├── 📁 videos
│   ├── 🎞️ 0001_bus.mp4
│   ├── 🎞️ 0002_girl-dog.mp4
│   ├── 🎞️ ...
```

---

## FiVE-Bench Evaluation

<img src="assets/five-acc.jpg" alt="Evaluation Metric" width="800"/>

To facilitate model evaluation, the dataset provides **two major components**:

### 📐 1. Conventional Metrics (Across Six Key Aspects)

These metrics quantitatively measure different dimensions of video editing quality:

- **Structure Preservation**
- **Background Preservation** (PSNR, LPIPS, MSE, and SSIM computed outside the editing mask)
- **Edit Prompt–Image Consistency** (CLIP similarity on full and masked images)
- **Image Quality Assessment** ([NIQE](https://github.com/chaofengc/IQA-PyTorch))
- **Temporal Consistency** (MFS: [Motion Fidelity Score](https://github.com/diffusion-motion-transfer/diffusion-motion-transfer/blob/main/motion_fidelity_score.py))
- **Runtime Efficiency**

<img src="assets/five-bench-eval1.png" alt="five-bench-eval1" width="800"/>

### 🤖 2. FiVE-Acc: A VLM-based Metric for Editing Success

FiVE-Acc evaluates editing success by asking a vision-language model (VLM) questions about the content of the edited video:

- **YN-Acc**: Yes/no question accuracy
- **MC-Acc**: Multiple-choice question accuracy
- **U-Acc**: Union accuracy, where an edit counts as successful if any question is answered correctly
- **∩-Acc**: Intersection accuracy, where an edit counts as successful only if all questions are answered correctly
- **FiVE-Acc** ↑: Final score, the average of the four metrics above (higher is better)

<img src="assets/five-bench-eval2.png" alt="five-bench-eval2" width="400"/>

---

## 📚 Citation

If you use **FiVE-Bench** in your research, please cite:

```bibtex
@article{li2025five,
  title={Five: A fine-grained video editing benchmark for evaluating emerging diffusion and rectified flow models},
  author={Li, Minghan and Xie, Chenxi and Wu, Yichen and Zhang, Lei and Wang, Mengyu},
  journal={arXiv preprint arXiv:2503.13684},
  year={2025}
}
```
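The FiVE-Acc scoring described in the evaluation section can be illustrated with a minimal Python sketch. The sketch rests on one simplifying assumption: each edited video is graded with exactly one yes/no question and one multiple-choice question, each marked correct or incorrect. The `QAResult` record and the `five_acc` function are hypothetical names and are not part of the FiVE-Bench code. The benchmark's actual question sets, VLM prompts, and per-question bookkeeping may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class QAResult:
    """Hypothetical grading record for one edited video (not the official schema)."""

    yn_correct: bool  # VLM answered the yes/no question correctly
    mc_correct: bool  # VLM answered the multiple-choice question correctly


def _rate(flags: List[bool]) -> float:
    """Fraction of entries that are True."""
    return sum(flags) / len(flags)


def five_acc(results: Sequence[QAResult]) -> Dict[str, float]:
    """Aggregate per-video VLM grades into YN/MC/U/∩ accuracies and FiVE-Acc."""
    if not results:
        raise ValueError("need at least one graded video")
    yn = [r.yn_correct for r in results]
    mc = [r.mc_correct for r in results]
    scores = {
        "YN-Acc": _rate(yn),
        "MC-Acc": _rate(mc),
        # Union: success if at least one of the two questions is answered correctly.
        "U-Acc": _rate([a or b for a, b in zip(yn, mc)]),
        # Intersection: success only if both questions are answered correctly.
        "∩-Acc": _rate([a and b for a, b in zip(yn, mc)]),
    }
    # FiVE-Acc: unweighted mean of the four accuracies computed above.
    scores["FiVE-Acc"] = sum(scores.values()) / len(scores)
    return scores


if __name__ == "__main__":
    graded = [
        QAResult(True, True),
        QAResult(True, False),
        QAResult(False, False),
        QAResult(False, True),
    ]
    for name, value in five_acc(graded).items():
        print(f"{name}: {value:.2f}")
```

Take four videos graded (correct, correct), (correct, incorrect), (incorrect, incorrect), and (incorrect, correct). They give YN-Acc 0.50, MC-Acc 0.50, U-Acc 0.75, ∩-Acc 0.25, and FiVE-Acc 0.50. U-Acc can never be lower than ∩-Acc, so their average partly offsets how lenient or strict any single criterion is.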

Provided by:

harvardairobotics
