ImpossibleVideos
ModelScope community · Updated 2025-12-05 · Indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/showlab/ImpossibleVideos
Description:
<div align="center">
<h1>Impossible Videos</h1>
[Zechen Bai](https://www.baizechen.site/) <sup>\*</sup>
[Hai Ci](https://haici.cc/) <sup>\*</sup>
[Mike Zheng Shou](https://sites.google.com/view/showlab) <sup></sup>
[Show Lab, National University of Singapore](https://sites.google.com/view/showlab/home?authuser=0)
[Hugging Face](https://huggingface.co/ShowLab)
[arXiv:2503.14378](https://arxiv.org/abs/2503.14378)
<p align="center">
<img src="assets/logo.jpg" alt="TAX" style="display: block; margin: 0 auto;" width="600px" />
</p>
</div>
## 🤔 What are impossible videos?
Impossible videos are videos that display **counterfactual and anti-reality** scenes that are **impossible** in the real world.
Please visit our [website](https://showlab.github.io/Impossible-Videos/) to find more examples.
## 💡 Why are we interested in impossible videos?
Impossible videos can be a touchstone for advanced video models.
As ***out-of-real-world-distribution*** data, they require a model not simply to ***memorize*** real-world data and ***retrieve*** similar information based on the input, but to genuinely ***learn*** from real-world data and ***reason*** about the input.
This project aims to advance video research by answering the following important questions:
- Can today's video generation models effectively follow prompts to **generate** impossible video content?
- Are today's video understanding models good enough for **understanding** impossible videos?
## 🔥 IPV-Bench
We introduce ***IPV-Bench***, a novel benchmark designed to evaluate and foster progress in video understanding and generation.
<p align="center"> <img src="assets/main_fig.png" width="820px"></p>
1. **§IPV Taxonomy**: IPV-Bench is underpinned by a comprehensive taxonomy that encompasses 4 domains and 14 categories. It features diverse scenes that defy physical, biological, geographical, or social laws.
2. **§IPV-Txt Prompt Suite**: A prompt suite is constructed based on the taxonomy to evaluate video generation models, challenging their prompt following and creativity capabilities.
3. **§IPV-Vid Videos**: A video benchmark is curated to assess the ability of Video-LLMs to understand impossible videos, which particularly requires reasoning about temporal dynamics and world knowledge.
## 🏆 Leaderboard
### Text-to-video Generation
<p align="center"> <img src="assets/ipv_eval_vid_gen.png" width="820px"></p>
### Video-LLM-based Video Understanding
<p align="center"> <img src="assets/ipv_eval_vid_understand.png" width="820px"></p>
## 🚀 Get Started
First, go to [Hugging Face](https://huggingface.co/ShowLab) and download our data and code, including videos, task files, and example evaluation code.
The task files and example files can also be found in this GitHub repo.
### Evaluate Impossible Video Generation
1. Use `example_read_prompt.py` to read the `ipv_txt_prompt_suite.json` file to get the text prompts.
2. Use the text prompts to generate videos with your model.
3. Annotate the `visual quality` and `prompt following` fields for each video.
4. Compute the `IPV Score` as the percentage of videos that are *both of high visual quality and faithful to the prompt.*
🛠️ **In this study, we employ human annotation to provide reliable insights for the models.
We are still polishing an automatic evaluation framework, which will be open-sourced in the future.**
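
As an illustration of step 4, the sketch below computes an IPV Score from a list of human annotations. It assumes each annotation has been reduced to a pass/fail judgement for visual quality and for prompt following; the `Annotation` record and its field names are hypothetical and are not taken from the released files.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One annotated video (hypothetical record layout, for illustration only)."""
    video_id: str
    visual_quality: bool    # True if annotators judged the video to be of high quality
    prompt_following: bool  # True if the video follows its text prompt


def ipv_score(annotations):
    """Return the percentage of videos that pass BOTH criteria."""
    if not annotations:
        raise ValueError("no annotations given")
    passed = sum(1 for a in annotations if a.visual_quality and a.prompt_following)
    return 100.0 * passed / len(annotations)


# Toy data: two of the four videos pass both criteria.
example = [
    Annotation("v1", True, True),
    Annotation("v2", True, False),
    Annotation("v3", False, True),
    Annotation("v4", True, True),
]
print(f"IPV Score: {ipv_score(example):.1f}%")  # prints "IPV Score: 50.0%"
```

If your annotation scheme uses graded ratings rather than pass/fail labels, you would first need to choose a threshold that maps each rating to a boolean.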
### Evaluate Impossible Video Understanding
1. The benchmark involves three tasks: Judgement, Multi-choice QA, and Open-ended QA.
2. Navigate to [example_eval/eval_judgement.py](example_eval/eval_judgement.py), [example_eval/eval_mcqa.py](example_eval/eval_mcqa.py), and [example_eval/eval_openqa.py](example_eval/eval_openqa.py) for each task.
3. The example code implements the full evaluation pipeline. To evaluate your model, simply modify the `inference_one()` function so that it returns your model's output.
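
The overall shape of such a pipeline can be sketched as follows for the Judgement task. This is a minimal illustration, not the code shipped in `example_eval/`: the signature of `inference_one()`, the sample fields (`video`, `question`, `answer`), and the yes/no answer format are all assumptions made for this example.

```python
def inference_one(video_path, question):
    """Placeholder model call; replace the body with your Video-LLM.

    Hypothetical signature: the real function in example_eval/ may differ.
    """
    return "yes"  # dummy model that always answers "yes"


def evaluate_judgement(samples):
    """Return the accuracy of yes/no predictions over the given samples."""
    if not samples:
        raise ValueError("no samples given")
    correct = 0
    for sample in samples:
        prediction = inference_one(sample["video"], sample["question"])
        # Normalize the raw model output before comparing with the label.
        if prediction.strip().lower() == sample["answer"]:
            correct += 1
    return correct / len(samples)


# Toy samples with hypothetical field names and file names.
samples = [
    {"video": "a.mp4", "question": "Is this video impossible?", "answer": "no"},
    {"video": "b.mp4", "question": "Is this video impossible?", "answer": "yes"},
    {"video": "c.mp4", "question": "Is this video impossible?", "answer": "no"},
]
print(f"Accuracy: {evaluate_judgement(samples):.3f}")
```

Keeping the model call behind a single function means the surrounding loop and metric stay unchanged when you swap in a different model.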
### Join Discussion
You are welcome to join the discussion and help us continuously improve the quality of impossible videos.
Reach us with the WeChat QR code below!
<p align="center">
<img src="assets/wechat_qr.jpg" width="256">
</p>
## 🎓 BibTeX
If you find our work helpful, please kindly star this repo and consider citing our paper.
```
@misc{bai2025impossible,
title={Impossible Videos},
author={Zechen Bai and Hai Ci and Mike Zheng Shou},
year={2025},
eprint={2503.14378},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.14378},
}
```
Provider:
maas
Created:
2025-05-29



