
odl-raiser/Envision

Source: Hugging Face · Updated 2025-12-02 · Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/odl-raiser/Envision
Resource overview:
---
license: mit
task_categories:
- text-to-image
language:
- en
tags:
- unified-multimodal-model
- T2I
size_categories:
- 1K<n<10K
---

# Envision

## Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

![b14d6800c97e2dc34ac8e703d8b89802](https://cdn-uploads.huggingface.co/production/uploads/670880950e79a8b46f7ff9dd/UURtu-VzAPGWvCbo3r5G4.png)

Envision is a benchmark for evaluating the unified understanding and sequential generation capabilities of multimodal models, with a focus on modeling **causal world processes**. It assesses a model's ability to generate coherent, physically plausible, and aesthetically pleasing image sequences that follow a complex, step-by-step causal narrative.

![dbe747ff18433287f4b68067d92cf530](https://cdn-uploads.huggingface.co/production/uploads/670880950e79a8b46f7ff9dd/K6dgVvZi_FwuY7GB1dTgY.png)

---

### 1. Directory Structure Overview

All data lives in the `data/` directory, which contains six JSON files that together form the complete dataset of continuous, four-stage event progressions. Each file name indicates its thematic domain.

| Filename | Category | Process Type | Description |
| :--- | :--- | :--- | :--- |
| `data/envision_bio.json` | Science | Biology | Sequences covering ecological, evolutionary, and life-science phenomena (e.g., primary succession, speciation). |
| `data/envision_chem.json` | Science | Chemistry | Sequences detailing chemical reactions and fundamental processes (e.g., thermite reaction, precipitation, organic synthesis mechanisms). |
| `data/envision_phy.json` | Science | Physics | Sequences illustrating core physical principles and dynamics (e.g., conservation of momentum, electromagnetism, wave phenomena). |
| `data/envision_geo.json` | Science | Geography | Sequences focusing on geomorphological and Earth surface processes (e.g., coastal erosion, alluvial fan formation, glacial dynamics). |
| `data/envision_mete.json` | Science | Meteorology | Sequences describing atmospheric and weather phenomena (e.g., tropical cyclone development, thunderstorm formation, frontal systems). |
| `data/envision_cul.json` | Culture | History | Sequences documenting major historical events and long-term cultural transformations (e.g., the Industrial Revolution, the French Revolution, technological adoption). |

#### Data Format (`.json` Files)

Each JSON file is a list of independent event-progression objects. Every object follows the same schema, which captures both the visual state and the underlying causal mechanism of the process:

1. **`index`**: A unique numerical identifier for the progression sequence within its domain.
2. **`category`**: The overarching domain of the progression, either 'Science' or 'Culture'.
3. **`process_type`**: A sub-category specifying the academic discipline (e.g., 'Biology', 'Chemistry', 'History').
4. **`prompts`**: A list of exactly four dictionaries, one per stage of the continuous progression. Each stage dictionary contains:
    * **`step`**: The sequence number (1 through 4).
    * **`prompt`**: A highly detailed textual prompt designed to generate a single, specific visual frame of the event at that stage.
    * **`explanation`**: A concise academic explanation of the causal transition, physical law, or mechanism connecting the current stage to the previous state.

---

### 2. Data Download and Directory Setup

To access the complete Envision dataset, including all six domain-specific JSON files, clone the dataset repository hosted on Hugging Face with `git clone`.
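Once the repository is available locally (see Data Download below), the schema from Section 1 can be checked programmatically. The following is a minimal sketch and not part of the official repository: the function names `validate_progression` and `load_progressions` and the sample record are hypothetical, and the check covers only the fields listed under Data Format.

```python
import json

ALLOWED_CATEGORIES = {"Science", "Culture"}
STAGE_KEYS = ("step", "prompt", "explanation")


def validate_progression(obj):
    """Return a list of schema problems found in one progression object."""
    problems = []
    for key in ("index", "category", "process_type", "prompts"):
        if key not in obj:
            problems.append(f"missing key: {key}")
    if obj.get("category") not in ALLOWED_CATEGORIES:
        problems.append(f"unexpected category: {obj.get('category')!r}")
    prompts = obj.get("prompts")
    if not isinstance(prompts, list) or len(prompts) != 4:
        problems.append("prompts must be a list of exactly four stages")
        return problems
    for position, stage in enumerate(prompts, start=1):
        if not isinstance(stage, dict):
            problems.append(f"stage {position} is not a dictionary")
            continue
        for key in STAGE_KEYS:
            if key not in stage:
                problems.append(f"stage {position} missing key: {key}")
        # Compare as strings so that both 1 and "1" are accepted (assumption:
        # the card does not say whether `step` is stored as int or string).
        if str(stage.get("step")) != str(position):
            problems.append(f"stage {position} has step={stage.get('step')!r}")
    return problems


def load_progressions(path):
    """Load one domain file, which the card describes as a JSON list."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical record with placeholder text, shaped like the documented schema.
sample = {
    "index": 0,
    "category": "Science",
    "process_type": "Physics",
    "prompts": [
        {"step": n, "prompt": f"placeholder prompt {n}",
         "explanation": f"placeholder explanation {n}"}
        for n in range(1, 5)
    ],
}
print(validate_progression(sample))
```

To check a real file, pass each object returned by `load_progressions("data/envision_bio.json")` through `validate_progression` and inspect any non-empty result.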
#### Data Download

Run the following command in your terminal to clone the repository:

```bash
git clone https://huggingface.co/datasets/opendatalab-raiser/Envision
```

-----

### 3. 📐 Evaluation Protocol

Generated image sequences are evaluated with the `eval.py` script, which automates quality assessment by using a vision-language model (VLM) as a strict quality auditor. The script applies a fine-grained, hierarchical scoring protocol covering nine metrics, each on a 0-5 scale.

#### Hierarchical Scoring and Weights

The **Envision (Overall) Score** is a weighted average of three primary dimensions. The weights follow a 4:4:2 ratio that prioritizes physical and causal coherence.

| Dimension | Primary Weight *W* | Sub-Dimensions |
| :--- | :--- | :--- |
| **Consistency** | 40% (0.4) | Semantic Consistency, Factual Consistency, Spatial-Temporal Consistency |
| **Physicality** | 40% (0.4) | Basic Properties, Dynamics and Interactivity, Physical Reliability |
| **Aesthetic** | 20% (0.2) | Expressiveness, Artistic Quality, Authenticity |

The final **Envision (Overall) Score** is computed as:

$$\text{Overall Score} = \sum_{D \in \{\text{Cons, Phys, Aes}\}} \mathbf{W}_D \times \text{MeanScore}_D$$

where $\text{MeanScore}_D$ is the weighted average of the three sub-dimensions within dimension $D$. The sub-dimensions are weighted approximately equally (0.33, 0.33, 0.34) within their parent dimension.

#### Running the Evaluation

To run `eval.py`, provide the generated images corresponding to the sequence prompts and an OpenAI API key for the VLM-based assessment.
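As an aside on the scoring formula in the previous subsection, the weighting can be made concrete with a short calculation. This is a sketch under stated assumptions rather than the logic of `eval.py`: the dictionary layout and the function names `mean_score` and `overall_score` are hypothetical, the example scores are made up, and assigning 0.34 to the third listed sub-dimension is a guess, since the text gives the sub-weights only as (0.33, 0.33, 0.34).

```python
# Primary weights W_D from the table (4:4:2 ratio).
DIMENSION_WEIGHTS = {"Consistency": 0.4, "Physicality": 0.4, "Aesthetic": 0.2}

# Sub-dimension weights; the order-to-name mapping is an assumption.
SUB_WEIGHTS = (0.33, 0.33, 0.34)

SUB_DIMENSIONS = {
    "Consistency": ("Semantic Consistency", "Factual Consistency",
                    "Spatial-Temporal Consistency"),
    "Physicality": ("Basic Properties", "Dynamics and Interactivity",
                    "Physical Reliability"),
    "Aesthetic": ("Expressiveness", "Artistic Quality", "Authenticity"),
}


def mean_score(dimension, scores):
    """Weighted average of the three sub-dimension scores of one dimension."""
    names = SUB_DIMENSIONS[dimension]
    for name in names:
        if not 0 <= scores[name] <= 5:
            raise ValueError(f"{name} must be on the 0-5 scale")
    return sum(w * scores[name] for w, name in zip(SUB_WEIGHTS, names))


def overall_score(scores):
    """Sum over dimensions D of W_D * MeanScore_D."""
    return sum(weight * mean_score(dimension, scores)
               for dimension, weight in DIMENSION_WEIGHTS.items())


# Made-up scores for one generated sequence, for illustration only.
example_scores = {
    "Semantic Consistency": 4, "Factual Consistency": 3,
    "Spatial-Temporal Consistency": 5,
    "Basic Properties": 3, "Dynamics and Interactivity": 3,
    "Physical Reliability": 3,
    "Expressiveness": 5, "Artistic Quality": 4, "Authenticity": 4,
}
print(round(overall_score(example_scores), 2))  # 3.67
```

Here the dimension means are 4.01, 3.00, and 4.33, so the overall score is 0.4 × 4.01 + 0.4 × 3.00 + 0.2 × 4.33 = 3.67.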
An example invocation:

```bash
python eval.py \
    --json_path /path/to/your/sequences.json \
    --image_dir /path/to/your/generated/images \
    --output_dir /path/to/save/results \
    --api_key YOUR_OPENAI_API_KEY \
    --model gpt-4o \
    --result_full full_results.json \
    --result_scores scores.jsonl \
    --max_workers 5
```

| Argument | Description |
| :--- | :--- |
| `--json_path` | Path to the input JSON file containing sequence prompts. |
| `--image_dir` | Root directory containing the index folders with step images generated by the model. |
| `--output_dir` | Directory to save the full evaluation results (`.json`) and scores (`.jsonl`). |
| `--api_key` | OpenAI API key required for the evaluation model. |
| `--model` | The model used for scoring (e.g., `gpt-4o`). |
| `--result_full` | Output JSON file containing the full evaluation text and scores. |
| `--result_scores` | Output JSONL file containing simplified scores for analysis. |
| `--max_workers` | Maximum number of concurrent workers for parallel API calls. |

-----

### 4. 🏆 Leaderboard

For the latest official results and model rankings on the Envision benchmark, visit the dedicated leaderboard website: **[https://opendatalab-raiser.github.io/Envision/](https://opendatalab-raiser.github.io/Envision/)**

-----

### 5. ✍️ Citation

If you use the Envision dataset or benchmark in your research, please cite the following paper:

```bibtex
@article{wei2025ggbench,
  title={Envision: Benchmarking Unified Understanding \& Generation for Causal World Process Insights},
  author={Tian, Juanxi and Li, Siyuan and He, Conghui and Wu, Lijun and Tan, Cheng},
  journal={arXiv preprint arXiv:2512.01816},
  year={2025}
}
```
Provided by:
odl-raiser