FreedomIntelligence/MedBananaBench

Name: FreedomIntelligence/MedBananaBench
Creator: FreedomIntelligence
Published: 2026-02-27 07:19:54
License: No description available

Source: Hugging Face. Updated 2026-02-27; indexed 2026-04-05.

Download link:

https://hf-mirror.com/datasets/FreedomIntelligence/MedBananaBench

Resource description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- T1I
- Text-to-Image
size_categories:
- n<1K
---

<img src="assets/benchmark.png" />

MedBananaBench consists of **296 medical illustration generation tasks** spanning five categories. The benchmark is designed to reflect real-world medical illustration generation scenarios. In total, MedBananaBench contains **9,015 unique rubric criteria**, enabling fine-grained evaluation of medical illustration generation across three dimensions: scientific accuracy, structural correctness, and semantic alignment.

### Evaluation

1. [⬇️ Download the full MedBananaBench](https://huggingface.co/datasets/FreedomIntelligence/MedBananaBench) from Hugging Face.
2. Run the commands below to evaluate.

```bash
git clone https://github.com/FreedomIntelligence/MedBanana.git
cd MedBanana/eval
pip install -r requirements.txt

# Score the generated illustrations against the rubric criteria
python eval_rubric.py --json_file MedBananaBench/medbananabench.json \
    --ori_folder MedBananaBench/medbananabench \
    --gen_folder outputs/janus-pro-7b \
    --model_name janus-pro-7b

# Calculate the MedBananaBench Score
python cal_score.py
```

### Results

We evaluate recent text-to-image generation models on MedBananaBench. Overall, commercial models outperform open-source ones in most cases, with Kling-Image-v2.1 as a notable exception. Gemini-3-Pro-Image achieves the highest average score of 0.873.
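One plausible reading of the per-dimension scores is the share of rubric criteria that a generated illustration satisfies within each dimension. The sketch below illustrates that aggregation under this assumption only. The helper names `dimension_scores` and `overall_score`, the dimension keys, and the `(dimension, passed)` input format are hypothetical and are not taken from `eval_rubric.py` or `cal_score.py`, which may weight criteria differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keys, named after the three column headers of the results table.
DIMENSIONS = (
    "scientific_accuracy",
    "structural_correctness",
    "semantic_alignment",
)


def dimension_scores(judgments):
    """Return the fraction of satisfied criteria for each dimension.

    `judgments` is an iterable of (dimension, passed) pairs, one pair per
    rubric criterion. This input format is an assumption for illustration.
    """
    outcomes = defaultdict(list)
    for dimension, passed in judgments:
        outcomes[dimension].append(1.0 if passed else 0.0)
    # Dimensions with no judged criteria are left out of the result.
    return {d: mean(outcomes[d]) for d in DIMENSIONS if outcomes[d]}


def overall_score(scores):
    """Unweighted mean over the dimensions present (an assumed aggregation)."""
    return mean(list(scores.values()))


example = [
    ("scientific_accuracy", True),
    ("scientific_accuracy", False),
    ("structural_correctness", True),
]
scores = dimension_scores(example)
print(scores)                 # {'scientific_accuracy': 0.5, 'structural_correctness': 1.0}
print(overall_score(scores))  # 0.75
```

An unweighted mean over dimensions treats each dimension equally regardless of how many criteria it contains. A per-criterion average would be an equally reasonable choice, so treat the output of `cal_score.py` as authoritative.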
<table>
<thead>
<tr> <th>Model</th> <th>Parameters</th> <th>Scientific Accuracy↑</th> <th>Structural Correctness↑</th> <th>Semantic Alignment↑</th> <th style="background-color: #DEF;">Average↑</th> </tr>
</thead>
<tbody>
<tr style="background-color: #f0f0f0; color: #666;"> <td colspan="6">Commercial T2I Generation Models (Reference Only)</td> </tr>
<tr style="background-color: #f0f0f0; color: #666;"> <td>GPT-Image-1</td> <td>×</td> <td>0.843</td> <td>0.812</td> <td>0.847</td> <td>0.835</td> </tr>
<tr style="background-color: #f0f0f0; color: #666;"> <td>GPT-Image-1.5</td> <td>×</td> <td>0.849</td> <td>0.811</td> <td>0.852</td> <td>0.838</td> </tr>
<tr style="background-color: #f0f0f0; color: #666;"> <td>Gemini-2.5-Flash-Image</td> <td>×</td> <td>0.733</td> <td>0.676</td> <td>0.789</td> <td>0.734</td> </tr>
<tr style="background-color: #f0f0f0; color: #666;"> <td>Gemini-3-Pro-Image</td> <td>×</td> <td>0.879</td> <td>0.849</td> <td>0.890</td> <td>0.873</td> </tr>
<tr style="background-color: #f0f0f0; color: #666;"> <td>Gemini-3.1-Flash-Image</td> <td>×</td> <td>0.869</td> <td>0.852</td> <td>0.886</td> <td>0.870</td> </tr>
<tr style="background-color: #f0f0f0; color: #666;"> <td>Seedream-4.5</td> <td>×</td> <td>0.787</td> <td>0.692</td> <td>0.825</td> <td>0.769</td> </tr>
<tr style="background-color: #f0f0f0; color: #666;"> <td>Kling-Image-v2.1</td> <td>×</td> <td>0.173</td> <td>0.129</td> <td>0.272</td> <td>0.190</td> </tr>
<tr> <td colspan="6">Open-Source T2I Generation Models</td> </tr>
<tr> <td>SDXL</td> <td>3.5B</td> <td>0.103</td> <td>0.061</td> <td>0.170</td> <td>0.111</td> </tr>
<tr> <td>Playground-v2.5</td> <td>3.5B</td> <td>0.063</td> <td>0.043</td> <td>0.147</td> <td>0.083</td> </tr>
<tr> <td>FLUX.1-dev</td> <td>12B</td> <td>0.375</td> <td>0.324</td> <td>0.476</td> <td>0.391</td> </tr>
<tr> <td>Stable-Diffusion-3.5</td> <td>8.1B</td> <td>0.220</td> <td>0.152</td> <td>0.267</td> <td>0.213</td> </tr>
<tr> <td>Chroma1-HD</td> <td>8.9B</td> <td>0.417</td> <td>0.332</td> <td>0.506</td> <td>0.419</td> </tr>
<tr> <td>HiDream-I1-Full</td> <td>17B</td> <td>0.247</td> <td>0.212</td> <td>0.311</td> <td>0.256</td> </tr>
<tr> <td>Lumina-Image-2.0</td> <td>2.6B</td> <td>0.308</td> <td>0.239</td> <td>0.404</td> <td>0.317</td> </tr>
<tr> <td>Qwen-Image</td> <td>20B</td> <td>0.434</td> <td>0.344</td> <td>0.517</td> <td>0.432</td> </tr>
<tr> <td>Qwen-Image-2512</td> <td>20B</td> <td>0.644</td> <td>0.565</td> <td>0.590</td> <td>0.601</td> </tr>
<tr> <td colspan="6">Unified Understanding and Generation Models</td> </tr>
<tr> <td>Janus-Pro-1B</td> <td>1B</td> <td>0.174</td> <td>0.110</td> <td>0.370</td> <td>0.217</td> </tr>
<tr> <td>Janus-Pro-7B</td> <td>7B</td> <td>0.298</td> <td>0.224</td> <td>0.463</td> <td>0.328</td> </tr>
<tr> <td>Janus-4o</td> <td>7B</td> <td>0.416</td> <td>0.318</td> <td>0.566</td> <td>0.433</td> </tr>
<tr> <td>BAGEL</td> <td>14B (A7B)</td> <td>0.350</td> <td>0.301</td> <td>0.521</td> <td>0.390</td> </tr>
<tr> <td>BLIP3o-NEXT</td> <td>3B</td> <td>0.319</td> <td>0.266</td> <td>0.445</td> <td>0.343</td> </tr>
<tr> <td>UniWorld-V1</td> <td>19B</td> <td>0.265</td> <td>0.202</td> <td>0.416</td> <td>0.294</td> </tr>
<tr> <td>Emu3.5</td> <td>8B</td> <td>0.306</td> <td>0.257</td> <td>0.470</td> <td>0.344</td> </tr>
<tr> <td>Show-o2</td> <td>7B</td> <td>0.244</td> <td>0.203</td> <td>0.435</td> <td>0.273</td> </tr>
<tr> <td>GLM-Image</td> <td>16B</td> <td>0.492</td> <td>0.430</td> <td>0.552</td> <td>0.491</td> </tr>
<tr> <td colspan="6">T2I Reasoning Models</td> </tr>
<tr> <td>GoT</td> <td>6B</td> <td>0.287</td> <td>0.196</td> <td>0.319</td> <td>0.262</td> </tr>
<tr> <td>Janus-Pro-R1</td> <td>7B</td> <td>0.014</td> <td>0.008</td> <td>0.135</td> <td>0.052</td> </tr>
<tr> <td>Uni-CoT (v0.2)</td> <td>14B (A7B)</td> <td>0.384</td> <td>0.321</td> <td>0.506</td> <td>0.413</td> </tr>
<tr> <td>T2I-R1</td> <td>7B</td> <td>0.258</td> <td>0.186</td> <td>0.424</td> <td>0.289</td> </tr>
<tr> <td colspan="6">Our Models</td> </tr>
<tr style="background-color: #DEF;"> <td>MedBanana</td> <td>7B</td> <td>0.606</td> <td>0.537</td> <td>0.711</td> <td>0.618</td> </tr>
</tbody>
</table>

<img src="assets/case.png" />

## Our Series of Works

Explore our other works:

- [MedGen](https://github.com/FreedomIntelligence/MedGen): a specialized video generation model designed to revolutionize clinical training and surgical simulation by producing medically accurate, high-fidelity visual content that bridges the gap between theoretical education and real-world professional practice.
- [MicroVerse](https://github.com/FreedomIntelligence/MicroVerse): a model tailored for microscale simulation, enabling the accurate visualization of cellular and molecular processes to support drug discovery, biomedical research, and interactive scientific education.

## Citation

If you find this repository helpful, please consider citing:

```bibtex
```
