EditReward-Bench
Source: ModelScope (魔搭社区) · Updated: 2026-05-16 · Indexed: 2025-10-04
Download link: https://modelscope.cn/datasets/EditScore/EditReward-Bench
<p align="center">
<img src="assets/logo.png" width="65%">
</p>
<p align="center">
<a href="https://vectorspacelab.github.io/EditScore"><img src="https://img.shields.io/badge/Project%20Page-EditScore-yellow" alt="project page"></a>
<a href="https://arxiv.org/abs/2509.23909"><img src="https://img.shields.io/badge/arXiv%20paper-2509.23909-b31b1b.svg" alt="arxiv"></a>
<a href="https://huggingface.co/collections/EditScore/editscore-68d8e27ee676981221db3cfe"><img src="https://img.shields.io/badge/EditScore-🤗-yellow" alt="model"></a>
<a href="https://huggingface.co/datasets/EditScore/EditReward-Bench"><img src="https://img.shields.io/badge/EditReward--Bench-🤗-yellow" alt="dataset"></a>
</p>
<h4 align="center">
<p>
<a href="#-news">News</a> |
<a href="#-quick-start">Quick Start</a> |
<a href="#-benchmark-your-image-editing-reward-model">Benchmark Usage</a> |
<a href="#%EF%B8%8F-citing-us">Citation</a>
</p>
</h4>
**EditScore** is a series of state-of-the-art open-source reward models (7B–72B) designed to evaluate and enhance instruction-guided image editing.
## ✨ Highlights
- **State-of-the-Art Performance**: Effectively matches the performance of leading proprietary VLMs. With a self-ensembling strategy, **our largest model surpasses even GPT-5** on our comprehensive benchmark, **EditReward-Bench**.
- **A Reliable Evaluation Standard**: We introduce **EditReward-Bench**, the first public benchmark specifically designed for evaluating reward models in image editing, featuring 13 subtasks, 11 state-of-the-art editing models (*including proprietary models*) and expert human annotations.
- **Simple and Easy-to-Use**: Get an accurate quality score for your image edits with just a few lines of code.
- **Versatile Applications**: Ready to use as a best-in-class reranker to improve editing outputs, or as a high-fidelity reward signal for **stable and effective Reinforcement Learning (RL) fine-tuning**.
## 🔥 News
- **2025-09-30**: We release **OmniGen2-EditScore7B**, unlocking online RL for image editing via high-fidelity EditScore. LoRA weights are available at [Hugging Face](https://huggingface.co/OmniGen2/OmniGen2-EditScore7B) and [ModelScope](https://www.modelscope.cn/models/OmniGen2/OmniGen2-EditScore7B).
- **2025-09-30**: We are excited to release **EditScore** and **EditReward-Bench**! Model weights and the benchmark dataset are now publicly available. You can access them on Hugging Face: [Models Collection](https://huggingface.co/collections/EditScore/editscore-68d8e27ee676981221db3cfe) and [Benchmark Dataset](https://huggingface.co/datasets/EditScore/EditReward-Bench), and on ModelScope: [Models Collection](https://www.modelscope.cn/collections/EditScore-8b0d53aa945d4e) and [Benchmark Dataset](https://www.modelscope.cn/datasets/EditScore/EditReward-Bench).
## 📖 Introduction
While Reinforcement Learning (RL) holds immense potential for instruction-guided image editing, its progress has been severely hindered by the absence of a high-fidelity, efficient reward signal.
To overcome this barrier, we provide a systematic, two-part solution:
- **A Rigorous Evaluation Standard**: We first introduce **EditReward-Bench**, a new public benchmark for the direct and reliable evaluation of reward models. It features 13 diverse subtasks and expert human annotations, establishing a gold standard for measuring reward signal quality.
- **A Powerful & Versatile Tool**: Guided by our benchmark, we developed the **EditScore** model series. Through meticulous data curation and an effective self-ensembling strategy, EditScore sets a new state of the art for open-source reward models, even surpassing the accuracy of leading proprietary VLMs.
<p align="center">
<img src="assets/table_reward_model_results.png" width="95%">
<br>
<em>Benchmark results on EditReward-Bench.</em>
</p>
We demonstrate the practical utility of EditScore through two key applications:
- **As a State-of-the-Art Reranker**: Use EditScore to perform Best-of-*N* selection and instantly improve the output quality of diverse editing models.
- **As a High-Fidelity Reward for RL**: Use EditScore as a robust reward signal to fine-tune models via RL, enabling stable training and unlocking significant performance gains where general-purpose VLMs fail.
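
The reranking use case can be sketched in a few lines. The snippet below is a minimal illustration, not the project's Best-of-N script (which is still listed under TODO): it assumes only the `evaluate([input_image, output_image], instruction)` call and the `final_score` key shown in the Quick Start. `best_of_n` and `StubScorer` are hypothetical names introduced here; the stub stands in for a real `EditScore` instance so the example runs without model weights.

```python
def best_of_n(scorer, input_image, candidates, instruction):
    """Return (index, score) of the highest-scoring candidate edit."""
    if not candidates:
        raise ValueError("need at least one candidate edit")
    # Score each candidate against the same input image and instruction.
    scores = [
        scorer.evaluate([input_image, candidate], instruction)["final_score"]
        for candidate in candidates
    ]
    # max() keeps the first index on ties.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


class StubScorer:
    """Hypothetical stand-in for EditScore; treats the 'image' as its own score."""

    def evaluate(self, images, instruction):
        return {"final_score": float(images[1])}


best_index, best_score = best_of_n(StubScorer(), 0, [3, 9, 5], "make it brighter")
print(best_index, best_score)  # 1 9.0
```

With a real scorer, `candidates` would be the *N* images sampled from the editing model for one instruction.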
This repository releases both the **EditScore** models and the **EditReward-Bench** dataset to facilitate future research in reward modeling, policy optimization, and AI-driven model improvement.
<p align="center">
<img src="assets/figure_edit_results.png" width="95%">
<br>
<em>EditScore as a superior reward signal for image editing.</em>
</p>
## 📌 TODO
We are actively working on improving EditScore and expanding its capabilities. Here's what's next:
- [ ] Release RL training code applying EditScore to OmniGen2.
- [ ] Provide Best-of-N inference scripts for OmniGen2, Flux-dev-Kontext, and Qwen-Image-Edit.
## 🚀 Quick Start
### 🛠️ Environment Setup
#### ✅ Recommended Setup
```bash
# 1. Clone the repo
git clone git@github.com:VectorSpaceLab/EditScore.git
cd EditScore
# 2. (Optional) Create a clean Python environment
conda create -n editscore python=3.12
conda activate editscore
# 3. Install dependencies
# 3.1 Install PyTorch (choose correct CUDA version)
pip install torch==2.7.1 torchvision --extra-index-url https://download.pytorch.org/whl/cu126
# 3.2 Install other required packages
pip install -r requirements.txt
# EditScore runs without vllm, but we recommend installing it for best performance.
pip install vllm
```
#### 🌏 For users in Mainland China
```bash
# Install PyTorch from a domestic mirror
pip install torch==2.7.1 torchvision --index-url https://mirror.sjtu.edu.cn/pytorch-wheels/cu126
# Install other dependencies from Tsinghua mirror
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# EditScore runs without vllm, but we recommend installing it for best performance.
pip install vllm -i https://pypi.tuna.tsinghua.edu.cn/simple
```
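
Because vllm is optional, it can be convenient to choose the backbone at runtime. The following is a small sketch, assuming the two backbone strings used in the usage example (`"qwen25vl"` and `"qwen25vl_vllm"`); `choose_backbone` is a hypothetical helper, not part of the `editscore` package.

```python
import importlib.util


def choose_backbone() -> str:
    """Pick a backbone string depending on whether vllm is installed."""
    # find_spec returns None when the package cannot be found; it does not import it.
    has_vllm = importlib.util.find_spec("vllm") is not None
    return "qwen25vl_vllm" if has_vllm else "qwen25vl"


print(f"Using backbone: {choose_backbone()}")
```

The returned string can then be passed as the `backbone` argument when constructing the scorer.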
---
### 🧪 Usage Example
Using EditScore is straightforward. The model will be automatically downloaded from the Hugging Face Hub on its first run.
```python
from PIL import Image
from editscore import EditScore
# Load the EditScore model. It will be downloaded automatically.
# Replace with the specific model version you want to use.
model_path = "Qwen/Qwen2.5-VL-7B-Instruct"
lora_path = "EditScore/EditScore-7B"
scorer = EditScore(
backbone="qwen25vl", # set to "qwen25vl_vllm" for faster inference
model_name_or_path=model_path,
enable_lora=True,
lora_path=lora_path,
score_range=25,
num_pass=1, # Increase for better performance via self-ensembling
)
input_image = Image.open("example_images/input.png")
output_image = Image.open("example_images/output.png")
instruction = "Adjust the background to a glass wall."
result = scorer.evaluate([input_image, output_image], instruction)
print(f"Edit Score: {result['final_score']}")
# Expected output: A dictionary containing the final score and other details.
```
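
The `num_pass` comment says that more passes improve performance through self-ensembling. The snippet below sketches that idea outside the library, assuming each pass yields a slightly different score and that passes are combined with a plain mean; this README does not describe how `EditScore` aggregates passes internally. `ensemble_score` is a hypothetical name.

```python
from statistics import mean


def ensemble_score(score_once, num_pass: int) -> float:
    """Average the results of `num_pass` independent scoring passes."""
    if num_pass < 1:
        raise ValueError("num_pass must be at least 1")
    # Call the scorer once per pass and collect the individual scores.
    scores = [float(score_once()) for _ in range(num_pass)]
    return mean(scores)


# Three simulated passes over the same edit (values are illustrative only).
passes = iter([18.0, 22.0, 20.0])
print(ensemble_score(lambda: next(passes), num_pass=3))  # 20.0
```

In practice, setting `num_pass` on the scorer is the intended route; the sketch only shows why averaging several noisy passes gives a steadier score than a single one.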
---
## 📊 Benchmark Your Image-Editing Reward Model
We provide an evaluation script to benchmark reward models on **EditReward-Bench**. To evaluate your own custom reward model, simply create a scorer class with a similar interface and update the script.
```bash
# This script will evaluate the default EditScore model on the benchmark
bash evaluate.sh
# Or speed up inference with vLLM
bash evaluate_vllm.sh
```
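
A custom scorer with "a similar interface" might look like the sketch below. It assumes the evaluation script only needs an `evaluate(images, instruction)` method that returns a dictionary with a `final_score` key, mirroring the usage example, and that `score_range=25` means scores lie in `[0, 25]`; neither assumption is confirmed here, so check `evaluate.sh` before relying on it. `CallableScorer` is a hypothetical adapter name.

```python
class CallableScorer:
    """Wrap any scoring function in an EditScore-like `evaluate` interface."""

    def __init__(self, score_fn, score_range: int = 25):
        self.score_fn = score_fn        # callable(input_image, output_image, instruction) -> number
        self.score_range = score_range  # assumed upper bound of the score scale

    def evaluate(self, images, instruction):
        if len(images) != 2:
            raise ValueError("expected [input_image, output_image]")
        input_image, output_image = images
        raw = float(self.score_fn(input_image, output_image, instruction))
        # Clamp into [0, score_range] so different models share one scale.
        final = min(max(raw, 0.0), float(self.score_range))
        return {"final_score": final}


scorer = CallableScorer(lambda inp, out, instr: 30)
print(scorer.evaluate(["in.png", "out.png"], "add a hat"))  # {'final_score': 25.0}
```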
## ❤️ Citing Us
If you find this repository or our work useful, please consider giving it a star ⭐ and citing our paper 🦖; both are greatly appreciated:
```bibtex
@article{luo2025editscore,
title={EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling},
author={Xin Luo and Jiahao Wang and Chenyuan Wu and Shitao Xiao and Xiyan Jiang and Defu Lian and Jiajun Zhang and Dong Liu and Zheng Liu},
journal={arXiv preprint arXiv:2509.23909},
year={2025}
}
```
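Provider: maas
Created: 2025-09-29

Background overview
EditReward-Bench is a public benchmark dataset built specifically for evaluating reward models for image editing. It comprises 13 subtasks, 11 state-of-the-art editing models, and expert human annotations, and is intended to give reward models a reliable evaluation standard. The dataset supports assessing and improving image-editing quality, and is used together with the EditScore model series to improve editing outputs and reinforcement learning training.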