Dreamer-V1-Data
Source: ModelScope community (updated 2025-12-05, indexed 2025-07-05)
Download link:
https://modelscope.cn/datasets/osunlp/Dreamer-V1-Data
After heavier cleaning, the remaining data size is 3.12M.
# WebDreamer: Model-Based Planning for Web Agents
WebDreamer is a planning framework that enables efficient and effective planning for real-world web agent tasks. Check our paper for more details.
This work is a collaboration between [OSUNLP](https://x.com/osunlp) and [Orby AI](https://www.orby.ai/).

- **Repository:** https://github.com/OSU-NLP-Group/WebDreamer
- **Paper:** https://arxiv.org/abs/2411.06559
- **Point of Contact:** [Kai Zhang](mailto:zhang.13253@osu.edu)
## Models
- Dreamer-7B:
  - [General](https://huggingface.co/osunlp/Dreamer-7B)
  - [In-Domain-VWA-Shopping](https://huggingface.co/osunlp/Dreamer-7B-Shopping)
  - [In-Domain-VWA-Classifieds](https://huggingface.co/osunlp/Dreamer-7B-Classifieds)
  - [In-Domain-VWA-Reddit](https://huggingface.co/osunlp/Dreamer-7B-Reddit)
## Data
[Dreamer Training Data](https://huggingface.co/datasets/osunlp/Dreamer-V1-Data)
```
root
|-- prompt: string
|-- image: binary
|-- response: string
|-- action: string
```
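
As a rough illustration of how a record with this schema could feed the prompt format shown in the Inference section, the sketch below turns the `image` field into a base64 data URL. It assumes that `image` holds the raw bytes of an encoded image file and that JPEG is an acceptable MIME type; neither is confirmed by the dataset card. The helper name `record_to_data_url` and the placeholder record are hypothetical.

```python
import base64


def record_to_data_url(record: dict, mime: str = "image/jpeg") -> str:
    """Encode a record's raw image bytes as a base64 data URL.

    Assumption: record["image"] contains the bytes of an encoded image file.
    """
    encoded = base64.b64encode(record["image"]).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Synthetic record with the four documented fields; all values are placeholders.
example = {
    "prompt": "placeholder prompt",
    "image": b"\xff\xd8\xff\xe0 placeholder bytes",
    "response": "placeholder response",
    "action": "placeholder action",
}

url = record_to_data_url(example)
print(url[:40])
```

The resulting string can be placed in the `image_url` entry of an OpenAI-style chat message.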
## Results
### Strong performance on VisualWebArena, Online-Mind2Web, and Mind2Web-live
| Benchmark | Method | Success Rate |
|------------------|-----------------|--------------------|
| **VisualWebArena** | GPT-4o + Reactive | 17.6% |
| | GPT-4o + Tree Search | 26.2% |
| | **GPT-4o + WebDreamer** | 23.6% (↑34.1%) |
| **Online-Mind2Web** | GPT-4o + Reactive | 26.0% |
| | **GPT-4o + WebDreamer** | 37.0% (↑42.3%) |
| **Mind2Web-live** | GPT-4o + Reactive | 20.2% |
| | **GPT-4o + WebDreamer** | 25.0% (↑23.8%) |

Compared to the reactive baselines, WebDreamer improves the success rate by 34.1%, 42.3%, and 23.8% in relative terms on VisualWebArena, Online-Mind2Web, and Mind2Web-live, respectively. On VisualWebArena it trails tree search with real interactions (23.6% vs. 26.2%).
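
The percentages in parentheses are relative gains over the reactive baseline, not absolute percentage-point differences. A small sketch that reproduces them from the success rates in the table (the function name `relative_improvement` is just illustrative):

```python
def relative_improvement(baseline: float, method: float) -> float:
    """Relative gain of `method` over `baseline`, in percent."""
    return (method - baseline) / baseline * 100.0


# (reactive baseline, WebDreamer) success rates taken from the table above.
results = {
    "VisualWebArena": (17.6, 23.6),
    "Online-Mind2Web": (26.0, 37.0),
    "Mind2Web-live": (20.2, 25.0),
}

for name, (baseline, webdreamer) in results.items():
    gain = relative_improvement(baseline, webdreamer)
    print(f"{name}: +{gain:.1f}% relative")
```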
### Better efficiency than tree search with true interactions
<img width="1502" alt="image" src="https://github.com/user-attachments/assets/0afbc22d-b1eb-4026-a167-e1852cde7677">
WebDreamer explores the search space through simulation rather than live execution, which substantially reduces its reliance on real-world interactions while maintaining robust performance.
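
To make the idea of exploring through simulation concrete, here is a minimal sketch of one step of simulation-based action selection. It is not the authors' implementation: `simulate` stands in for a world model that predicts the page after an action, `score` stands in for a function that rates how promising a predicted page is, and both are hypothetical callables supplied by the caller. See the paper and repository for the actual method.

```python
from typing import Callable, Sequence


def plan_one_step(
    observation: str,
    candidates: Sequence[str],
    simulate: Callable[[str, str], str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate action whose simulated outcome scores highest.

    No real web interaction happens here: each outcome is only predicted.
    """
    if not candidates:
        raise ValueError("no candidate actions")
    best_action, best_score = candidates[0], float("-inf")
    for action in candidates:
        predicted = simulate(observation, action)  # imagined next state
        value = score(predicted)                   # estimated progress
        if value > best_score:
            best_action, best_score = action, value
    return best_action


# Toy stand-ins for the world model and the scorer, for illustration only.
def toy_simulate(observation: str, action: str) -> str:
    return f"{observation} -> after {action}"


def toy_score(predicted: str) -> float:
    return 1.0 if "search" in predicted else 0.0


chosen = plan_one_step("home page", ["click logo", "click search"], toy_simulate, toy_score)
print(chosen)
```

Only the selected action would then be executed on the real website, which is where the savings over tree search with true interactions come from.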
## Inference
### vLLM server
```bash
vllm serve osunlp/Dreamer-7B --api-key token-abc123 --dtype float16
```
or
```bash
python -m vllm.entrypoints.openai.api_server --served-model-name osunlp/Dreamer-7B --model osunlp/Dreamer-7B --dtype float16
```
You can find more instructions on training and inference in [Qwen2-VL's Official Repo](https://github.com/QwenLM/Qwen2-VL).
### Prompt
Our model is fairly robust to the wording of the text prompt, so feel free to try different prompts; we did not explore prompt variations heavily.
```python
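import asyncio
import base64

from openai import AsyncOpenAI

# Assumption: the vLLM server above runs locally on vLLM's default port 8000.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")


def format_openai_template(action_description: str, base64_image: str) -> list:
    """Build one user message: the screenshot plus the action to simulate."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {
                    "type": "text",
                    "text": f"""
Below is current screenshot. Please describe what you would see after a {action_description}""",
                },
            ],
        },
    ]


async def predict_outcome(image_path: str, action_description: str) -> str:
    """Ask the model to describe the page after the given action."""
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode("utf-8")
    messages = format_openai_template(action_description, base64_image)
    completion = await client.chat.completions.create(
        model="osunlp/Dreamer-7B",  # must match the served model name
        messages=messages,
        temperature=1.0,
    )
    return completion.choices[0].message.content


if __name__ == "__main__":
    # Hypothetical screenshot path and action, for illustration only.
    print(asyncio.run(predict_outcome("screenshot.jpg", "click on the search box")))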
```
## Citation Information
If you find this work useful, please consider citing our paper:
```
@article{Gu2024WebDreamer,
author = {Yu Gu and Kai Zhang and Yuting Ning and Boyuan Zheng and Boyu Gou and Tianci Xue and Cheng Chang and Sanjari Srivastava and Yanan Xie and Peng Qi and Huan Sun and Yu Su},
title = {Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents},
journal = {CoRR},
volume = {abs/2411.06559},
year = {2024},
url = {https://arxiv.org/abs/2411.06559},
eprinttype= {arXiv},
eprint = {2411.06559},
}
```
Provider: maas
Created: 2025-07-04



