easyr1-10k-hard-qwen7b-easy-gta1-4MP-synthetic-prompts-qwen25vl7binstruct
收藏魔搭社区2025-10-22 更新2025-11-03 收录
下载链接:
https://modelscope.cn/datasets/mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP-synthetic-prompts-qwen25vl7binstruct
下载链接
链接失效反馈官方服务:
资源简介:
# easyr1-10k-hard-qwen7b-easy-gta1-4MP-synthetic-prompts-qwen25vl7binstruct
This dataset is derived from [mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP](https://huggingface.co/datasets/mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP)
with synthetic prompts generated using Qwen/Qwen2.5-VL-7B-Instruct.
## Generation Details
- **Generated on**: 2025-08-21 14:56:03 UTC
- **Source dataset**: mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP
- **Split processed**: train
- **Model used**: Qwen/Qwen2.5-VL-7B-Instruct
- **Total entries**: 10000
## Synthetic Prompt Generation
Each entry in the original dataset has been processed to generate synthetic prompts that represent
possible user tasks. The synthetic prompts are generated based on:
1. The UI element that was clicked
2. The context visible in the screenshot
3. Common user intentions for similar UI interactions
## New Fields Added
- `synthetic_prompts`: List of generated synthetic prompts (typically 3)
- `original_prompt`: The original user instruction before synthesis
- `selected_synthetic_prompt`: The synthetic prompt selected for the updated messages
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP-synthetic-prompts-qwen25vl7binstruct")
# Access the data
sample = dataset['train'][0]
# Original prompt
print(sample['original_prompt'])
# All generated synthetic prompts
print(sample['synthetic_prompts'])
# The selected synthetic prompt used in messages
print(sample['selected_synthetic_prompt'])
# Updated messages with synthetic prompt
print(sample['messages'])
```
## License
Inherits the license from the original dataset: mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP
# easyr1-10k-hard-qwen7b-easy-gta1-4MP-synthetic-prompts-qwen25vl7binstruct
本数据集衍生自 [mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP](https://huggingface.co/datasets/mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP),其合成提示词由 Qwen/Qwen2.5-VL-7B-Instruct 生成。
## 生成详情
- **生成时间**:2025-08-21 14:56:03 UTC
- **源数据集**:mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP
- **处理拆分集**:训练集(train)
- **使用模型**:Qwen/Qwen2.5-VL-7B-Instruct
- **总条目数**:10000
## 合成提示词生成
本数据集的原始条目均经过处理,以生成可表征实际用户任务的合成提示词。合成提示词的生成依据如下:
1. 被点击的UI元素
2. 截图中可见的上下文信息
3. 同类UI交互场景下的常见用户意图
## 新增字段
- `synthetic_prompts`:生成的合成提示词列表(通常包含3条)
- `original_prompt`:合成前的原始用户指令
- `selected_synthetic_prompt`:用于更新对话消息的选定合成提示词
## 使用方法
python
from datasets import load_dataset
dataset = load_dataset("mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP-synthetic-prompts-qwen25vl7binstruct")
# 访问数据样本
sample = dataset['train'][0]
# 原始提示词
print(sample['original_prompt'])
# 所有生成的合成提示词
print(sample['synthetic_prompts'])
# 用于对话消息的选定合成提示词
print(sample['selected_synthetic_prompt'])
# 带有合成提示词的更新后对话消息
print(sample['messages'])
## 许可证
本数据集继承源数据集 mlfoundations-cua-dev/easyr1-10k-hard-qwen7b-easy-gta1-4MP 的许可证协议。
提供机构:
maas
创建时间:
2025-10-03
搜集汇总
数据集介绍

背景与挑战
背景概述
该数据集源自easyr1-10k-hard-qwen7b-easy-gta1-4MP,通过Qwen2.5-VL-7B-Instruct模型生成了合成提示,用于模拟用户任务。它包含10000个条目,新增了合成提示列表、原始提示和选定提示等字段,适用于数据加载和处理。
以上内容由遇见数据集搜集并总结生成



