GUI-Drag-dataset
ModelScope Community | Updated: 2025-12-05 | Indexed: 2025-11-03
Download link:
https://modelscope.cn/datasets/osunlp/GUI-Drag-dataset
Description:
Our dataset is built on UGround and Jedi, plus additional screenshots of publicly available academic documents in paper format.
Project webpage: https://osu-nlp-group.github.io/GUI-Drag
**NOTE**: Before using this dataset, make sure you understand how absolute coordinates interact with the [image processor](https://github.com/QwenLM/Qwen2.5-VL/blob/d2240f11656bfe404b9ba56db4e51cd09f522ff1/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L60) of [Qwen2.5-VL](https://arxiv.org/abs/2502.13923).
This dataset assumes an image-processor budget of at most 2700 visual tokens, i.e. max_pixels=2700x14x14x2x2 (2,116,800 pixels). The coordinates have been rescaled into the resized (usually smaller) image space, so you must also resize each image with the image processor under the same max_pixels=2700x14x14x2x2 limit for the coordinates to line up with the image.
Apply the same setting in your training pipeline as well; otherwise, performance will fall short of expectations.
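The resizing rule can be sketched in Python as below. This is a minimal re-implementation based on our reading of `smart_resize` in the linked `qwen_vl_utils` `vision_process.py`, not the official code, and it assumes the dataset's coordinates live in the space that function produces with max_pixels=2700x14x14x2x2. Each dimension is rounded to a multiple of 28 (14-pixel patches merged 2x2), and the image is scaled down if its area exceeds max_pixels. The helpers `to_resized` and `to_original` are hypothetical names used only for illustration; they map a point between the original and resized pixel spaces, e.g. to project a predicted drag coordinate back onto the original screenshot.

```python
import math

# Qwen2.5-VL: 14-px patches merged 2x2, so one visual token covers 28x28 pixels.
PATCH_SIZE = 14
MERGE_SIZE = 2
IMAGE_FACTOR = PATCH_SIZE * MERGE_SIZE  # 28

# Budget used by this dataset: 2700 tokens -> 2700 * 14 * 14 * 2 * 2 pixels.
MAX_PIXELS = 2700 * PATCH_SIZE * PATCH_SIZE * MERGE_SIZE * MERGE_SIZE  # 2,116,800
MIN_PIXELS = 4 * IMAGE_FACTOR * IMAGE_FACTOR  # assumed: qwen_vl_utils default, unchanged
MAX_RATIO = 200  # assumed: aspect-ratio limit, as in qwen_vl_utils


def smart_resize(height, width, factor=IMAGE_FACTOR,
                 min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Return (new_height, new_width) as multiples of `factor` whose area
    lies within [min_pixels, max_pixels], roughly preserving aspect ratio."""
    if max(height, width) / min(height, width) > MAX_RATIO:
        raise ValueError("aspect ratio is too extreme")
    # Round each side to the nearest multiple of `factor`.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink uniformly, then floor to multiples of `factor`.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: enlarge uniformly, then ceil to multiples of `factor`.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def to_resized(x, y, orig_w, orig_h, resized_w, resized_h):
    """Hypothetical helper: map an original-image pixel into resized space."""
    return x * resized_w / orig_w, y * resized_h / orig_h


def to_original(x, y, orig_w, orig_h, resized_w, resized_h):
    """Hypothetical helper: map a resized-space pixel back to the original image."""
    return x * orig_w / resized_w, y * orig_h / resized_h


if __name__ == "__main__":
    h, w = smart_resize(1080, 1920)
    print(h, w)  # 1092 1932
```

For a 1920x1080 screenshot, `smart_resize(1080, 1920)` returns `(1092, 1932)`: the 2700-token budget is not exceeded, so the image is only rounded to multiples of 28. Larger screenshots (e.g. 2560x1440) are scaled down, and any coordinate in the dataset refers to that scaled image. In a real pipeline you would more likely call the library's own implementation than copy this sketch.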
Provided by:
maas
Created:
2025-10-03



