Chengheng/gui-grounding-mixed-external
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Chengheng/gui-grounding-mixed-external
Resource description:
---
license: other
task_categories:
- visual-question-answering
- image-to-text
tags:
- gui
- grounding
- vlm
- screenspot
- agents
pretty_name: GUI Grounding Mixed External
size_categories:
- 1M<n<10M
language:
- en
---
# GUI Grounding Mixed External
Task-level metadata for ~1.3M GUI grounding tasks aggregated from several public source datasets, normalized to a single schema and used to fine-tune a Qwen3.5-2B vision-language model (with LoRA) for GUI element grounding.
**Images are NOT included.** Only task metadata (instructions, bounding boxes, image paths) is distributed here. Images must be reconstructed from the original source datasets.
## Contents
| File | Tasks | Source | Size |
|------|------:|--------|-----:|
| `tasks_combined_raw.json` | 324,465 | merged: showui + uground + salesforce + gui_360 | 240 MB |
| `tasks_combined_clean.json` | 760,793 | above, after `01_clean_dataset.py` (adds bbox + augment variants) | 488 MB |
| `tasks_showui.json` | 14,934 | showlab/ShowUI-desktop | 9 MB |
| `tasks_uground.json` | 200,000 | osunlp/UGround-V1-Data (subset) | 142 MB |
| `tasks_salesforce.json` | 70,531 | Salesforce/grounding_dataset | 43 MB |
| `tasks_gui_360.json` | 39,000 | vyokky/GUI-360 (grounding_resize split) | 38 MB |
| `tasks_aria_ui.json` | 905,087 | Aria-UI/Aria-UI_Data (desktop batch 2) | 568 MB |
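As a quick sanity check after downloading, the files can be loaded with the standard library and tallied per source. This is a minimal sketch that assumes each file is a single top-level JSON array of task objects (the card does not state the container format explicitly); `load_tasks` and `count_by_bucket` are illustrative helper names, not part of this repo.

```python
import json
from collections import Counter
from pathlib import Path


def load_tasks(path):
    """Load one metadata file; assumes the top level is a JSON array of tasks."""
    with Path(path).open("r", encoding="utf-8") as f:
        tasks = json.load(f)
    if not isinstance(tasks, list):
        raise ValueError(f"expected a JSON array in {path}, got {type(tasks).__name__}")
    return tasks


def count_by_bucket(tasks):
    """Tally tasks per `source_bucket`; tasks lacking the field count as 'unknown'."""
    return Counter(t.get("source_bucket", "unknown") for t in tasks)


# Example (the larger files need several GB of RAM when loaded in one go):
# tasks = load_tasks("tasks_combined_raw.json")
# print(count_by_bucket(tasks).most_common())
```

If the tallies for `tasks_combined_raw.json` do not match the per-source rows in the table, the download is likely incomplete.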
## Task schema
Each task is a JSON object:
```json
{
  "task_type": "grounding",
  "image": "outputs/external_datasets/<ds>/images/<id>.png",
  "func": "The text instruction describing what to click.",
  "point": [cx, cy],           // center of the click target
  "bbox": [x1, y1, x2, y2],    // full click-target bounding box
  "elem_text": "",
  "elem_tag": "",
  "conversations": [
    {"role": "user", "content": "<image>\n<instruction>"},
    {"role": "assistant", "content": "(<cx>,<cy>)"}
  ],
  "trajectory_id": "external_<ds>",
  "action_type": "click",
  "source": "external",
  "source_bucket": "<showui|uground|salesforce|gui_360|aria_ui>"
}
```
- **Coordinates**: `point` and `bbox` are pre-normalized to an integer grid in `[0, 999]`, in (x, y) order. The downstream pipeline remaps them to `[0, 1000]` in (y, x) order for the Gemma 4 / Qwen formats.
- **`image`** is a **relative path** pointing into a local directory tree of reconstructed images.
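
The remapping described in the coordinates note can be sketched as follows. The card does not give the exact formula, so this assumes a plain linear rescale with rounding. The `[y1, x1, y2, x2]` output order for boxes is also an assumption extrapolated from the "(y, x) order" statement. The function names are illustrative.

```python
def rescale(v, src_max=999, dst_max=1000):
    """Linearly map an integer in [0, src_max] onto [0, dst_max], rounding to nearest."""
    if not 0 <= v <= src_max:
        raise ValueError(f"coordinate {v} outside [0, {src_max}]")
    return round(v * dst_max / src_max)


def remap_point(point):
    """(x, y) on the 0-999 grid -> [y, x] on the 0-1000 grid."""
    x, y = point
    return [rescale(y), rescale(x)]


def remap_bbox(bbox):
    """[x1, y1, x2, y2] on the 0-999 grid -> [y1, x1, y2, x2] on the 0-1000 grid (assumed order)."""
    x1, y1, x2, y2 = bbox
    return [rescale(y1), rescale(x1), rescale(y2), rescale(x2)]
```

Check the actual conversion script before relying on this: an off-by-one in the scale or a different axis order silently shifts every label.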
## Reconstructing images
Images live in the source HF repos under different layouts. Use our reconstruction script, or call `load_dataset` / `hf_hub_download` manually, with one pattern per dataset:
| Dataset | HF repo | How to get images |
|---------|---------|-------------------|
| showui_desktop | `showlab/ShowUI-desktop` | `load_dataset("showlab/ShowUI-desktop", split="train")`; images are inline |
| uground | `osunlp/UGround-V1-Data` | `load_dataset("osunlp/UGround-V1-Data", split="train", streaming=True)`; images are inline |
| salesforce | `Salesforce/grounding_dataset` | `load_dataset("Salesforce/grounding_dataset")`; images are inline |
| gui_360 | `vyokky/GUI-360` | `hf_hub_download("vyokky/GUI-360", "processed_data/grounding_resize/images.tar.gz", repo_type="dataset")`; tar archive |
| aria_ui | `Aria-UI/Aria-UI_Data` | `hf_hub_download("Aria-UI/Aria-UI_Data", "desktop/screenshots_fix_batch2.zip", repo_type="dataset")`; zip archive |
Our reconstruction code (converter + image saver) is open source — see [github.com/ChenghengLi/WebOS](https://github.com/ChenghengLi/WebOS) (`scripts/00_download_external_datasets.py`). Clone that repo, run it with an HF token, and images land at the relative paths this metadata references.
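
For the two archive-based sources, a manual download could look roughly like the sketch below. It only fetches and unpacks the archives. It does not rename files to the `<id>.png` paths that the metadata references (that mapping lives in the WebOS script), so treat it as a starting point. `ARCHIVES`, `extract_archive`, and `fetch_images` are illustrative names. `repo_type="dataset"` is passed because `hf_hub_download` targets model repos by default.

```python
import tarfile
import zipfile
from pathlib import Path

# Repo IDs and archive paths as listed in this card.
ARCHIVES = {
    "gui_360": ("vyokky/GUI-360", "processed_data/grounding_resize/images.tar.gz"),
    "aria_ui": ("Aria-UI/Aria-UI_Data", "desktop/screenshots_fix_batch2.zip"),
}


def extract_archive(archive_path, dest_dir):
    """Unpack a .zip or .tar(.gz) archive into dest_dir and return dest_dir as a Path."""
    archive_path, dest_dir = Path(archive_path), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_dir)
    elif tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as tf:
            tf.extractall(dest_dir)
    else:
        raise ValueError(f"unsupported archive format: {archive_path}")
    return dest_dir


def fetch_images(bucket, dest_root="outputs/external_datasets", token=None):
    """Download one bucket's image archive from the Hub and unpack it."""
    # Imported lazily so the extraction helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = ARCHIVES[bucket]
    local = hf_hub_download(repo_id, filename, repo_type="dataset", token=token)
    return extract_archive(local, Path(dest_root) / bucket / "images")


# Example (requires network access and enough disk space):
# fetch_images("gui_360", token="hf_...")
```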
## How this was built
1. **Per-dataset converter** (`scripts/00_download_external_datasets.py`) iterates each source dataset and writes `outputs/external_datasets/<ds>/tasks.json` plus `outputs/external_datasets/<ds>/images/`.
2. **Merge**: concatenates the per-dataset task files and adds a `source_bucket` field to each task.
3. **Clean** (`scripts/01_clean_dataset.py`): de-duplicates, expands each `grounding` task into paired `grounding` + `grounding_bbox` variants, and filters out tasks with null images.
4. **Splits** (`scripts/03_create_splits.py`): creates the train/val/test split, with an explicit contamination check against ScreenSpot-v2 and ScreenSpot-Pro.
5. **Gemma 4 ready** (`scripts/04_convert_gemma4.py`): converts to chat-template JSONL with `<unused0-2>` ACTOR tokens.
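
To make the clean step (step 3) concrete, here is a rough sketch of what de-duplication, null-image filtering, and the `grounding` to `grounding_bbox` expansion might look like. It is not the code in `01_clean_dataset.py`. The de-duplication key, the bbox answer format `(x1,y1,x2,y2)`, and the function names are all assumptions, and the augment variants mentioned in the Contents table are not covered.

```python
import copy


def dedupe(tasks):
    """Keep the first task seen for each (image, instruction, bbox) triple."""
    seen, unique = set(), []
    for t in tasks:
        key = (t.get("image"), t.get("func"), tuple(t.get("bbox") or ()))
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique


def expand(task):
    """Return the original point task plus a `grounding_bbox` twin."""
    twin = copy.deepcopy(task)
    twin["task_type"] = "grounding_bbox"
    x1, y1, x2, y2 = task["bbox"]
    # Assumed answer format: the card only specifies "(<cx>,<cy>)" for point tasks.
    twin["conversations"][-1]["content"] = f"({x1},{y1},{x2},{y2})"
    return [task, twin]


def clean(tasks):
    """Drop tasks without an image, de-duplicate, then expand grounding tasks."""
    kept = dedupe([t for t in tasks if t.get("image")])
    out = []
    for t in kept:
        if t.get("task_type") == "grounding" and t.get("bbox"):
            out.extend(expand(t))
        else:
            out.append(t)
    return out
```

Pairing alone would at most double the task count, so the jump from 324,465 to 760,793 tasks implies that the real script does more than this sketch.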
## License
This repo's **task metadata** is released under Apache-2.0. **Source images** are licensed per their originating repos — read each source dataset's card before redistributing:
- ShowUI-desktop: Apache-2.0
- UGround-V1-Data: **CC BY-NC-SA 4.0** (non-commercial)
- Salesforce/grounding_dataset: Apache-2.0
- GUI-360: MIT
- Aria-UI_Data: Apache-2.0
**If your downstream use is commercial, exclude UGround** or replace with a commercially licensed alternative.
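
Excluding UGround from a merged file can be done by filtering on `source_bucket`, as in this minimal sketch. `commercial_subset` is an illustrative name. Tasks with no `source_bucket` field are kept, so apply it to the merged files, where the field is set, rather than to the per-dataset files.

```python
NON_COMMERCIAL_BUCKETS = {"uground"}  # CC BY-NC-SA 4.0 per the license list in this card


def commercial_subset(tasks, excluded=NON_COMMERCIAL_BUCKETS):
    """Return only tasks whose `source_bucket` is not in the excluded set."""
    return [t for t in tasks if t.get("source_bucket") not in excluded]
```

This filters only the metadata; the licenses of the remaining sources still need to be checked against your intended use.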
## Citation
If you use this aggregation, please cite the source datasets and this repo. See each source's HF page for canonical bibtex.
Provider:
Chengheng



