Chengheng/gui-grounding-mixed-external

Name: Chengheng/gui-grounding-mixed-external
Creator: Chengheng
Published: 2026-04-19 15:02:24
License: no description provided

Hugging Face · Updated 2026-04-19 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/Chengheng/gui-grounding-mixed-external


Resource description:

---
license: other
task_categories:
- visual-question-answering
- image-to-text
tags:
- gui
- grounding
- vlm
- screenspot
- agents
pretty_name: GUI Grounding Mixed External
size_categories:
- 100K<n<1M
language:
- en
---

# GUI Grounding Mixed External

Task-level metadata for ~1.3M GUI grounding tasks aggregated from several public source datasets, normalized to a single schema and used to fine-tune a Qwen3.5-2B vision-language model (with LoRA) for GUI element grounding.

**Images are NOT included.** Only task metadata (instructions, bounding boxes, image paths) is distributed here. Images must be reconstructed from the original source datasets.

## Contents

| File | Tasks | Source | Size |
|------|------:|--------|-----:|
| `tasks_combined_raw.json` | 324,465 | merged: showui + uground + salesforce + gui_360 | 240 MB |
| `tasks_combined_clean.json` | 760,793 | above, after `01_clean_dataset.py` (adds bbox + augment variants) | 488 MB |
| `tasks_showui.json` | 14,934 | showlab/ShowUI-desktop | 9 MB |
| `tasks_uground.json` | 200,000 | osunlp/UGround-V1-Data (subset) | 142 MB |
| `tasks_salesforce.json` | 70,531 | Salesforce/grounding_dataset | 43 MB |
| `tasks_gui_360.json` | 39,000 | vyokky/GUI-360 (grounding_resize split) | 38 MB |
| `tasks_aria_ui.json` | 905,087 | Aria-UI/Aria-UI_Data (desktop batch 2) | 568 MB |

## Task schema

Each task is a JSON object:

```json
{
  "task_type": "grounding",
  "image": "outputs/external_datasets/<ds>/images/<id>.png",
  "func": "The text instruction describing what to click.",
  "point": [cx, cy],          // center of the click target
  "bbox": [x1, y1, x2, y2],   // full click-target bounding box
  "elem_text": "",
  "elem_tag": "",
  "conversations": [
    {"role": "user", "content": "<image>\n<instruction>"},
    {"role": "assistant", "content": "(<cx>,<cy>)"}
  ],
  "trajectory_id": "external_<ds>",
  "action_type": "click",
  "source": "external",
  "source_bucket": "<showui|uground|salesforce|gui_360|aria_ui>"
}
```

- **Coordinates**: `point` and `bbox` are pre-normalized to `[0, 999]` (x, y integer grid); the downstream pipeline remaps them to `[0, 1000]` in (y, x) order for Gemma 4 / Qwen formats.
- **`image`** is a **relative path** pointing into a local directory tree of reconstructed images.

## Reconstructing images

Images live in the source HF repos under different layouts. Use our reconstruction script (or manual `hf_hub_download`), with one pattern per dataset:

| Dataset | HF repo | How to get images |
|---------|---------|-------------------|
| showui_desktop | `showlab/ShowUI-desktop` | `load_dataset("showlab/ShowUI-desktop", split="train")` → images are inline |
| uground | `osunlp/UGround-V1-Data` | `load_dataset("osunlp/UGround-V1-Data", split="train", streaming=True)` → images are inline |
| salesforce | `Salesforce/grounding_dataset` | `load_dataset("Salesforce/grounding_dataset")` → images are inline |
| gui_360 | `vyokky/GUI-360` | `hf_hub_download("vyokky/GUI-360", "processed_data/grounding_resize/images.tar.gz")` → tar archive |
| aria_ui | `Aria-UI/Aria-UI_Data` | `hf_hub_download("Aria-UI/Aria-UI_Data", "desktop/screenshots_fix_batch2.zip")` → zip archive |

Our reconstruction code (converter + image saver) is open source; see [github.com/ChenghengLi/WebOS](https://github.com/ChenghengLi/WebOS) (`scripts/00_download_external_datasets.py`). Clone that repo, run it with an HF token, and images land at the relative paths this metadata references.

## How this was built

1. **Per-dataset converter** (`scripts/00_download_external_datasets.py`) iterates each source dataset and writes `outputs/external_datasets/<ds>/tasks.json` plus `outputs/external_datasets/<ds>/images/`.
2. **Merge**: concatenate per-dataset tasks files (adds `source_bucket` field).
3. **Clean** (`scripts/01_clean_dataset.py`): de-duplicates, expands each `grounding` task into paired `grounding` + `grounding_bbox` variants, filters null images.
4. **Splits** (`scripts/03_create_splits.py`): train/val/test split, with explicit contamination check against ScreenSpot-v2 and ScreenSpot-Pro.
5. **Gemma 4 ready** (`scripts/04_convert_gemma4.py`): converts to chat-template JSONL with `<unused0-2>` ACTOR tokens.

## License

This repo's **task metadata** is released under Apache-2.0. **Source images** are licensed per their originating repos. Read each source dataset's card before redistributing:

- ShowUI-desktop: Apache-2.0
- UGround-V1-Data: **CC BY-NC-SA 4.0** (non-commercial)
- Salesforce/grounding_dataset: Apache-2.0
- GUI-360: MIT
- Aria-UI_Data: Apache-2.0

**If your downstream use is commercial, exclude UGround** or replace with a commercially licensed alternative.

## Citation

If you use this aggregation, please cite the source datasets and this repo. See each source's HF page for canonical bibtex.
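
## Example: filtering and coordinate conversion

As a rough illustration of how the metadata might be consumed, the sketch below drops the non-commercial UGround bucket via `source_bucket` and maps the `[0, 999]` grid coordinates of `point` and `bbox` to pixels. It rests on two assumptions that the card does not state: each tasks file is a single JSON array of task objects, and a grid value `v` corresponds to the fraction `v / 1000` of the image side. The helper names (`load_tasks`, `drop_non_commercial`, `to_pixels`) are hypothetical and are not part of the WebOS scripts.

```python
import json
from pathlib import Path

GRID = 1000  # assumption: the [0, 999] grid divides each image side into 1000 bins
NON_COMMERCIAL_BUCKETS = {"uground"}  # CC BY-NC-SA 4.0 per the License section


def load_tasks(path):
    """Load a tasks file, assuming it holds one JSON array of task objects."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def drop_non_commercial(tasks):
    """Keep only tasks whose source_bucket is not flagged as non-commercial."""
    return [t for t in tasks if t.get("source_bucket") not in NON_COMMERCIAL_BUCKETS]


def to_pixels(task, width, height):
    """Map a task's (x, y) grid coordinates to pixel coordinates.

    Assumption: grid value v maps to v / GRID of the image width or height.
    """
    cx, cy = task["point"]
    x1, y1, x2, y2 = task["bbox"]
    return {
        "point": (cx * width / GRID, cy * height / GRID),
        "bbox": (
            x1 * width / GRID,
            y1 * height / GRID,
            x2 * width / GRID,
            y2 * height / GRID,
        ),
    }


# Two hand-written tasks carrying only the fields used here (not real dataset rows).
sample = [
    {"point": [500, 250], "bbox": [400, 200, 600, 300], "source_bucket": "showui"},
    {"point": [10, 20], "bbox": [0, 10, 20, 30], "source_bucket": "uground"},
]

kept = drop_non_commercial(sample)
print(len(kept))
print(to_pixels(kept[0], width=1920, height=1080))
```

With real data, `load_tasks("tasks_combined_clean.json")` would replace the hand-written `sample`, and `width` and `height` would come from the reconstructed image at the task's `image` path.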

Provider:

Chengheng
