AndesUI_Benchmark

Name: AndesUI_Benchmark
Creator: maas
Published: 2025-12-05 16:54:48
License: No description available

ModelScope community. Updated 2025-12-05; indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/Oppo/AndesUI_Benchmark

Resource description:

# AndesUI_Benchmark

The AndesUI dataset consists of two parts: a **training set (train)** and a **test set (test)**. Apps in the test set do not appear in the training set.

The dataset covers three main task types:

1. Grounding Task: Predict the bounding box (bbox) coordinates given a widget description.
2. Referring Task: Predict the corresponding widget description given a bounding box (bbox).
3. QA Task: Answer the natural language question by predicting the bbox coordinates that need to be clicked.

According to the technical report, the original test set contains:

- **8,642 Referring samples**
- **7,194 Grounding samples**
- **1,181 QA samples**

To simplify testing, we randomly selected the following subsets:

- **Referring:** 1,500 samples
- **Grounding:** 1,500 samples
- **QA:** 748 samples

These subsets have been open-sourced. Each subset is provided in both **JSON** and **TSV** formats, with image paths stored as relative paths.

### Task Details

#### **Referring Task**

The JSON file includes three fields:

- **description** (the expected output)
- **imgpath** (image path)
- **bbox** (bounding box)

This task requires the model to predict the **description** of a widget based on its given **bbox**.

Different models support different bbox formats:

- **Qwen model:** Prefers `[xmin, ymin, xmax, ymax]` (raw pixel coordinates)
- **Intern model:** Prefers `[xmin, ymin, xmax, ymax]` (normalized coordinates)

**Accuracy metric:** A prediction (`pred`) is considered correct if its longest common substring with the ground-truth `description` is non-empty.

#### **Grounding Task**

The JSON file includes three fields:

- **question** (the widget description)
- **imgpath** (image path)
- **bbox** (ground-truth bounding box)

This task is the inverse of Referring: the model must predict the **bbox** of a widget given its **description**.

Different models output different formats:

- Some models directly predict `[x_center, y_center]` (center coordinates)
- Others predict `[xmin, ymin, xmax, ymax]` (full bbox)

If the model outputs a full bbox, we compute its geometric center as `[x_center, y_center]`.

**Accuracy metric:** The predicted center must lie inside the ground-truth bbox.

#### **QA Task**

The evaluation logic for the **QA task** is the same as for the **Grounding task**.

The evaluation functionality for the AndesUI Test dataset has been integrated into the VLMEvalKit toolkit. We provide a separate evaluation script for each of the three task types (Grounding, Referring, and QA). Place these scripts in the `vlmeval/dataset/GUI` directory, where they can be invoked directly for automated assessment.
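The Referring accuracy rule above can be sketched in a few lines of Python. This is a minimal illustration of the rule as literally stated, not the evaluation script shipped with VLMEvalKit; the function names (`longest_common_substring`, `referring_correct`, `referring_accuracy`) are hypothetical, and the comparison is assumed to be case-sensitive with no text normalization.

```python
from typing import Sequence


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def referring_correct(pred: str, description: str) -> bool:
    """A prediction counts as correct if the common substring is non-empty."""
    return len(longest_common_substring(pred, description)) > 0


def referring_accuracy(preds: Sequence[str], descriptions: Sequence[str]) -> float:
    """Fraction of correct predictions; 0.0 for an empty sample list."""
    if not preds:
        return 0.0
    hits = sum(referring_correct(p, d) for p, d in zip(preds, descriptions))
    return hits / len(preds)
```

Read literally, the rule accepts a prediction that shares even a single character with the ground truth, so scores under this sketch may be more lenient than those from a script that normalizes text or applies a minimum length.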

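The Grounding and QA accuracy rule described above (reduce the prediction to a center point, then test whether it falls inside the ground-truth bbox) might look like the following sketch. The helper names (`to_center`, `center_hit`, `grounding_accuracy`) are hypothetical, boundary points are assumed to count as inside, and the prediction and ground truth are assumed to already share one coordinate system (both raw pixels or both normalized).

```python
from typing import Sequence, Tuple


def to_center(pred: Sequence[float]) -> Tuple[float, float]:
    """Reduce a prediction to [x_center, y_center].

    Accepts either a 2-element center point or a 4-element
    [xmin, ymin, xmax, ymax] bbox.
    """
    if len(pred) == 2:
        return float(pred[0]), float(pred[1])
    if len(pred) == 4:
        xmin, ymin, xmax, ymax = pred
        return (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    raise ValueError(f"expected 2 or 4 coordinates, got {len(pred)}")


def center_hit(pred: Sequence[float], gt_bbox: Sequence[float]) -> bool:
    """True if the predicted center lies inside the ground-truth bbox.

    Assumption: points on the bbox boundary count as inside.
    """
    x, y = to_center(pred)
    xmin, ymin, xmax, ymax = gt_bbox
    return xmin <= x <= xmax and ymin <= y <= ymax


def grounding_accuracy(
    preds: Sequence[Sequence[float]], gt_bboxes: Sequence[Sequence[float]]
) -> float:
    """Fraction of predictions whose center hits the ground truth."""
    if not preds:
        return 0.0
    hits = sum(center_hit(p, g) for p, g in zip(preds, gt_bboxes))
    return hits / len(preds)
```

For example, `center_hit([12, 22, 28, 38], [10, 20, 30, 40])` reduces the predicted bbox to the point `(20.0, 30.0)` and reports a hit. If a model emits normalized coordinates while the ground truth is in pixels, one of the two needs to be rescaled before calling these helpers.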

Provider:

maas

Created:

2025-10-15
