tathadn/visiontriage-multimodal

Name: tathadn/visiontriage-multimodal
Creator: tathadn
Published: 2026-04-19 16:55:34
License: CC-BY-4.0 (synthetic reports and metadata)

Source: Hugging Face. Updated 2026-04-19, indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/tathadn/visiontriage-multimodal


Description:

---
license: cc-by-4.0
task_categories:
- image-text-to-text
- text-classification
language:
- en
tags:
- bug-triage
- software-engineering
- severity-classification
- multimodal
- ui-screenshots
- synthetic
size_categories:
- 1K<n<10K
pretty_name: VisionTriage Multimodal Bug Reports
---

# VisionTriage: Multimodal Bug Report Dataset

**5,551 synthetic (screenshot, bug report, severity) triples** for automated UI bug severity triage.

Built on top of [Rico](http://www.interactionmining.org/rico.html) (72k Android UI screenshots): each base screenshot has a localized visual defect injected by one of 5 deterministic mutators, and an LLM-generated bug report is paired with the mutated image. Each bug type maps to a single severity label, enabling a clean per-class ablation.

- **Repository:** https://github.com/tathadn/visiontriage
- **Paired model (best config):** https://huggingface.co/tathadn/visiontriage-config-c

## Splits

| Split | N    | Share |
|-------|------|-------|
| train | 4441 | 80.0% |
| val   | 555  | 10.0% |
| test  | 555  | 10.0% |

## Bug types → severity

| Bug type           | Mutator behavior                                    | Severity label |
|--------------------|-----------------------------------------------------|----------------|
| `crash_dialog`     | ANR / fatal crash dialog rendered over the UI       | `blocker`      |
| `occlude_element`  | Opaque rectangle obscures an interactive control    | `critical`     |
| `overlap_siblings` | Two sibling views rendered at the same bounding box | `major`        |
| `wrong_color`      | Button/background recolored to a contrasting value  | `minor`        |
| `subtle_offset`    | Element translated by a few pixels                  | `trivial`      |

Because each bug type is mapped to exactly one severity, per-bug-type accuracy equals per-severity recall, which is useful for a clean ablation across visual severities.

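As a minimal sketch of the mapping above, the fixed bug-type→severity table can be written as a dictionary, and the identity between per-bug-type accuracy and per-severity recall can be checked on a few made-up predictions. The names `BUG_TYPE_TO_SEVERITY`, `per_bug_type_accuracy`, and `per_severity_recall`, as well as the toy records, are illustrative assumptions and not part of the project's code.

```python
from collections import defaultdict

# Fixed mapping copied from the "Bug types → severity" table.
BUG_TYPE_TO_SEVERITY = {
    "crash_dialog": "blocker",
    "occlude_element": "critical",
    "overlap_siblings": "major",
    "wrong_color": "minor",
    "subtle_offset": "trivial",
}


def per_bug_type_accuracy(records):
    """Fraction of correct severity predictions, grouped by bug type."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        true = BUG_TYPE_TO_SEVERITY[r["bug_type"]]
        totals[r["bug_type"]] += 1
        hits[r["bug_type"]] += int(r["severity_pred"] == true)
    return {b: hits[b] / totals[b] for b in totals}


def per_severity_recall(records):
    """Recall per true severity: correct predictions / samples with that label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        true = BUG_TYPE_TO_SEVERITY[r["bug_type"]]
        totals[true] += 1
        hits[true] += int(r["severity_pred"] == true)
    return {s: hits[s] / totals[s] for s in totals}


# Toy predictions, invented for illustration only.
toy = [
    {"bug_type": "crash_dialog", "severity_pred": "blocker"},
    {"bug_type": "crash_dialog", "severity_pred": "critical"},
    {"bug_type": "wrong_color", "severity_pred": "minor"},
    {"bug_type": "subtle_offset", "severity_pred": "minor"},
]

acc = per_bug_type_accuracy(toy)
rec = per_severity_recall(toy)

# Relabel the accuracy keys by severity; the two dictionaries then coincide.
acc_by_severity = {BUG_TYPE_TO_SEVERITY[b]: v for b, v in acc.items()}
print(acc_by_severity == rec)
```

The two functions group the same samples under different key names, which is why the one-to-one mapping makes the metrics interchangeable.
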
## Schema

| Field                | Type         | Description                                                         |
|----------------------|--------------|---------------------------------------------------------------------|
| `rico_id`            | string       | ID of the base Rico screenshot (join key to Rico source images)     |
| `bug_type`           | string       | One of the 5 mutators above                                         |
| `severity_true`      | string       | Ground-truth severity (derived from `bug_type` per the table above) |
| `severity_pred`      | string       | Zero-shot Qwen2.5-VL-7B prediction (reference label, not target)    |
| `severity_raw`       | string       | Raw (pre-parse) zero-shot model output                              |
| `parse_method`       | string       | How `severity_pred` was parsed from `severity_raw`                  |
| `summary`            | string       | Synthetic bug report: one-line summary                              |
| `steps_to_reproduce` | list[string] | Synthetic bug report: reproduction steps                            |
| `actual_behavior`    | string       | Synthetic bug report: observed behavior                             |
| `expected_behavior`  | string       | Synthetic bug report: expected behavior                             |
| `package`            | string       | Android package name of the source Rico app                         |
| `width`, `height`    | int          | Mutated screenshot dimensions (pixels)                              |

## Images are referenced, not stored

This dataset **ships with text fields and `rico_id` only**, not the screenshots themselves. To materialize (image, text, severity) triples, you need to:

1. Download the Rico dataset (60 GB) from http://www.interactionmining.org/rico.html.
2. Run the bug-injection pipeline from the project repo (`src/data/inject_bugs.py`) with the same `rico_id` list to regenerate the mutated screenshots deterministically.

See https://github.com/tathadn/visiontriage#reproduction for the full pipeline.

## How it was built

1. **Filter Rico** → 3k screenshots with usable view hierarchies (no empty hierarchies, mobile-portrait, minimum element count).
2. **Inject bugs** → for each screenshot, apply 1–3 of the 5 mutators; each mutation produces one sample.
3. **Generate reports** → Qwen2.5-VL-7B-Instruct (zero-shot) produces a summary / STR / actual / expected for each mutated screenshot.
4. **Label severities** → deterministic `bug_type → severity` map (no human labeling).

Total: 5,551 samples that pass post-generation validation (report parses, severity extractable).

## Intended use

- Training and evaluating multimodal severity-triage models (see Config B/C/D in the paired repo).
- Ablating image vs. text contributions to bug classification; the fixed bug-type→severity map keeps labels clean.
- NOT suitable for: real-world bug report classification without domain adaptation (reports are LLM-generated and stylized), or safety-critical deployment.

## Limitations

- **Synthetic text:** bug reports are LLM-generated and may be stylistically uniform; real reports have noisier phrasing, missing fields, and irregular structure.
- **Deterministic labels:** severity is derived from bug type, not from human annotation. Real triage involves subjective judgment.
- **UI domain only:** Android UI screenshots; not representative of backend, API, or systems bugs.
- **English only.**

## License

CC-BY-4.0 for the synthetic reports and metadata. Rico screenshots are distributed by their original authors under the terms at http://www.interactionmining.org/rico.html; respect those terms for any downstream redistribution of derived images.

## Citation

```bibtex
@misc{visiontriage2026,
  title  = {VisionTriage: Multimodal Severity Prediction for UI Bug Reports},
  author = {Debnath, Tathagata},
  year   = {2026},
  url    = {https://github.com/tathadn/visiontriage}
}
```
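
## Example: checking a record against the schema

The schema and the deterministic label map described in this card lend themselves to a light consistency check before training. The sketch below is only an illustration under stated assumptions: `validate_record`, `FIELD_TYPES`, and the sample record (including its `rico_id`, `package`, and `parse_method` values) are hypothetical, and the Python types are inferred from the Schema table rather than taken from any loader shipped with the dataset.

```python
# Fixed bug_type -> severity map, as documented in this card.
SEVERITY_BY_BUG_TYPE = {
    "crash_dialog": "blocker",
    "occlude_element": "critical",
    "overlap_siblings": "major",
    "wrong_color": "minor",
    "subtle_offset": "trivial",
}

# Expected Python type per field, inferred from the Schema table (assumption).
FIELD_TYPES = {
    "rico_id": str,
    "bug_type": str,
    "severity_true": str,
    "severity_pred": str,
    "severity_raw": str,
    "parse_method": str,
    "summary": str,
    "steps_to_reproduce": list,
    "actual_behavior": str,
    "expected_behavior": str,
    "package": str,
    "width": int,
    "height": int,
}


def validate_record(record):
    """Return a list of problems found in one record (empty list means OK)."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    bug_type = record.get("bug_type")
    if not isinstance(bug_type, str) or bug_type not in SEVERITY_BY_BUG_TYPE:
        problems.append(f"unknown bug_type: {bug_type}")
    elif record.get("severity_true") != SEVERITY_BY_BUG_TYPE[bug_type]:
        problems.append("severity_true does not match bug_type")
    return problems


# Hypothetical record; every value here is invented for illustration.
sample = {
    "rico_id": "12345",
    "bug_type": "occlude_element",
    "severity_true": "critical",
    "severity_pred": "major",
    "severity_raw": "Severity: major",
    "parse_method": "regex",
    "summary": "Opaque box covers the login button",
    "steps_to_reproduce": ["Open the app", "Go to the login screen"],
    "actual_behavior": "The login button is hidden behind a rectangle.",
    "expected_behavior": "The login button is visible and tappable.",
    "package": "com.example.app",
    "width": 1080,
    "height": 1920,
}

print(validate_record(sample))
print(validate_record(dict(sample, severity_true="minor")))
```

Note that a mismatch between `severity_pred` and `severity_true` is not flagged, since `severity_pred` is a reference label from the zero-shot model rather than the target.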
