tathadn/visiontriage-multimodal
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/tathadn/visiontriage-multimodal
---
license: cc-by-4.0
task_categories:
- image-text-to-text
- text-classification
language:
- en
tags:
- bug-triage
- software-engineering
- severity-classification
- multimodal
- ui-screenshots
- synthetic
size_categories:
- 1K<n<10K
pretty_name: VisionTriage Multimodal Bug Reports
---
# VisionTriage — Multimodal Bug Report Dataset
**5,551 synthetic (screenshot, bug report, severity) triples** for automated UI bug severity triage.
Built on top of [Rico](http://www.interactionmining.org/rico.html) (72k Android UI screenshots). Each base screenshot has a localized visual defect injected by one of 5 deterministic mutators, and an LLM-generated bug report is paired with the mutated image. Each bug type maps to a single severity label, enabling a clean per-class ablation.
- **Repository:** https://github.com/tathadn/visiontriage
- **Paired model (best config):** https://huggingface.co/tathadn/visiontriage-config-c
## Splits
| Split | N | Share |
|-------|------|-------|
| train | 4441 | 80.0% |
| val | 555 | 10.0% |
| test | 555 | 10.0% |
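The shares follow directly from the counts in the table. A minimal arithmetic check, where the name `SPLIT_SIZES` is illustrative and not taken from the project repo:

```python
# Split sizes copied from the table above; the variable names are illustrative.
SPLIT_SIZES = {"train": 4441, "val": 555, "test": 555}

total = sum(SPLIT_SIZES.values())
# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in SPLIT_SIZES.items()}

print(total)   # 5551
print(shares)  # {'train': 80.0, 'val': 10.0, 'test': 10.0}
```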
## Bug types → severity
| Bug type | Mutator behavior | Severity label |
|-------------------|---------------------------------------------------------|----------------|
| `crash_dialog` | ANR / fatal crash dialog rendered over the UI | `blocker` |
| `occlude_element` | Opaque rectangle obscures an interactive control | `critical` |
| `overlap_siblings`| Two sibling views rendered at the same bounding box | `major` |
| `wrong_color` | Button/background recolored to a contrasting value | `minor` |
| `subtle_offset` | Element translated by a few pixels | `trivial` |
Because each bug type is mapped to exactly one severity, per-bug-type accuracy equals per-severity recall — useful for a clean ablation across visual severities.
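The equivalence can be checked on a handful of rows. The sketch below assumes rows are plain dicts using the field names from the schema; the helper names and the toy rows are made up for illustration and are not part of the project code.

```python
from collections import Counter

# Mapping copied from the table above.
BUG_TYPE_TO_SEVERITY = {
    "crash_dialog": "blocker",
    "occlude_element": "critical",
    "overlap_siblings": "major",
    "wrong_color": "minor",
    "subtle_offset": "trivial",
}


def _hit_rate(rows, group_key):
    """Fraction of rows with severity_pred == severity_true, grouped by group_key."""
    total, correct = Counter(), Counter()
    for row in rows:
        total[row[group_key]] += 1
        correct[row[group_key]] += row["severity_pred"] == row["severity_true"]
    return {group: correct[group] / total[group] for group in total}


def per_bug_type_accuracy(rows):
    return _hit_rate(rows, "bug_type")


def per_severity_recall(rows):
    return _hit_rate(rows, "severity_true")


# Toy rows (invented predictions) whose true labels follow the fixed mapping.
rows = [
    {"bug_type": "crash_dialog", "severity_true": "blocker", "severity_pred": "blocker"},
    {"bug_type": "crash_dialog", "severity_true": "blocker", "severity_pred": "critical"},
    {"bug_type": "subtle_offset", "severity_true": "trivial", "severity_pred": "trivial"},
]

accuracy = per_bug_type_accuracy(rows)
recall = per_severity_recall(rows)
print(accuracy)  # {'crash_dialog': 0.5, 'subtle_offset': 1.0}
print(recall)    # {'blocker': 0.5, 'trivial': 1.0}
```

Both functions group the same rows, only under different keys, which is why the numbers coincide whenever the mapping is one-to-one.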
## Schema
| Field | Type | Description |
|-----------------------|--------------|--------------------------------------------------------------------|
| `rico_id` | string | ID of the base Rico screenshot (join key to Rico source images) |
| `bug_type` | string | One of the 5 mutators above |
| `severity_true` | string | Ground-truth severity (derived from `bug_type` per the table above) |
| `severity_pred` | string | Zero-shot Qwen2.5-VL-7B prediction (reference label, not target) |
| `severity_raw` | string | Raw (pre-parse) zero-shot model output |
| `parse_method` | string | How `severity_pred` was parsed from `severity_raw` |
| `summary` | string | Synthetic bug report: one-line summary |
| `steps_to_reproduce` | list[string] | Synthetic bug report: reproduction steps |
| `actual_behavior` | string | Synthetic bug report: observed behavior |
| `expected_behavior` | string | Synthetic bug report: expected behavior |
| `package` | string | Android package name of the source Rico app |
| `width`, `height` | int | Mutated screenshot dimensions (pixels) |
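A record can be sanity-checked against this schema with a few lines of Python. This is a sketch under two assumptions: each record is available as a plain dict, and every field is present and non-null. The helper `schema_errors` and the example record values are hypothetical.

```python
# Expected Python types per field, transcribed from the schema table above.
SCHEMA = {
    "rico_id": str,
    "bug_type": str,
    "severity_true": str,
    "severity_pred": str,
    "severity_raw": str,
    "parse_method": str,
    "summary": str,
    "steps_to_reproduce": list,
    "actual_behavior": str,
    "expected_behavior": str,
    "package": str,
    "width": int,
    "height": int,
}


def schema_errors(record):
    """Return a list of human-readable schema violations (empty if none)."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    steps = record.get("steps_to_reproduce")
    if isinstance(steps, list) and not all(isinstance(s, str) for s in steps):
        errors.append("steps_to_reproduce: expected list of strings")
    return errors


# Invented example record; the values are placeholders, not real dataset rows.
example = {
    "rico_id": "12345",
    "bug_type": "wrong_color",
    "severity_true": "minor",
    "severity_pred": "minor",
    "severity_raw": "Severity: minor",
    "parse_method": "example",
    "summary": "Button color does not match the theme",
    "steps_to_reproduce": ["Open the app", "Navigate to the settings screen"],
    "actual_behavior": "The button is rendered in a clashing color.",
    "expected_behavior": "The button uses the theme color.",
    "package": "com.example.app",
    "width": 1080,
    "height": 1920,
}
print(schema_errors(example))  # []
```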
## Images are referenced, not stored
This dataset **ships with text fields and `rico_id` only** — not the screenshots themselves. To materialize (image, text, severity) triples, you need to:
1. Download the Rico dataset (60 GB) from http://www.interactionmining.org/rico.html.
2. Run the bug-injection pipeline from the project repo (`src/data/inject_bugs.py`) with the same `rico_id` list to regenerate the mutated screenshots deterministically.
See https://github.com/tathadn/visiontriage#reproduction for the full pipeline.
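Before running the injection step, it can help to confirm that every referenced base screenshot exists locally. The sketch below assumes the Rico screenshots are unpacked into a single directory as `<rico_id>.jpg`; that layout, the `.jpg` suffix, and the helper name `missing_rico_images` are assumptions for illustration, so adjust them to match your download.

```python
from pathlib import Path


def missing_rico_images(records, rico_dir, suffix=".jpg"):
    """Return the sorted rico_ids that have no image file in rico_dir.

    Assumes one file per screenshot named <rico_id><suffix>; this layout is
    an assumption, not something the dataset card specifies.
    """
    needed = {record["rico_id"] for record in records}
    root = Path(rico_dir)
    return sorted(rid for rid in needed if not (root / f"{rid}{suffix}").is_file())
```

An empty return value means every `rico_id` in the records has a matching base image under the assumed layout.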
## How it was built
1. **Filter Rico** → 3k screenshots with usable view hierarchies (no empty hierarchies, mobile-portrait, minimum element count).
2. **Inject bugs** → for each screenshot, apply 1–3 of the 5 mutators; each mutation produces one sample.
3. **Generate reports** → Qwen2.5-VL-7B-Instruct (zero-shot) produces a summary, steps to reproduce (STR), actual behavior, and expected behavior for each mutated screenshot.
4. **Label severities** → deterministic `bug_type → severity` map (no human labeling).
Total: 5,551 samples that pass post-generation validation (report parses, severity extractable).
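The "severity extractable" check can be pictured as a parser that tries a structured format first and falls back to a keyword search. The sketch below is one plausible way to do this, not the project's actual parser: the JSON key `severity` and the method names `json`, `regex`, and `failed` are hypothetical, and the real values stored in `parse_method` may differ.

```python
import json
import re

SEVERITIES = ("blocker", "critical", "major", "minor", "trivial")
_SEVERITY_WORD = re.compile(r"\b(" + "|".join(SEVERITIES) + r")\b", re.IGNORECASE)


def extract_severity(raw):
    """Return (severity, method); (None, "failed") if nothing can be extracted.

    The method names are hypothetical placeholders.
    """
    # First attempt: the model answered with a JSON object.
    try:
        obj = json.loads(raw)
    except (TypeError, ValueError):
        obj = None
    if isinstance(obj, dict):
        value = str(obj.get("severity", "")).strip().lower()
        if value in SEVERITIES:
            return value, "json"
    # Fallback: first severity keyword found anywhere in the raw text.
    match = _SEVERITY_WORD.search(raw or "")
    if match:
        return match.group(1).lower(), "regex"
    return None, "failed"


print(extract_severity('{"severity": "Major"}'))             # ('major', 'json')
print(extract_severity("Severity: CRITICAL, app unusable"))  # ('critical', 'regex')
print(extract_severity("no label given"))                    # (None, 'failed')
```

Under this sketch, a sample whose output yields `(None, "failed")` would be dropped during post-generation validation.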
## Intended use
- Training and evaluating multimodal severity-triage models (see Config B/C/D in the paired repo).
- Ablating image vs. text contributions to bug classification — the fixed bug-type→severity map keeps labels clean.
- NOT suitable for: real-world bug report classification without domain adaptation (reports are LLM-generated and stylized), or safety-critical deployment.
## Limitations
- **Synthetic text** — bug reports are LLM-generated and may be stylistically uniform; real reports have noisier phrasing, missing fields, and irregular structure.
- **Deterministic labels** — severity is derived from bug type, not from human annotation. Real triage involves subjective judgment.
- **UI domain only** — Android UI screenshots; not representative of backend, API, or systems bugs.
- **English only.**
## License
CC-BY-4.0 for the synthetic reports and metadata. Rico screenshots are distributed by their original authors under the terms at http://www.interactionmining.org/rico.html — respect those terms for any downstream redistribution of derived images.
## Citation
```bibtex
@misc{visiontriage2026,
title = {VisionTriage: Multimodal Severity Prediction for UI Bug Reports},
author = {Debnath, Tathagata},
year = {2026},
url = {https://github.com/tathadn/visiontriage}
}
```
Provider: tathadn



