mistakeattribution/MATT-Bench
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/mistakeattribution/MATT-Bench
Resource description:
---
license: cc-by-4.0
task_categories:
- video-classification
- video-text-to-text
- object-detection
tags:
- egocentric-video
- mistake-detection
- temporal-localization
- video-language-grounding
- hand-object-interaction
- action-recognition
- procedural-activities
- semantic-role-labeling
- ego4d
- epic-kitchens
- holoassist
- point-of-no-return
- cvpr2026
pretty_name: MATT-Bench
size_categories:
- 100K<n<1M
configs:
- config_name: ego4d
  data_files:
  - split: train
    path: ego4d/parquet/train.parquet
  - split: valid
    path: ego4d/parquet/valid.parquet
  - split: test
    path: ego4d/parquet/test.parquet
- config_name: epickitchens
  data_files:
  - split: train
    path: epickitchens/parquet/train.parquet
  - split: validation
    path: epickitchens/parquet/validation.parquet
- config_name: holoassist
  data_files:
  - split: train
    path: holoassist/parquet/train.parquet
  - split: validation
    path: holoassist/parquet/validation.parquet
---
# Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
**CVPR 2026**
[Yayuan Li](https://www.linkedin.com/in/yayuan-li-148659272/)<sup>1</sup>, [Aadit Jain](https://www.linkedin.com/in/jain-aadit/)<sup>1</sup>, [Filippos Bellos](https://www.linkedin.com/in/filippos-bellos-168595156/)<sup>1</sup>, [Jason J. Corso](https://www.linkedin.com/in/jason-corso/)<sup>1,2</sup>
<sup>1</sup>University of Michigan, <sup>2</sup>Voxel51
[[Paper](https://arxiv.org/abs/2511.20525)] [[Code](https://github.com/yayuanli/MATT)] [[Project Page](https://yayuanli.github.io/MATT/)]
---
## MATT-Bench Overview
MATT-Bench provides large-scale benchmarks for **Mistake Attribution (MATT)** — a task that goes beyond binary mistake detection to attribute *what* semantic role was violated, *when* the mistake became irreversible (Point-of-No-Return), and *where* the mistake occurred in the frame.
The benchmarks are constructed by **MisEngine**, a data engine that automatically creates mistake samples with attribution-rich annotations from existing egocentric action datasets:
| Dataset | Samples | Instruction Texts | Semantic | Temporal | Spatial |
|---------------------|---------|-------------------|----------|----------|---------|
| **Ego4D-M** | 220,800 | 19,467 | ✓ | ✓ | ✓ |
| **EPIC-KITCHENS-M** | 299,715 | 12,283 | ✓ | — | — |
These are at least **two orders of magnitude larger** than any existing mistake dataset. Instruction-text counts are the number of unique (predicate `V`, argument `ARG1`) pairs.
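As a rough illustration of how the instruction-text count is defined, the sketch below counts unique (`V`, `ARG1`) pairs over a few rows. The rows are invented for the example, and the helper name `count_instruction_texts` is hypothetical rather than part of the MATT codebase.

```python
def count_instruction_texts(rows):
    """Count unique (V, ARG1) pairs in an iterable of annotation rows (dicts)."""
    return len({(row["V"], row["ARG1"]) for row in rows})


# Toy rows, invented for illustration only.
rows = [
    {"V": "pick up", "ARG1": "apple", "label": 0},
    {"V": "pick up", "ARG1": "apple", "label": 2},
    {"V": "cut", "ARG1": "apple", "label": 1},
]

# Three samples, but only two distinct instruction texts.
n_instruction_texts = count_instruction_texts(rows)
```

Several samples can share one instruction text, which is why the sample counts in the table are much larger than the instruction-text counts.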
A third source, **HoloAssist-M**, is released alongside as an additional benchmark — see [Extended: HoloAssist-M](#extended-holoassist-m) below.
**Repository Layout**
```
MATT-Bench/
├── ego4d/
│   ├── train.xlsx, valid.xlsx, test.xlsx   ← primary annotation files (consumed by the MATT codebase)
│   ├── parquet.xlsx                        ← MisEngine reproduction data (Ego4D narrations with SRL)
│   └── parquet/                            ← Parquet mirror for the HF dataset viewer
├── epickitchens/
│   ├── train.xlsx, validation.xlsx
│   └── parquet/
└── holoassist/
    ├── train.xlsx, validation.xlsx
    └── parquet/
```
`.xlsx` is the canonical download format (the MATT codebase reads Excel directly). The `parquet/` mirror powers the HF dataset viewer and `datasets.load_dataset(...)` loaders — both views contain the same rows.
## Downloading MATT-Bench
MATT-Bench has two parts that you obtain separately:
1. **Annotations**: semantic attribution annotations are hosted in this repo; download them with `hf` or `git clone`. Temporal and spatial attribution annotations are inherited from the original dataset.
2. **Video media**: **not** hosted here. Download from each source dataset using the instructions below. Original videos retain their upstream licenses.
### Annotations (this repo)
```bash
# Everything
hf download mistakeattribution/MATT-Bench --repo-type dataset --local-dir MATT-Bench
# Just one source dataset's xlsx files
hf download mistakeattribution/MATT-Bench --repo-type dataset \
--include "ego4d/*.xlsx" --local-dir MATT-Bench
```
Or via the `datasets` library (reads the parquet mirror):
```python
from datasets import load_dataset
ego4d_m = load_dataset("mistakeattribution/MATT-Bench", "ego4d")
epic_m = load_dataset("mistakeattribution/MATT-Bench", "epickitchens")
holo_m = load_dataset("mistakeattribution/MATT-Bench", "holoassist")
```
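Note that the split names differ between configs: per the `configs` section of this card, `ego4d` uses `valid` while `epickitchens` and `holoassist` use `validation`. The small lookup below is a sketch of one way to guard against mixing them up; `CONFIG_SPLITS` and `parquet_path` are hypothetical names introduced here for illustration.

```python
# Split names per config, copied from the `configs` section of the dataset card.
CONFIG_SPLITS = {
    "ego4d": ["train", "valid", "test"],
    "epickitchens": ["train", "validation"],
    "holoassist": ["train", "validation"],
}


def parquet_path(config, split):
    """Return the repo-relative parquet path for a config/split pair."""
    if split not in CONFIG_SPLITS[config]:
        raise ValueError(
            f"{config!r} has no split {split!r}; choose from {CONFIG_SPLITS[config]}"
        )
    return f"{config}/parquet/{split}.parquet"
```

With the objects loaded above, this means indexing `ego4d_m["valid"]` but `epic_m["validation"]`.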
### Video media
#### Ego4D
Follow <https://ego4d-data.org/docs/CLI/> to download. The `video_uid` and `clip1_uid` fields in our annotations correspond to Ego4D's native video and clip UIDs.
MATT-Bench uses the FHO (Forecasting Hands and Objects) benchmark clips from Ego4D. Example download command:
```bash
ego4d --output_directory="~/ego4d_data" --datasets clips --benchmarks FHO
```
#### EPIC-KITCHENS-100
Follow <https://epic-kitchens.github.io/> to download. MATT-Bench's `video_id` matches EPIC's participant-video identifier (e.g. `P22_16`); `start_frame` / `end_frame` index the RGB frame sequence.
Example download script:
```bash
git clone https://github.com/epic-kitchens/epic-kitchens-download-scripts
cd epic-kitchens-download-scripts
python epic_downloader.py --rgb-frames # or --videos
```
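To connect an annotation row to files on disk, something like the following sketch may help. It makes several assumptions that should be checked against your local extraction: that the participant ID is the part of `video_id` before the underscore (consistent with the `P22_16` example), that `end_frame` is inclusive, and that frame files follow the `frame_{:010d}.jpg` naming pattern. The helper names are hypothetical.

```python
def participant_id(video_id):
    """'P22_16' -> 'P22' (assumes the '<participant>_<index>' form)."""
    return video_id.split("_", 1)[0]


def frame_filenames(start_frame, end_frame, template="frame_{:010d}.jpg"):
    """List frame file names for a clip.

    Assumptions: end_frame is inclusive and files are named per `template`;
    pass a different template if your frames are named otherwise.
    """
    return [template.format(i) for i in range(start_frame, end_frame + 1)]
```

Keeping the filename template as a parameter avoids hard-coding a layout that may differ depending on how the frames were downloaded or extracted.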
#### HoloAssist
Although not reported in the paper, we also support the HoloAssist dataset.
Download the following from the [HoloAssist project page](https://holoassist.github.io/):
| Resource | Link | Size |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------|
| Videos (pitch-shifted) | [video_pitch_shifted.tar](https://hl2data.z5.web.core.windows.net/holoassist-data-release/video_pitch_shifted.tar) | 184.20 GB |
| Labels | [data-annotation-trainval-v1_1.json](https://hl2data.z5.web.core.windows.net/holoassist-data-release/data-annotation-trainval-v1_1.json) | 111 MB |
| Dataset splits | [data-splits-v1_2.zip](https://holoassist.github.io/label_files/data-splits-v1_2.zip) | — |
MATT-Bench's `video_id` matches HoloAssist's video identifier (e.g. `R076-21July-DSLR`).
## Data Schema
### `ego4d/{train,valid,test}.xlsx` — 13 columns
| Column | Description |
|-----------------------------------------------------|----------------------------------------------------------------------------------------|
| `video_uid` | Ego4D video UID (full video) |
| `start_frame`, `end_frame` | Frame bounds of the attempt clip |
| `clip1_uid`, `clip1_start_frame`, `clip1_end_frame` | Primary Ego4D clip |
| `clip2_uid`, `clip2_start_frame`, `clip2_end_frame` | Second Ego4D clip, for actions that span two clips (`Not required` / `-1` when absent) |
| `V`, `ARG1` | Predicate and argument from the instruction (e.g. `pick up`, `apple`) |
| `label` | Mistake label. 0: Correct; 1: Mistaken Predicate; 2: Mistaken Object; 3: Mistaken Both |
| `split` | Dataset split identifier |
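
A minimal sketch of how these columns might be interpreted when reading a row, assuming rows are available as plain dicts. The helper names are hypothetical. The mapping of "Mistaken Object" to `ARG1` is an interpretation of the table above, and treating both `Not required` and `-1` as "no second clip" in any `clip2_*` column is also an assumption.

```python
LABEL_NAMES = {
    0: "Correct",
    1: "Mistaken Predicate",
    2: "Mistaken Object",
    3: "Mistaken Both",
}


def violated_roles(label):
    """Map a label to the set of instruction roles that were violated."""
    return {0: set(), 1: {"V"}, 2: {"ARG1"}, 3: {"V", "ARG1"}}[int(label)]


def clip_segments(row):
    """Return [(clip_uid, start, end), ...], skipping an absent second clip."""
    segments = [
        (row["clip1_uid"], int(row["clip1_start_frame"]), int(row["clip1_end_frame"]))
    ]
    uid2 = row.get("clip2_uid")
    # Sentinels for "no second clip" (assumed to cover all absent cases).
    if uid2 not in (None, "", "Not required", -1, "-1"):
        segments.append(
            (uid2, int(row["clip2_start_frame"]), int(row["clip2_end_frame"]))
        )
    return segments


# Toy row with made-up identifiers, for illustration only.
row = {
    "clip1_uid": "clip-a", "clip1_start_frame": 10, "clip1_end_frame": 42,
    "clip2_uid": "Not required", "clip2_start_frame": -1, "clip2_end_frame": -1,
    "V": "pick up", "ARG1": "apple", "label": 2,
}
segments = clip_segments(row)
roles = violated_roles(row["label"])
```

Rows loaded through pandas may represent missing values differently (for example as NaN), so the sentinel check may need adjusting for that case.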
### `ego4d/parquet.xlsx` — 29 columns (MisEngine reproduction data)
Ego4D narration-level records with semantic-role labels (`ARG0`, `V`, `ARG1`), frame/time bounds (`start_frame`/`end_frame`/`start_sec`/`end_sec`), clip-relative bounds, and noun/verb embedding vectors. Used to reproduce the MisEngine step that produces the split files above.
### `epickitchens/{train,validation}.xlsx` and `holoassist/{train,validation}.xlsx` — 8 columns
| Column | Description |
|----------------------------|---------------------------------------------------------|
| `video_id` | Source-dataset video identifier |
| `start_frame`, `end_frame` | Frame bounds of the attempt clip |
| `V`, `ARG1` | Predicate and argument of the instruction text |
| `label` | Mistake label |
| `actual_V`, `actual_ARG1` | Predicate/argument of the action performed in the video |
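
Because these files carry both the instructed and the performed predicate/argument, a simple consistency check on `label` is possible. The sketch below assumes the label coding is the same as in the Ego4D table (0: Correct; 1: Mistaken Predicate; 2: Mistaken Object; 3: Mistaken Both) and that a mismatch can be detected by case-insensitive string comparison. Both are assumptions, and this is not a description of how MisEngine assigns labels. The function name `expected_label` is hypothetical.

```python
def expected_label(v, arg1, actual_v, actual_arg1):
    """Label implied by comparing the instruction with the performed action.

    Assumed coding: 0 correct, 1 predicate differs, 2 argument differs, 3 both.
    """
    def norm(text):
        return str(text).strip().lower()

    wrong_v = norm(v) != norm(actual_v)
    wrong_arg1 = norm(arg1) != norm(actual_arg1)
    return int(wrong_v) + 2 * int(wrong_arg1)
```

If many rows disagree with their stored `label`, the more likely explanation is that one of the assumptions above does not hold, not that the annotations are wrong.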
### Extended: HoloAssist-M
**HoloAssist-M** is an additional MATT benchmark released alongside MATT-Bench. It is **not** part of the main two-dataset evaluation reported in the CVPR 2026 paper; it uses the same MisEngine pipeline applied to the HoloAssist dataset.
| Dataset | Samples | Instruction Texts | Semantic | Temporal | Spatial |
|------------------|---------|-------------------|----------|----------|---------|
| **HoloAssist-M** | 562,209 | 1,786 | ✓ | — | — |
Schema matches EPIC-KITCHENS-M (semantic attribution only, since HoloAssist does not provide native PNR frame numbers or bounding-box annotations).
## Citation
```bibtex
@inproceedings{li2026mistakeattribution,
title = {Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos},
author = {Li, Yayuan and Jain, Aadit and Bellos, Filippos and Corso, Jason J.},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}
```
Please also cite the source datasets:
```bibtex
@inproceedings{grauman2022ego4d,
title = {Ego4D: Around the World in 3,000 Hours of Egocentric Video},
author = {Grauman, Kristen and others},
booktitle = {CVPR},
year = {2022}
}
@article{Damen2022RESCALING,
title = {Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100},
author = {Damen, Dima and others},
journal = {IJCV},
year = {2022}
}
@inproceedings{wang2023holoassist,
title = {HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World},
author = {Wang, Xin and others},
booktitle = {ICCV},
year = {2023}
}
```
Provider:
mistakeattribution



