Name: mistakeattribution/MATT-Bench
Creator: mistakeattribution
Published: 2026-04-17 07:57:35
License: no description provided

Download link:

https://hf-mirror.com/datasets/mistakeattribution/MATT-Bench


Description:

---
license: cc-by-4.0
task_categories:
- video-classification
- video-text-to-text
- object-detection
tags:
- egocentric-video
- mistake-detection
- temporal-localization
- video-language-grounding
- hand-object-interaction
- action-recognition
- procedural-activities
- semantic-role-labeling
- ego4d
- epic-kitchens
- holoassist
- point-of-no-return
- cvpr2026
pretty_name: MATT-Bench
size_categories:
- 100K<n<1M
configs:
- config_name: ego4d
  data_files:
  - split: train
    path: ego4d/parquet/train.parquet
  - split: valid
    path: ego4d/parquet/valid.parquet
  - split: test
    path: ego4d/parquet/test.parquet
- config_name: epickitchens
  data_files:
  - split: train
    path: epickitchens/parquet/train.parquet
  - split: validation
    path: epickitchens/parquet/validation.parquet
- config_name: holoassist
  data_files:
  - split: train
    path: holoassist/parquet/train.parquet
  - split: validation
    path: holoassist/parquet/validation.parquet
---

# Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos

**CVPR 2026**

[Yayuan Li](https://www.linkedin.com/in/yayuan-li-148659272/)<sup>1</sup>, [Aadit Jain](https://www.linkedin.com/in/jain-aadit/)<sup>1</sup>, [Filippos Bellos](https://www.linkedin.com/in/filippos-bellos-168595156/)<sup>1</sup>, [Jason J. Corso](https://www.linkedin.com/in/jason-corso/)<sup>1,2</sup>

<sup>1</sup>University of Michigan, <sup>2</sup>Voxel51

[[Paper](https://arxiv.org/abs/2511.20525)] [[Code](https://github.com/yayuanli/MATT)] [[Project Page](https://yayuanli.github.io/MATT/)]

---

## MATT-Bench Overview

MATT-Bench provides large-scale benchmarks for **Mistake Attribution (MATT)**, a task that goes beyond binary mistake detection to attribute *what* semantic role was violated, *when* the mistake became irreversible (Point-of-No-Return), and *where* the mistake occurred in the frame.
The benchmarks are constructed by **MisEngine**, a data engine that automatically creates mistake samples with attribution-rich annotations from existing egocentric action datasets:

| Dataset             | Samples | Instruction Texts | Semantic | Temporal | Spatial |
|---------------------|---------|-------------------|----------|----------|---------|
| **Ego4D-M**         | 220,800 | 19,467            | ✓        | ✓        | ✓       |
| **EPIC-KITCHENS-M** | 299,715 | 12,283            | ✓        | —        | —       |

These are at least **two orders of magnitude larger** than any existing mistake dataset. The instruction-text count is the number of unique (predicate `V`, argument `ARG1`) pairs.

A third source, **HoloAssist-M**, is released alongside as an additional benchmark; see [Extended: HoloAssist-M](#extended-holoassist-m) below.

**Repository Layout**

```
MATT-Bench/
├── ego4d/
│   ├── train.xlsx, valid.xlsx, test.xlsx   ← primary annotation files (consumed by the MATT codebase)
│   ├── parquet.xlsx                        ← MisEngine reproduction data (Ego4D narrations with SRL)
│   └── parquet/                            ← Parquet mirror for the HF dataset viewer
├── epickitchens/
│   ├── train.xlsx, validation.xlsx
│   └── parquet/
└── holoassist/
    ├── train.xlsx, validation.xlsx
    └── parquet/
```

`.xlsx` is the canonical download format (the MATT codebase reads Excel directly). The `parquet/` mirror powers the HF dataset viewer and `datasets.load_dataset(...)` loaders; both views contain the same rows.

## Downloading MATT-Bench

MATT-Bench has two parts that you obtain separately:

1. **Annotations**: semantic attribution annotations are hosted in this repository; download them via `hf` or `git clone`. Temporal and spatial attribution annotations are inherited from the original dataset.
2. **Video media**: **not** hosted here. Download the videos from each source dataset using the instructions below. Original videos retain their upstream licenses.
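
One detail that is easy to trip over when locating annotation files: per the `configs` block and the repository layout, Ego4D-M names its held-out split `valid`, while EPIC-KITCHENS-M and HoloAssist-M use `validation`. The sketch below is a small lookup that returns the repo-relative path for a given source and split. The helper name `annotation_path` and the `SPLITS` table are illustrative and not part of the MATT codebase.

```python
from pathlib import PurePosixPath

# Split names per config, copied from the `configs` block of the dataset card.
SPLITS = {
    "ego4d": ("train", "valid", "test"),
    "epickitchens": ("train", "validation"),
    "holoassist": ("train", "validation"),
}


def annotation_path(config: str, split: str, fmt: str = "xlsx") -> str:
    """Return the repo-relative path of one annotation file.

    `fmt` is "xlsx" for the canonical files or "parquet" for the mirror.
    (Hypothetical helper, written only to illustrate the layout.)
    """
    if config not in SPLITS:
        raise KeyError(f"unknown config {config!r}; expected one of {sorted(SPLITS)}")
    if split not in SPLITS[config]:
        raise ValueError(f"{config} has splits {SPLITS[config]}, not {split!r}")
    if fmt == "xlsx":
        return str(PurePosixPath(config) / f"{split}.xlsx")
    if fmt == "parquet":
        return str(PurePosixPath(config) / "parquet" / f"{split}.parquet")
    raise ValueError(f"fmt must be 'xlsx' or 'parquet', got {fmt!r}")


if __name__ == "__main__":
    print(annotation_path("ego4d", "valid"))
    print(annotation_path("epickitchens", "validation", fmt="parquet"))
```

Failing loudly on `("ego4d", "validation")` is deliberate: a silent fallback would hide exactly the naming mismatch this helper exists to catch.
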

### Annotations (this repo)

```bash
# Everything
hf download mistakeattribution/MATT-Bench --repo-type dataset --local-dir MATT-Bench

# Just one source dataset's xlsx files
hf download mistakeattribution/MATT-Bench --repo-type dataset \
    --include "ego4d/*.xlsx" --local-dir MATT-Bench
```

Or via the `datasets` library (reads the parquet mirror):

```python
from datasets import load_dataset

ego4d_m = load_dataset("mistakeattribution/MATT-Bench", "ego4d")
epic_m = load_dataset("mistakeattribution/MATT-Bench", "epickitchens")
holo_m = load_dataset("mistakeattribution/MATT-Bench", "holoassist")
```

### Video media

#### Ego4D

Follow <https://ego4d-data.org/docs/CLI/> to download. The `video_uid` and `clip1_uid` fields in our annotations correspond to Ego4D's native video and clip UIDs. MATT-Bench uses the FHO (Forecasting Hands and Objects) benchmark clips from Ego4D. Example download command:

```bash
ego4d --output_directory="~/ego4d_data" --datasets clips --benchmarks FHO
```

#### EPIC-KITCHENS-100

Follow <https://epic-kitchens.github.io/> to download. MATT-Bench's `video_id` matches EPIC's participant-video identifier (e.g. `P22_16`); `start_frame` / `end_frame` index the RGB frame sequence. Example download script:

```bash
git clone https://github.com/epic-kitchens/epic-kitchens-download-scripts
cd epic-kitchens-download-scripts
python epic_downloader.py --rgb-frames  # or --videos
```

#### HoloAssist

Although not reported in the paper, we also support the HoloAssist dataset.
Download the following from the [HoloAssist project page](https://holoassist.github.io/):

| Resource               | Link | Size      |
|------------------------|------|-----------|
| Videos (pitch-shifted) | [video_pitch_shifted.tar](https://hl2data.z5.web.core.windows.net/holoassist-data-release/video_pitch_shifted.tar) | 184.20 GB |
| Labels                 | [data-annotation-trainval-v1_1.json](https://hl2data.z5.web.core.windows.net/holoassist-data-release/data-annotation-trainval-v1_1.json) | 111 MB |
| Dataset splits         | [data-splits-v1_2.zip](https://holoassist.github.io/label_files/data-splits-v1_2.zip) | — |

MATT-Bench's `video_id` matches HoloAssist's video identifier (e.g. `R076-21July-DSLR`).

## Data Schema

### `ego4d/{train,valid,test}.xlsx` (13 columns)

| Column                                              | Description                                                                            |
|-----------------------------------------------------|----------------------------------------------------------------------------------------|
| `video_uid`                                         | Ego4D video UID (full video)                                                           |
| `start_frame`, `end_frame`                          | Frame bounds of the attempt clip                                                       |
| `clip1_uid`, `clip1_start_frame`, `clip1_end_frame` | Primary Ego4D clip                                                                     |
| `clip2_uid`, `clip2_start_frame`, `clip2_end_frame` | Some actions are distributed across two clips (`Not required` / `-1` when absent)      |
| `V`, `ARG1`                                         | Predicate and argument from the instruction (e.g. `pick up`, `apple`)                  |
| `label`                                             | Mistake label. 0: Correct; 1: Mistaken Predicate; 2: Mistaken Object; 3: Mistaken Both |
| `split`                                             | Dataset split identifier                                                               |

### `ego4d/parquet.xlsx` (29 columns, MisEngine reproduction data)

Ego4D narration-level records with semantic-role labels (`ARG0`, `V`, `ARG1`), frame/time bounds (`start_frame`/`end_frame`/`start_sec`/`end_sec`), clip-relative bounds, and noun/verb embedding vectors. Used to reproduce the MisEngine step that produces the split files above.
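
As a reading aid for the Ego4D-M split files, here is a minimal sketch of decoding the `label` column and checking whether a row uses a second clip. It relies on an observation about the table above rather than on anything the authors state: the codes 0 to 3 happen to behave like a two-bit mask (bit 0 for the predicate, bit 1 for the object). The function names and the sample row are hypothetical; in practice rows would come from something like `pandas.read_excel("MATT-Bench/ego4d/train.xlsx")`.

```python
from typing import Any, Mapping

# Label names exactly as listed in the Ego4D-M schema table.
LABEL_NAMES = {
    0: "Correct",
    1: "Mistaken Predicate",
    2: "Mistaken Object",
    3: "Mistaken Both",
}

# Sentinels the schema gives for an absent second clip. Whether the column
# stores them as strings or integers is not stated, so both forms are covered.
ABSENT_CLIP = {"Not required", "-1"}


def decode_label(label: int) -> dict:
    """Split a 0-3 mistake label into per-role flags (our bitmask reading)."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label {label!r}; expected 0, 1, 2 or 3")
    return {
        "name": LABEL_NAMES[label],
        "predicate_mistaken": bool(label & 1),
        "object_mistaken": bool(label & 2),
    }


def has_second_clip(row: Mapping[str, Any]) -> bool:
    """True when `clip2_uid` holds a real clip UID rather than a sentinel."""
    uid = row.get("clip2_uid")
    return uid is not None and str(uid).strip() not in ABSENT_CLIP


if __name__ == "__main__":
    # Hypothetical row, for illustration only; not taken from the dataset.
    row = {"V": "pick up", "ARG1": "apple", "label": 2, "clip2_uid": "Not required"}
    print(decode_label(row["label"]), has_second_clip(row))
```
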

### `epickitchens/{train,validation}.xlsx` and `holoassist/{train,validation}.xlsx` (8 columns)

| Column                     | Description                                             |
|----------------------------|---------------------------------------------------------|
| `video_id`                 | Source-dataset video identifier                         |
| `start_frame`, `end_frame` | Frame bounds of the attempt clip                        |
| `V`, `ARG1`                | Predicate and argument of the instruction text          |
| `label`                    | Mistake label                                           |
| `actual_V`, `actual_ARG1`  | Predicate/argument of the action performed in the video |

### Extended: HoloAssist-M

**HoloAssist-M** is an additional MATT benchmark released alongside MATT-Bench. It is **not** part of the main two-dataset evaluation reported in the CVPR 2026 paper; it uses the same MisEngine pipeline applied to the HoloAssist dataset.

| Dataset          | Samples | Instruction Texts | Semantic | Temporal | Spatial |
|------------------|---------|-------------------|----------|----------|---------|
| **HoloAssist-M** | 562,209 | 1,786             | ✓        | —        | —       |

The schema matches EPIC-KITCHENS-M (semantic attribution only, because HoloAssist does not provide native PNR frame numbers or bounding-box annotations).
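
The 8-column schema shared by EPIC-KITCHENS-M and HoloAssist-M stores both the instructed action (`V`, `ARG1`) and the performed one (`actual_V`, `actual_ARG1`), so a row can be sanity-checked by comparing the two. The sketch below does this under two assumptions that the card does not confirm: that `label` uses the same 0 to 3 coding documented for Ego4D-M, and that a role counts as mistaken when the strings differ after simple whitespace and case normalization. MisEngine itself may decide mismatches differently. The function name `attribute_mismatch` is hypothetical.

```python
def _norm(text: object) -> str:
    """Lower-case and collapse whitespace so 'Pick  Up' equals 'pick up'."""
    return " ".join(str(text).lower().split())


def attribute_mismatch(v: str, arg1: str, actual_v: str, actual_arg1: str) -> int:
    """Return a 0-3 code for which semantic roles differ.

    ASSUMPTION: mirrors the Ego4D-M coding (1 = predicate, 2 = object,
    3 = both); the card does not state the coding for these two sources.
    """
    predicate_wrong = _norm(v) != _norm(actual_v)
    object_wrong = _norm(arg1) != _norm(actual_arg1)
    return int(predicate_wrong) + 2 * int(object_wrong)


if __name__ == "__main__":
    # Hypothetical values for illustration; not rows from the dataset.
    print(attribute_mismatch("pick up", "apple", "pick up", "knife"))
```

Comparing the result against the stored `label` for a sample of rows is a quick way to find out whether these assumptions hold for a given file before relying on them.
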

## Citation

```bibtex
@inproceedings{li2026mistakeattribution,
  title     = {Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos},
  author    = {Li, Yayuan and Jain, Aadit and Bellos, Filippos and Corso, Jason J.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}
```

Please also cite the source datasets:

```bibtex
@inproceedings{grauman2022ego4d,
  title     = {Ego4D: Around the World in 3,000 Hours of Egocentric Video},
  author    = {Grauman, Kristen and others},
  booktitle = {CVPR},
  year      = {2022}
}

@article{Damen2022RESCALING,
  title   = {Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100},
  author  = {Damen, Dima and others},
  journal = {IJCV},
  year    = {2022}
}

@inproceedings{wang2023holoassist,
  title     = {HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World},
  author    = {Wang, Xin and others},
  booktitle = {ICCV},
  year      = {2023}
}
```

Use cases: