yayuanli/MATT-Bench

Name: yayuanli/MATT-Bench
Creator: yayuanli
Published: 2026-03-25 08:41:33
License: cc-by-4.0 (as declared in the dataset card metadata)

Hugging Face | Updated 2026-03-25 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/yayuanli/MATT-Bench

Resource description:

---
license: cc-by-4.0
task_categories:
- video-classification
- video-text-to-text
- object-detection
tags:
- egocentric-video
- mistake-detection
- temporal-localization
- video-language-grounding
- hand-object-interaction
- action-recognition
- procedural-activities
- semantic-role-labeling
- ego4d
- epic-kitchens
- point-of-no-return
- cvpr2026
pretty_name: MATT-Bench
size_categories:
- 100K<n<1M
---

# Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos

**CVPR 2026**

[Yayuan Li](https://www.linkedin.com/in/yayuan-li-148659272/)<sup>1</sup>, [Aadit Jain](https://www.linkedin.com/in/jain-aadit/)<sup>1</sup>, [Filippos Bellos](https://www.linkedin.com/in/filippos-bellos-168595156/)<sup>1</sup>, [Jason J. Corso](https://www.linkedin.com/in/jason-corso/)<sup>1,2</sup>

<sup>1</sup>University of Michigan, <sup>2</sup>Voxel51

[[Paper](https://arxiv.org/abs/2511.20525)] [[Code](https://github.com/yayuanli/MATT)] [[Project Page](https://yayuanli.github.io/MATT/)]

---

> **Dataset coming soon.** We are preparing the data for public release. Stay tuned!

## MATT-Bench Overview

MATT-Bench provides two large-scale benchmarks for **Mistake Attribution (MATT)**, a task that goes beyond binary mistake detection to attribute *what* semantic role was violated, *when* the mistake became irreversible (Point-of-No-Return), and *where* the mistake occurred in the frame.

The benchmarks are constructed by **MisEngine**, a data engine that automatically creates mistake samples with attribution-rich annotations from existing egocentric action datasets:

| Dataset | Samples | Instruction Texts | Semantic | Temporal | Spatial |
|---|---|---|---|---|---|
| **Ego4D-M** | 257,584 | 16,099 | ✓ | ✓ | ✓ |
| **EPIC-KITCHENS-M** | 221,094 | 12,283 | ✓ | - | - |

These are at least **two orders of magnitude larger** than any existing mistake dataset.

## Annotations

Each sample consists of an instruction text and an attempt video, annotated with:

- **Semantic Attribution**: Which semantic role (predicate, object) in the instruction is violated in the attempt video
- **Temporal Attribution**: The Point-of-No-Return (PNR) frame where the mistake becomes irreversible (Ego4D-M)
- **Spatial Attribution**: Bounding box localizing the mistake region in the PNR frame (Ego4D-M)

## Citation

```bibtex
@inproceedings{li2026mistakeattribution,
  title     = {Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos},
  author    = {Li, Yayuan and Jain, Aadit and Bellos, Filippos and Corso, Jason J.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}
```
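Since the data has not been released yet, the on-disk schema is unknown. As a rough sketch only, the three annotation types listed under Annotations could be held in a record like the one below. Every name here (`MattSample`, `video_path`, `violated_role`, `pnr_frame`, `bbox`, `attribution_types`) is hypothetical, as is the `(x1, y1, x2, y2)` box layout; only the two semantic roles and the Ego4D-M-only scope of the temporal and spatial fields come from the card.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Roles named in the dataset card; everything else here is a hypothetical schema.
SEMANTIC_ROLES = ("predicate", "object")


@dataclass
class MattSample:
    instruction: str                     # instruction text
    video_path: str                      # attempt video location (hypothetical field)
    violated_role: Optional[str] = None  # semantic attribution
    pnr_frame: Optional[int] = None      # temporal attribution, Ego4D-M only
    # Spatial attribution, Ego4D-M only; (x1, y1, x2, y2) pixel order is assumed.
    bbox: Optional[Tuple[float, float, float, float]] = None

    def __post_init__(self) -> None:
        if self.violated_role is not None and self.violated_role not in SEMANTIC_ROLES:
            raise ValueError(f"unknown semantic role: {self.violated_role!r}")
        # The box is defined in the PNR frame, so it cannot exist without one.
        if self.bbox is not None and self.pnr_frame is None:
            raise ValueError("bbox requires pnr_frame")

    def attribution_types(self) -> List[str]:
        """List which of the three attribution types this sample carries."""
        present = []
        if self.violated_role is not None:
            present.append("semantic")
        if self.pnr_frame is not None:
            present.append("temporal")
        if self.bbox is not None:
            present.append("spatial")
        return present


# Made-up values for illustration only; they are not taken from the dataset.
ego = MattSample("cut the onion", "clip_0001.mp4", "object", 42, (10.0, 20.0, 110.0, 220.0))
epic = MattSample("open the drawer", "clip_0002.mp4", "predicate")
print(ego.attribution_types())
print(epic.attribution_types())
```

Keeping the temporal and spatial fields optional mirrors the overview table, where EPIC-KITCHENS-M carries semantic attribution only.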

Provided by:

yayuanli
