
yayuanli/MATT-Bench

Source: Hugging Face | Updated: 2026-03-25 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/yayuanli/MATT-Bench
Resource description:
---
license: cc-by-4.0
task_categories:
- video-classification
- video-text-to-text
- object-detection
tags:
- egocentric-video
- mistake-detection
- temporal-localization
- video-language-grounding
- hand-object-interaction
- action-recognition
- procedural-activities
- semantic-role-labeling
- ego4d
- epic-kitchens
- point-of-no-return
- cvpr2026
pretty_name: MATT-Bench
size_categories:
- 100K<n<1M
---

# Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos

**CVPR 2026**

[Yayuan Li](https://www.linkedin.com/in/yayuan-li-148659272/)<sup>1</sup>, [Aadit Jain](https://www.linkedin.com/in/jain-aadit/)<sup>1</sup>, [Filippos Bellos](https://www.linkedin.com/in/filippos-bellos-168595156/)<sup>1</sup>, [Jason J. Corso](https://www.linkedin.com/in/jason-corso/)<sup>1,2</sup>

<sup>1</sup>University of Michigan, <sup>2</sup>Voxel51

[[Paper](https://arxiv.org/abs/2511.20525)] [[Code](https://github.com/yayuanli/MATT)] [[Project Page](https://yayuanli.github.io/MATT/)]

---

> **Dataset coming soon.** We are preparing the data for public release. Stay tuned!

## MATT-Bench Overview

MATT-Bench provides two large-scale benchmarks for **Mistake Attribution (MATT)**, a task that goes beyond binary mistake detection to attribute *what* semantic role was violated, *when* the mistake became irreversible (Point-of-No-Return), and *where* the mistake occurred in the frame.

The benchmarks are constructed by **MisEngine**, a data engine that automatically creates mistake samples with attribution-rich annotations from existing egocentric action datasets:

| Dataset | Samples | Instruction Texts | Semantic | Temporal | Spatial |
|---|---|---|---|---|---|
| **Ego4D-M** | 257,584 | 16,099 | ✓ | ✓ | ✓ |
| **EPIC-KITCHENS-M** | 221,094 | 12,283 | ✓ | ✗ | ✗ |

These are at least **two orders of magnitude larger** than any existing mistake dataset.

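
Because the data has not been released yet, the actual file layout and field names are unknown. As a rough illustration only, the three attribution questions (what, when, where) could be pictured as one record per sample. Every name below (`MattSample`, `violated_role`, `pnr_frame`, `bbox_xyxy`, the example file paths) is a hypothetical placeholder, not the dataset's schema. The optional fields reflect the table above, where only Ego4D-M carries temporal and spatial annotations.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MattSample:
    """Hypothetical in-memory record for one MATT-Bench sample (not the official schema)."""

    instruction: str    # instruction text the attempt is compared against
    video_path: str     # path to the attempt video (placeholder)
    violated_role: str  # semantic attribution, e.g. "predicate" or "object"
    # Temporal attribution: Point-of-No-Return frame index (assumed integer index).
    pnr_frame: Optional[int] = None
    # Spatial attribution: box in the PNR frame, assumed (x1, y1, x2, y2) in pixels.
    bbox_xyxy: Optional[Tuple[float, float, float, float]] = None

    def available_attributions(self) -> List[str]:
        """List which attribution types this sample carries."""
        kinds = ["semantic"]
        if self.pnr_frame is not None:
            kinds.append("temporal")
        if self.bbox_xyxy is not None:
            kinds.append("spatial")
        return kinds


# An Ego4D-M-style sample with all three annotation types (values invented).
ego = MattSample(
    instruction="pick up the cup",
    video_path="ego4d_m/example.mp4",
    violated_role="object",
    pnr_frame=42,
    bbox_xyxy=(10.0, 20.0, 110.0, 220.0),
)

# An EPIC-KITCHENS-M-style sample with semantic attribution only (values invented).
epic = MattSample(
    instruction="open the drawer",
    video_path="epic_kitchens_m/example.mp4",
    violated_role="predicate",
)

print(ego.available_attributions())
print(epic.available_attributions())
```

Keeping the temporal and spatial fields optional lets one record type cover both benchmarks; once the data is published, the real field names and coordinate conventions should replace these placeholders.
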
## Annotations

Each sample consists of an instruction text and an attempt video, annotated with:

- **Semantic Attribution**: Which semantic role (predicate, object) in the instruction is violated in the attempt video
- **Temporal Attribution**: The Point-of-No-Return (PNR) frame where the mistake becomes irreversible (Ego4D-M)
- **Spatial Attribution**: Bounding box localizing the mistake region in the PNR frame (Ego4D-M)

## Citation

```bibtex
@inproceedings{li2026mistakeattribution,
  title     = {Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos},
  author    = {Li, Yayuan and Jain, Aadit and Bellos, Filippos and Corso, Jason J.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}
```

Provided by:
yayuanli