ReXTime/ReXTime
Source: Hugging Face | Updated: 2024-07-16 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/ReXTime/ReXTime
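A minimal sketch of fetching the dataset through the mirror above with the Hugging Face `datasets` library. The split names (`validation`, `test`, `train`) and the optional `config` argument are assumptions; this page does not list the repository's actual configuration or split names, so check the dataset card before relying on them. The expected sizes are the sample counts quoted on this page.

```python
import os

MIRROR_ENDPOINT = "https://hf-mirror.com"   # mirror host taken from the link above
REPO_ID = "ReXTime/ReXTime"

# Sample counts quoted on this page; the split names are assumptions.
EXPECTED_SIZES = {"validation": 921, "test": 2143, "train": 9695}


def dataset_url(endpoint: str = MIRROR_ENDPOINT, repo_id: str = REPO_ID) -> str:
    """Build the dataset page URL for a given Hub endpoint."""
    return f"{endpoint}/datasets/{repo_id}"


def size_matches(split: str, n_rows: int) -> bool:
    """Check a loaded split against the count quoted on this page."""
    return EXPECTED_SIZES.get(split) == n_rows


def load_rextime(split: str, config=None):
    """Load one split via the mirror (needs network and `pip install datasets`).

    HF_ENDPOINT only takes effect if it is set before huggingface_hub is
    first imported in the process, so call this early.
    """
    os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)
    from datasets import load_dataset  # imported lazily: third-party dependency

    if config is None:
        return load_dataset(REPO_ID, split=split)
    # `config` is a hypothetical configuration name; see the dataset card.
    return load_dataset(REPO_ID, config, split=split)
```

After loading, `size_matches("validation", len(ds))` gives a quick sanity check that the downloaded split agrees with the count reported here.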
Resource overview:
ReXTime is a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on "reasoning across time", i.e. human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning requires an advanced understanding of cause-and-effect relationships across video segments and poses significant challenges even to frontier multimodal large language models. To facilitate this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, significantly reducing the need for labor-intensive manual annotation. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models outperform academic models, they still lag behind human performance by a significant 14.3% accuracy gap. Additionally, our pipeline creates a training dataset of 9,695 machine-generated samples without manual effort, which empirical studies suggest can enhance across-time reasoning via fine-tuning.
Provider:
ReXTime
Summary of original information
ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos
Dataset overview
- Name: ReXTime
- Task categories:
  - Visual question answering
  - Multiple choice
- Language: English
- Tags: croissant
- Display name (pretty_name): ReXTime
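Dataset details
- Purpose: test AI models' temporal reasoning over video events, in particular their understanding of cause-and-effect relationships across different video segments.
- Sample counts:
  - Validation set: 921 samples
  - Test set: 2,143 samples
- Question types:
  - Sequential questions
  - Causal questions
  - Purpose questions
- Data generation:
  - An automated pipeline generates the temporal reasoning question-answer pairs, reducing the need for manual annotation.
  - The training set contains 9,695 machine-generated samples produced without manual effort.
- Evaluation results:
  - Frontier large language models still trail human performance by a 14.3% accuracy gap.
  - Empirical studies suggest that fine-tuning on the generated training set can improve across-time reasoning.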
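Because the benchmark is multiple choice, the accuracy figures above reduce to counting exact matches between predicted and reference options. The sketch below computes overall and per-question-type accuracy. The record keys (`type`, `answer`, `prediction`) and the type labels are hypothetical and not the dataset's actual schema, and the toy records are made-up inputs for illustration, not benchmark results.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Return (per_type_accuracy, overall_accuracy) for multiple-choice records.

    Each record is a dict with hypothetical keys:
      'type'       - question type label
      'answer'     - reference option, e.g. "A"
      'prediction' - option chosen by the model
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["type"]] += 1
        correct[rec["type"]] += int(rec["prediction"] == rec["answer"])
    if not total:
        raise ValueError("no records to score")
    per_type = {t: correct[t] / total[t] for t in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_type, overall


# Toy inputs for illustration only.
toy_records = [
    {"type": "sequential", "answer": "A", "prediction": "A"},
    {"type": "causal", "answer": "B", "prediction": "C"},
    {"type": "causal", "answer": "D", "prediction": "D"},
    {"type": "purpose", "answer": "C", "prediction": "C"},
]
per_type, overall = accuracy_by_type(toy_records)
```

An accuracy gap such as the one reported against human performance is then the difference between two overall accuracies computed this way on the same set of questions.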
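Dataset maintenance
- Maintenance commitment: the dataset will be maintained long term to ensure its quality.
- Error reports: if you find an error in the dataset, submit the question ID on the issues page and the team will correct it.
Limitations and disclaimer
- Copyright and licensing: the team stresses compliance with the copyright and licensing rules of the original data sources and avoids material that prohibits copying and redistribution.
- Violation notices: if you find any data sample that may violate copyright or licensing terms, notify the team and it will be removed promptly.
Contact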
- Jr-Jen Chen: r12942106@ntu.edu.tw
- Yu-Chiang Frank Wang: ycwang@ntu.edu.tw
Citation
@article{chen2024rextime, title={ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos}, author={Chen, Jr-Jen and Liao, Yu-Chien and Lin, Hsi-Che and Yu, Yu-Chu and Chen, Yen-Chun and Wang, Yu-Chiang Frank}, journal={arXiv preprint arXiv:2406.19392}, year={2024} }
The content above was collected and summarized by 遇见数据集.



