mmtb-anonymous/mmtb-media
Source: Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/mmtb-anonymous/mmtb-media
Description:
The Multimedia Terminal-Bench (MMTB) dataset is designed to evaluate how well terminal-based AI agents understand and act on multimedia content through persistent file workflows. It covers video, audio, image, and document media, organized into 105 tasks. Each task is self-contained and consists of its media files plus a `media.toml` manifest. The dataset is primarily intended for evaluating multimodal-capable LLM agents, and it supports modality ablation studies via four different harnesses. Most media files are synthetically generated to allow precise control over the ground truth. The dataset also ships detailed Responsible AI documentation covering data limitations and biases, personal and sensitive information, use cases and validity evidence, social impact, and a synthetic content manifest.
Provider:
mmtb-anonymous



