MUGEN

Name: MUGEN
Creator: OpenDataLab
Published: 2026-05-17 09:30:34
License: No description available

OpenDataLab, updated 2026-05-17, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/MUGEN

Resource description:

We present MUGEN, a large-scale video-audio-text dataset collected using the open-source platform game CoinRun. We modified the game substantially, adding audio and enabling new interactions to make it richer. We trained RL agents with different objectives to navigate the game and interact with 13 objects and characters, which lets us automatically extract a large collection of diverse videos and their associated audio. We sampled 375K video clips (3.2 seconds each) and collected text descriptions from human annotators. Each video also carries annotations extracted automatically from the game engine, such as accurate per-frame semantic maps and templated text descriptions. MUGEN can support research on many tasks in multimodal understanding and generation. We benchmarked representative methods on tasks involving video-audio-text retrieval and generation. Both MUGEN and the enhanced game engine will be released as a playground for multimodal research.
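To give a sense of scale, the clip count and clip length quoted above can be turned into a rough total duration. The sketch below assumes the rounded figures from the description (375K clips, 3.2 seconds each); the exact clip count is not given here, so the result is only an estimate, and the function name is illustrative.

```python
# Rough total duration of the sampled clips, using the rounded figures
# from the description. CLIP_COUNT is an approximation ("375K").
CLIP_COUNT = 375_000   # assumed from "375K video clips"
CLIP_SECONDS = 3.2     # stated length of each clip


def total_hours(clip_count: int, clip_seconds: float) -> float:
    """Return the combined duration of all clips, in hours."""
    return clip_count * clip_seconds / 3600


hours = total_hours(CLIP_COUNT, CLIP_SECONDS)
print(f"Approximate total duration: {hours:.1f} hours")
```

Under these assumptions the sampled clips add up to a little over 333 hours of video.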

Provided by:

OpenDataLab

Created:

2022-11-02

Collection summary

Dataset introduction