Multimodal Lecture Presentations Dataset
Source: arXiv | Updated: 2022-08-17 | Indexed: 2024-06-21
Download link:
https://github.com/dondongwon/MLPDataset
Description:
The Multimodal Lecture Presentations Dataset is a large-scale benchmark created at Carnegie Mellon University to test how well machine learning models understand multimodal educational content. It contains more than 9,000 slides and more than 180 hours of video, covering subjects such as computer science, dentistry, and biology. The slides include natural images, diagrams, equations, and text, all aligned with the speakers' spoken language. The dataset was created to advance the development of intelligent teaching assistants and to support education-related tasks such as automatic captioning and visual figure synthesis.
Provider:
Carnegie Mellon University
Created:
2022-08-17



