BOVText
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/BOVText
Resource overview:
We introduce a novel large-scale benchmark dataset named Bilingual Open-World Video-Text (BOVText), which is the first large-scale and multilingual benchmark for video-text localization across diverse real-world scenarios. All data in this dataset is sourced from Kuaishou and YouTube.
BOVText has three core characteristics:
1. Large-scale: We provide over 2,000 videos comprising more than 1,750,000 frames, which is four times the size of the largest existing video-text dataset.
2. Open-world scenarios: BOVText covers more than 30 open categories with rich and varied scenarios, including daily vlogs, sports news, autonomous driving, and cartoons. In addition, caption text and scene text are annotated as two distinct types: caption text mainly conveys the topic of the video, while scene text conveys information about the scene itself.
3. Bilingual: BOVText provides bilingual text annotations to support communication across cultures.
Provider:
OpenDataLab
Created:
2022-08-16
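Collected summary
Dataset introduction

Background and challenges
Background overview
BOVText is a large-scale, multilingual benchmark dataset for video text localization. It contains more than 2,000 videos and over 1,750,000 frames spanning more than 30 open categories. The dataset provides bilingual text annotations to support cross-cultural communication and is suited to video text localization research across a wide range of scenarios.
The above content was collected and summarized by 遇见数据集.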



