Multi-language Video Subtitle Dataset

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/gj8d88h2g3
Description:
The video subtitle images were collected from 24 videos shared on Facebook and YouTube. The subtitles are in Thai and English and comprise Thai characters, Roman characters, Thai numerals, Arabic numerals, and special characters, for 157 distinct characters in total. In the data-preprocessing step, we converted all 24 videos to images and obtained 2,700 images containing subtitle text. Each subtitle text image is 1280x720 pixels and is stored in JPG format. We then generated the ground truth from 4,224 subtitle images using the labelImg program and assigned a label to each subtitle image. Note that the number preceding each label is the order of the subtitle text image.
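
The description says the ground truth was produced with labelImg but does not state the export format. As a minimal sketch, assuming the annotations were saved as Pascal VOC XML (one of the formats labelImg can write), the bounding boxes and labels could be read as shown below. The function name `parse_voc_annotation`, the sample file name, the label text, and the coordinates are all hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from Pascal VOC XML."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename", default="")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name", default="")
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects without a bounding box
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, boxes


# Hypothetical annotation for one 1280x720 frame; all values are invented
# for illustration and assume the Pascal VOC layout, not the dataset's own files.
SAMPLE_XML = """<annotation>
  <filename>0001.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>subtitle</name>
    <bndbox><xmin>200</xmin><ymin>600</ymin><xmax>1080</xmax><ymax>680</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(parse_voc_annotation(SAMPLE_XML))
```

If the dataset instead ships YOLO-style text files or another format, the parsing step would differ; check the downloaded files before relying on this layout.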

Created:
2021-11-29