Word-level Persian Lipreading Dataset

Source: arXiv. Updated 2023-04-09. Indexed 2024-08-06.
Download link:
http://arxiv.org/abs/2304.04068v1
Resource description:
The Word-level Persian Lipreading Dataset was created by the School of Computer Engineering at Iran University of Science and Technology. It contains 244,000 videos from approximately 1,800 speakers, collected from Aparat, a Persian-language video streaming site, for a total of 205 hours of video. The clips cover a wide range of lighting conditions and head poses, and the dataset is intended primarily for word-level lipreading research in Persian. It was built with an automated pipeline that includes video selection, face track extraction, and active speaker detection, among other steps. The dataset aims to address the challenges of Persian lipreading, in particular improving recognition accuracy in environments where the audio signal is degraded.
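
As a rough sanity check on the figures reported in the description, the average clip length and the average number of clips per speaker can be derived with simple arithmetic. The sketch below assumes that the 205 hours cover exactly the 244,000 clips and treats the approximate speaker count as exactly 1,800. The constant and function names are illustrative and are not part of the dataset or its tooling.

```python
# Figures as reported in the dataset description (rounded by the source).
TOTAL_VIDEOS = 244_000
TOTAL_HOURS = 205
APPROX_SPEAKERS = 1_800  # the description says "approximately"


def average_clip_seconds(total_hours: float, total_videos: int) -> float:
    """Mean clip duration in seconds, assuming the hours cover exactly these clips."""
    return total_hours * 3600 / total_videos


def average_clips_per_speaker(total_videos: int, speakers: int) -> float:
    """Mean number of clips per speaker, assuming the speaker count is exact."""
    return total_videos / speakers


clip_seconds = average_clip_seconds(TOTAL_HOURS, TOTAL_VIDEOS)
clips_per_speaker = average_clips_per_speaker(TOTAL_VIDEOS, APPROX_SPEAKERS)
print(f"{clip_seconds:.2f} s per clip, {clips_per_speaker:.0f} clips per speaker")
```

Under these assumptions the averages come out to roughly 3 seconds per clip and about 136 clips per speaker, which is in line with a word-level dataset where each clip holds a single spoken word.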
Provider:
School of Computer Engineering, Iran University of Science and Technology
Created:
2023-04-09