OLKAVS
Source: arXiv (indexed 2025-09-30)
Download link:
https://aihub.or.kr/aihubdata/data/view.do?currmenu=115&topmenu=100&aihubdatase=realm&datasetsn=538
Description:
OLKAVS is presented as the largest audio-visual speech dataset to date. It contains 1,150 hours of transcribed audio from 1,107 Korean speakers, recorded in a studio from nine camera viewpoints and under multiple noise conditions. The dataset ships with predefined training, validation, and evaluation splits, and provides lip and face bounding-box coordinates along with transcripts in JSON format. In total it comprises 1,150 hours of audio, 5,750 hours of video, and more than 28,000 sentences. It is intended to support research on audio-visual speech recognition and lip reading.
Provided by:
Authors of the paper



