Open Large-scale Korean Audio-Visual Speech (OLKAVS) dataset

Name: Open Large-scale Korean Audio-Visual Speech (OLKAVS) dataset
Creator: Sogang University
Published: 2023-01-16 19:40:50
License: No description available

Source: arXiv (updated 2023-01-16; indexed 2024-06-21)

Download link:

https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=538

Overview:

OLKAVS is a large-scale, open Korean audio-visual speech dataset developed by Sogang University, and one of the largest publicly available audio-visual speech datasets. It contains 1,150 hours of transcribed audio from 1,107 Korean speakers, recorded on video in a studio setting from nine different viewpoints and under various noise conditions. Building the dataset involved careful script design, speaker selection, setup of the recording environment, and data post-processing. OLKAVS is intended mainly for multimodal research on Korean speech recognition, speaker recognition, pronunciation-level classification, and mouth motion analysis, and it aims to address the limited language coverage and viewpoint diversity of existing datasets.

Provider:

Sogang University

Created:

2023-01-16
