LUMINA (Linguistic Unified Multimodal Indonesian Natural Audio-Visual)

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/8fw93k4rny

Description:

LUMINA (Linguistic Unified Multimodal Indonesian Natural Audio-Visual) is a curated, constrained dataset designed to support research on speech perception. Recorded exclusively in Indonesian, it contains high-quality audio-visual recordings of 14 native speakers (9 male, 5 female). Each speaker contributes approximately 1,000 sentences, for roughly 14,000 recorded sentences in total. The videos are facial recordings that capture the visual cues and expressions accompanying speech. The dataset is intended as a resource for studying how humans perceive and process spoken language, and for work on speech recognition and synthesis. It belongs to the class known in the relevant literature as a 'Constrained Audio-Visual Dataset', which is widely used in lip reading and speech synthesis.

The dataset is stored in two separate folders by speaker group, male and female. Each folder contains audio files (.wav), resampled to a consistent sampling rate of 16,000 Hz and trimmed, and video files (.mp4), compressed with a constant rate factor (CRF) of 28 and cropped to 250 pixels wide by 150 pixels high, with the crop centered on the mouth. Every audio and video file is named in the format P<speaker number>_S<sentence number>. An Excel (.xlsx) file is also included, listing the word combinations, out of 2,500, used during compilation of the LUMINA dataset.
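
The naming format described above lends itself to pairing each audio file with its video counterpart. The following Python sketch shows one way this might be done. It is not part of the dataset. The regular expression assumes plain decimal numbers with optional zero padding (the description does not state the exact digit layout), and the helper names `parse_name` and `pair_recordings` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern for P<speaker number>_S<sentence number>, e.g. "P3_S120".
# Zero padding and digit counts are not specified in the dataset description.
NAME_RE = re.compile(r"^P(\d+)_S(\d+)$")


def parse_name(filename):
    """Return (speaker, sentence) as ints, or None if the name does not match."""
    match = NAME_RE.match(Path(filename).stem)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pair_recordings(filenames):
    """Group .wav and .mp4 file names by their (speaker, sentence) key."""
    pairs = defaultdict(dict)
    for name in filenames:
        key = parse_name(name)
        suffix = Path(name).suffix.lower()
        # Skip anything that is not a recording, such as the .xlsx word list.
        if key is not None and suffix in {".wav", ".mp4"}:
            pairs[key][suffix] = name
    return dict(pairs)
```

Given a local copy, something like `pair_recordings(p.name for p in Path("LUMINA/male").iterdir())` would return a dictionary keyed by speaker and sentence number; the folder path here is an assumption about how the archive unpacks. Entries holding only one of the two suffixes point to recordings whose audio or video file is missing.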

Created:

2024-02-05
