Deeply Korean Read Speech Corpus - Audio AI & ML Training Data

Name: Deeply Korean Read Speech Corpus - Audio AI & ML Training Data
Creator: Deeply
License: No description available

Source: Datarade (indexed 2024-04-19)

Download link:

https://datarade.ai/data-products/deeply-korean-read-speech-corpus-deeply

Description:

□ Recording contents: A pair of adults reading scripts containing 3 distinct text sentiments (negative, neutral, positive) with 3 distinct voice sentiments (negative, neutral, positive). Scripts: movie reviews (positive, negative) and everyday conversation (neutral).
□ Recording environments: Anechoic chamber (no reverb), studio apartment (moderate reverb), dance studio (high reverb)
□ Device: iPhone X (iOS), Samsung Galaxy S7 (Android)
□ Distance from the source: 0.4m, 2.0m, 4.0m
□ Volume: ~ 290 hours, ~ 190,000 utterances, ~ 107 GB
□ Format: wav (44100Hz, 16-bit, mono), or h5 (16000Hz, 16-bit, mono)
□ Language: Korean
□ Demographics: 34 Korean adults, 26% male and 74% female; 47% are in their 20s, 20.5% in their 30s, 17.5% in their 40s, 6% in their 50s, and 9% in their 60s.

The Read Speech dataset consists of 289.9 hours of audio clips of speakers reading scripts with 3 text sentiments in 3 voice sentiments, recorded at 3 distinct places using 2 different smartphones running different operating systems. The participants were encouraged to record repeatedly in all 3 types of place (anechoic chamber, studio apartment, dance studio), and every recording was conducted systematically at 3 distances (0.4m, 2.0m, 4.0m) with 2 types of device (iPhone X and Galaxy S7).

The text sentiments and voice sentiments are categorized as follows:

‘Negative text sentiment’, ‘neutral text sentiment’, and ‘positive text sentiment’ indicate that the content being vocalized is negative, neutral, or positive, respectively. Specifically, for the negative and positive text sentiments, negative and positive movie reviews containing disparagement, criticism, or compliments were used. For the neutral text sentiment, everyday conversations without typical emotions were used.

‘Negative voice sentiment’ indicates that the speaker vocalized the script with a negative tone of voice. For the sake of consistency, we instructed the speakers to vocalize as if they were angry. ‘Neutral voice sentiment’ indicates that the speaker vocalized the script with a neutral tone of voice, without any emotions involved. Finally, ‘positive voice sentiment’ indicates that the speaker vocalized the script with a positive tone of voice, specifically as if they were happy.

Each type of voice sentiment was vocalized regardless of the content of the script (text sentiment). For example, the speakers were also asked to vocalize the script positively even when the content was negative.

The dataset also includes metadata such as the script (speech-to-text aligned), speaker, age, sex, noise, type of place, distance, and device. The impulse responses of each type of place are available upon request.
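The recording design described above crosses five factors: text sentiment, voice sentiment, place, distance, and device. As a rough illustration only, the Python sketch below enumerates that nominal grid and derives an approximate average utterance length from the stated totals. The dictionary field names are hypothetical and are not taken from the dataset's metadata schema, and the description does not state that every speaker recorded every combination, so the grid is the nominal design rather than a guaranteed inventory.

```python
from itertools import product

# Factor levels as listed in the dataset description.
TEXT_SENTIMENTS = ("negative", "neutral", "positive")
VOICE_SENTIMENTS = ("negative", "neutral", "positive")
PLACES = ("anechoic chamber", "studio apartment", "dance studio")
DISTANCES_M = (0.4, 2.0, 4.0)
DEVICES = ("iPhone X", "Galaxy S7")

# Hypothetical field names, chosen for this sketch only.
FIELDS = ("text_sentiment", "voice_sentiment", "place", "distance_m", "device")


def recording_conditions():
    """Return every combination of the five factors as a list of dicts."""
    grid = product(TEXT_SENTIMENTS, VOICE_SENTIMENTS, PLACES, DISTANCES_M, DEVICES)
    return [dict(zip(FIELDS, combo)) for combo in grid]


conditions = recording_conditions()
print(len(conditions))  # 3 * 3 * 3 * 3 * 2 = 162 nominal conditions

# Approximate mean utterance length from the stated totals
# (289.9 hours, ~190,000 utterances); both figures are approximate.
avg_seconds = 289.9 * 3600 / 190_000
print(round(avg_seconds, 1))  # roughly 5.5 seconds
```

A grid like this can serve as a checklist when filtering the metadata, for example to confirm that a chosen training subset covers each place, distance, and device combination.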

Provider:

Deeply
