
BEASC: Bangla emotional audio-speech corpus - A speech emotion recognition corpus for the Bangla language

Mendeley Data, indexed 2026-04-09
Download link:
https://data.mendeley.com/datasets/t9h6p943xy/1
Resource description:
BEASC is an audio-speech emotion recognition corpus for the Bangla language. It contains voice data from 34 speakers aged 19 to 57 (mean = 28.75, standard deviation = 9.346), balanced between 17 males and 17 females. Each speaker recorded three sentences in four emotional states, the basic human emotions Angry, Happy, Sad, and Surprise. The three sentences are i. ‘১২ টা বেজে গেছে’, ii. ‘আমি জানতাম এমন কিছু হবে’, and iii. ‘এ কেমন উপহার’. Three trials were preserved for each emotional expression, so the total number of utterances is three sentences × three repetitions × four emotions × 34 speakers = 1224 recordings.

The audio files are in WAV format. Happy and sad speech is treated as normal intensity, and angry and surprise speech as strong intensity. The data files are divided into 34 folders, one per participating actor, each containing that actor's 36 recordings. BEASC is a balanced dataset with 306 recordings per emotion. The total size of the dataset is 619 MB.

While most existing datasets in other languages are recorded in a closed studio or cover a single sentence, this dataset was recorded on smartphones, which preserves a slightly noisy real-life environment. BEASC is compatible with various shallow machine learning and deep learning architectures, such as CNN, LSTM, HMM, and Transformer.

Each data file has a unique filename, following the naming procedure of the RAVDESS dataset. The filename consists of seven two-digit numerical identifiers separated by hyphens (e.g., 03-01-01-01-02-02-02.wav). Each identifier gives the level of a different experimental factor, in this order: Modality - Statement type - Emotion - Emotion Intensity - Statement - Repetition - Actor.wav. For example, the filename “03-01-01-01-02-02-02.wav” refers to: Audio only (03) - Scripted (01) - Happy (01) - Normal intensity (01) - 2nd Statement (02) - 2nd Repetition (02) - 2nd Actor, Female (02).
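
The naming scheme and the corpus-size arithmetic described above can be sketched in Python. This is a minimal illustration and not code distributed with the dataset: the names `BeascName` and `parse_beasc_filename` are hypothetical, and the sketch returns the numeric codes as they are, because the description only spells out the codes used in its single example.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class BeascName:
    """Numeric codes of one BEASC filename, in the documented order (hypothetical helper type)."""
    modality: int
    statement_type: int
    emotion: int
    intensity: int
    statement: int
    repetition: int
    actor: int


def parse_beasc_filename(path: str) -> BeascName:
    """Split a name such as '03-01-01-01-02-02-02.wav' into its seven codes."""
    stem = PurePath(path).stem  # drop any folder prefix and the .wav extension
    parts = stem.split("-")
    if len(parts) != 7 or not all(len(p) == 2 and p.isdigit() for p in parts):
        raise ValueError(f"expected seven two-digit identifiers, got {stem!r}")
    return BeascName(*(int(p) for p in parts))


# Corpus-size arithmetic, using only the figures stated in the description.
SENTENCES, REPETITIONS, EMOTIONS, SPEAKERS = 3, 3, 4, 34
total_recordings = SENTENCES * REPETITIONS * EMOTIONS * SPEAKERS
recordings_per_speaker = SENTENCES * REPETITIONS * EMOTIONS
recordings_per_emotion = SENTENCES * REPETITIONS * SPEAKERS

if __name__ == "__main__":
    print(parse_beasc_filename("03-01-01-01-02-02-02.wav"))
    print(total_recordings, recordings_per_speaker, recordings_per_emotion)
```

The parser validates only the shape of the name (seven two-digit fields). Mapping codes to labels such as emotion names or speaker gender would need the full code table, which the description does not list.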

Provided by:
Rayhan Ahmed