Mexican Emotional Speech Database (MESD)

Name: Mexican Emotional Speech Database (MESD)
Creator: Mendeley
Published: 2025-04-01 05:32:13
License: No description available

DataCite Commons; updated 2025-04-01; indexed 2025-04-16

Download link:

https://data.mendeley.com/datasets/cy34mh68j9

Description:

The Mexican Emotional Speech Database (MESD) provides single-word utterances for anger, disgust, fear, happiness, neutral, and sadness affective prosodies with Mexican cultural shaping. The MESD was uttered by both adult and child non-professional actors: 3 female, 2 male, and 6 child voices are available (female mean age ± SD = 23.33 ± 1.53, male mean age ± SD = 24 ± 1.41, and children mean age ± SD = 9.83 ± 1.17).

Words for emotional and neutral utterances come from two corpora: corpus A, composed of nouns and adjectives that are repeated across emotional prosodies and types of voice (female, male, child), and corpus B, which consists of words controlled for age-of-acquisition, frequency of use, familiarity, concreteness, valence, arousal, and discrete emotion dimensionality ratings.

The audio recordings took place in a professional studio with the following materials: (1) a Sennheiser e835 microphone with a flat frequency response (100 Hz to 10 kHz), (2) a Focusrite Scarlett 2i4 audio interface connected to the microphone with an XLR cable and to the computer, and (3) the digital audio workstation REAPER (Rapid Environment for Audio Production, Engineering, and Recording). Audio files were stored as 24-bit samples at a sample rate of 48,000 Hz. The amplitude of acoustic waveforms was rescaled between -1 and 1.

Two speaker-embedded naturalness-reduced versions were created out of human emotional utterances for female voices from corpus B. Specifically, naturalness was progressively reduced from the human voices to level 1 and then to level 2. In particular, duration and median pitch were edited on stressed syllables to reduce the difference between stressed and unstressed syllables. On whole utterances, F2/F1 and F3/F1 ratios were lowered by editing F2 and F3 frequencies. The intensities of harmonics 1 and 4 were also reduced.

24 utterances per emotion are available for each type of voice, corpus, and level of naturalness. They are shared as audio files in WAV format. Please see the README for an explanation of the audio file nomenclature.

The MESD seems to be the first set of single-word emotional utterances that includes both adult and child voices for the Mexican population. Additionally, the MESD provides naturalness-reduced versions of emotional utterances.

Citations:

M. M. Duville, L. M. Alonso-Valerdi, and D. Ibarra-Zarate, “The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning,” 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, p. 4, 2021.

Duville, M.M.; Alonso-Valerdi, L.M.; Ibarra-Zarate, D.I. Mexican Emotional Speech Database Based on Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of Affective Prosody. Data 2021, 6, 130. https://doi.org/10.3390/data6120130
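The description states that the recordings are 24-bit and that waveform amplitude was rescaled between -1 and 1, but it does not say how the rescaling was done. The following Python sketch shows one plausible reading, under stated assumptions: the sample data is little-endian signed 24-bit PCM (the usual WAV layout), it is treated as a single channel, and "rescaled" is interpreted as peak normalization. The function names `decode_pcm24` and `peak_normalize` are hypothetical and do not come from the dataset or its README.

```python
def decode_pcm24(raw: bytes) -> list[float]:
    """Decode little-endian signed 24-bit PCM bytes into floats in [-1.0, 1.0).

    Assumption: `raw` holds sample data only (no WAV header) for one channel.
    """
    if len(raw) % 3 != 0:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes")
    full_scale = float(1 << 23)  # magnitude of the most negative 24-bit sample
    return [
        int.from_bytes(raw[i:i + 3], "little", signed=True) / full_scale
        for i in range(0, len(raw), 3)
    ]


def peak_normalize(samples: list[float]) -> list[float]:
    """Scale samples so the largest absolute value becomes 1.0.

    Assumption: this is only one possible meaning of "rescaled between -1 and 1".
    Silent input (all zeros) is returned unchanged to avoid dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


if __name__ == "__main__":
    # Three hand-built 24-bit samples: half scale, negative full scale, silence.
    raw = b"".join(
        value.to_bytes(3, "little", signed=True)
        for value in (1 << 22, -(1 << 23), 0)
    )
    print(peak_normalize(decode_pcm24(raw)))
```

In practice the raw bytes could be obtained with the standard-library `wave` module (`readframes`), after checking that `getsampwidth()` returns 3 and `getframerate()` returns 48000 as the description suggests.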

Provider:

Mendeley

Created:

2021-07-14
