The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-27.
Download link:
https://zenodo.org/records/1188976
Resource description:
Citing the RAVDESS: The RAVDESS is released under a Creative Commons license (CC BY-NC-SA 4.0; see License information), so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS ONE paper. Personal works, such as machine learning projects or blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS ONE paper would also be appreciated.

Academic paper citation: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

Personal use citation: Include a link to this Zenodo page - https://zenodo.org/record/1188976

Commercial licenses: Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

Contact information: If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

Example videos: Watch a sample of the RAVDESS speech and song videos.

Emotion classification users: If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

Construction and validation: Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391. The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

Description: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note that there are no song files for Actor_18.

Audio-only files: Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):
- Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
- Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

Audio-Visual and Video-only files: Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:
- Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
- Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

File summary: In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).

File naming convention: Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics.

Filename identifiers:
1. Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
2. Vocal channel (01 = speech, 02 = song).
3. Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
4. Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
5. Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
6. Repetition (01 = 1st repetition, 02 = 2nd repetition).
7. Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

Filename example: 02-01-06-01-02-01-12.mp4
- Video-only (02)
- Speech (01)
- Fearful (06)
- Normal intensity (01)
- Statement "dogs" (02)
- 1st Repetition (01)
- 12th Actor (12). Female, as the actor ID number is even.

License information: The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

Related data sets: RAVDESS Facial Landmark Tracking data set [Zenodo project page].
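
The file naming convention and the file totals described above can be checked with a short script. The following is a minimal Python sketch, not an official RAVDESS tool: the names `RavdessFile`, `parse_ravdess_filename`, and `expected_file_count` are hypothetical and introduced here only for illustration. The per-actor trial counts are derived on the assumption that every non-neutral emotion is recorded at 2 intensities x 2 statements x 2 repetitions and that neutral is recorded at normal intensity only, which reproduces the 60 (speech) and 44 (song) trials per actor stated in the description.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Code tables copied from the "Filename identifiers" list in the description.
MODALITY = {1: "full-AV", 2: "video-only", 3: "audio-only"}
VOCAL_CHANNEL = {1: "speech", 2: "song"}
EMOTION = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
           5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}
INTENSITY = {1: "normal", 2: "strong"}
STATEMENT = {1: "Kids are talking by the door",
             2: "Dogs are sitting by the door"}


@dataclass(frozen=True)
class RavdessFile:
    """Hypothetical container for one decoded filename."""
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: str
    repetition: int
    actor: int
    actor_sex: str


def parse_ravdess_filename(name: str) -> RavdessFile:
    """Decode a 7-part identifier such as 02-01-06-01-02-01-12.mp4."""
    parts = PurePath(name).stem.split("-")
    if len(parts) != 7 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a 7-part numeric identifier: {name!r}")
    mod, chan, emo, inten, stmt, rep, actor = map(int, parts)
    if rep not in (1, 2) or not 1 <= actor <= 24:
        raise ValueError(f"repetition or actor out of range: {name!r}")
    if emo == 1 and inten == 2:
        # The description states that neutral has no strong intensity.
        raise ValueError(f"neutral with strong intensity: {name!r}")
    try:
        return RavdessFile(
            modality=MODALITY[mod],
            vocal_channel=VOCAL_CHANNEL[chan],
            emotion=EMOTION[emo],
            intensity=INTENSITY[inten],
            statement=STATEMENT[stmt],
            repetition=rep,
            actor=actor,
            # Odd-numbered actors are male, even-numbered are female.
            actor_sex="female" if actor % 2 == 0 else "male",
        )
    except KeyError as exc:
        raise ValueError(f"unknown code {exc} in {name!r}") from exc


def expected_file_count() -> int:
    """Rebuild the stated total from the description's arithmetic."""
    def trials_per_actor(n_emotions: int) -> int:
        # Non-neutral: 2 intensities x 2 statements x 2 repetitions each;
        # neutral: normal intensity only, 2 statements x 2 repetitions.
        return n_emotions * 2 * 2 * 2 + 2 * 2

    speech = trials_per_actor(7) * 24   # 60 trials x 24 actors
    song = trials_per_actor(5) * 23     # 44 trials x 23 actors (no Actor_18)
    return 3 * (speech + song)          # audio-only, full-AV, video-only
```

For example, `parse_ravdess_filename("02-01-06-01-02-01-12.mp4")` decodes to a video-only, speech, fearful, normal-intensity recording of the "Dogs" statement by actor 12 (female), matching the filename example above, and `expected_file_count()` evaluates to the stated total of 7356.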
Created:
2023-06-28



