narad/ravdess
Source: Hugging Face. Updated 2022-11-02; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/narad/ravdess
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license:
- cc-by-nc-sa-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- audio-classification
task_ids:
- audio-emotion-recognition
---
# Dataset Card for RAVDESS
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:**
https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio
- **Repository:**
- **Paper:**
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196391
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)
This dataset contains the speech audio-only files (16-bit, 48 kHz .wav) from the RAVDESS. The full dataset of speech and song, audio and video (24.8 GB), is available from Zenodo. Construction and perceptual validation of the RAVDESS are described in the authors' open-access paper in PLoS ONE (Livingstone & Russo, 2018).
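
To check that a downloaded file matches the stated format (16-bit, 48 kHz), the WAV header can be read with Python's standard `wave` module. This is a minimal sketch: the helper names `describe_wav` and `matches_card` are hypothetical, and the demo clip is synthetic mono silence generated in memory, not a RAVDESS recording.

```python
import io
import wave


def describe_wav(source) -> dict:
    """Read basic PCM header fields from a WAV path string or binary file object."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is given in bytes
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_card(info: dict) -> bool:
    """True if the header agrees with the format stated in the summary."""
    return info["bit_depth"] == 16 and info["sample_rate_hz"] == 48_000


# Demo on a synthetic clip: one second of 16-bit mono silence at 48 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)        # mono is an assumption made for the demo only
    wav.setsampwidth(2)        # 2 bytes per sample = 16-bit
    wav.setframerate(48_000)
    wav.writeframes(b"\x00\x00" * 48_000)
buffer.seek(0)

demo_info = describe_wav(buffer)
print(demo_info, matches_card(demo_info))
```

For a real file, pass its path string to `describe_wav` instead of the in-memory buffer.
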
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
English
## Dataset Structure
The dataset repository contains only preprocessing scripts. If no cached version is found when the dataset is loaded, the data is downloaded automatically and a .tsv file is created in which each data instance is saved as one row.
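
The idea of a one-row-per-instance .tsv file can be sketched with the standard library. This is not the repository's actual preprocessing script: the column names (`path`, `filename`), the `Actor_XX` folder layout, and the function name `write_manifest` are assumptions chosen for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_manifest(audio_root: Path, tsv_path: Path) -> int:
    """Write one TSV row per .wav file found under audio_root; return the row count."""
    wav_files = sorted(audio_root.rglob("*.wav"))
    with tsv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["path", "filename"])  # hypothetical column names
        for wav in wav_files:
            writer.writerow([wav.relative_to(audio_root).as_posix(), wav.name])
    return len(wav_files)


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "audio"
    for name in (
        "Actor_01/03-01-01-01-01-01-01.wav",
        "Actor_12/03-01-06-01-02-01-12.wav",
    ):
        target = root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    manifest = Path(tmp) / "ravdess.tsv"
    row_count = write_manifest(root, manifest)
    manifest_lines = manifest.read_text(encoding="utf-8").splitlines()

print(row_count)
print(manifest_lines)
```
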
### Data Instances
[More Information Needed]
### Data Fields
- "audio": a datasets.Audio representation of the spoken utterance,
- "text": a datasets.Value string representation of spoken utterance,
- "labels": a datasets.ClassLabel representation of the emotion label,
- "speaker_id": a datasets.Value string representation of the speaker ID,
- "speaker_gender": a datasets.Value string representation of the speaker gender
### Data Splits
All data is in the train partition.
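
Because everything is in one partition, any held-out evaluation set has to be carved out by the user. One possible approach is a speaker-independent split, where all utterances of some actors are kept out of training. The sketch below works on plain dictionaries; the record layout, the two-digit `speaker_id` strings, and the choice of held-out actors are assumptions for illustration and are not specified by this card.

```python
def split_by_speaker(records, held_out_speakers):
    """Split records so that no held-out speaker appears in the training part."""
    held_out = set(held_out_speakers)
    train = [r for r in records if r["speaker_id"] not in held_out]
    test = [r for r in records if r["speaker_id"] in held_out]
    return train, test


# Toy records: 24 speakers with 60 utterances each (placeholder content only).
records = [
    {"speaker_id": f"{actor:02d}", "utterance": index}
    for actor in range(1, 25)
    for index in range(60)
]

train_rows, test_rows = split_by_speaker(records, {"23", "24"})
print(len(train_rows), len(test_rows))
```

Holding out one odd-numbered and one even-numbered actor keeps both genders in the evaluation part.
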
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
Original Data from the Zenodo release of the RAVDESS Dataset:
#### Files
This portion of the RAVDESS contains 1440 files: 60 trials per actor x 24 actors = 1440. The RAVDESS contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech emotions include calm, happy, sad, angry, fearful, surprise, and disgust expressions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression.
#### File naming convention
Each of the 1440 files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 03-01-06-01-02-01-12.wav). These identifiers define the stimulus characteristics.
Filename identifiers, in order:
- Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
- Vocal channel (01 = speech, 02 = song).
- Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
- Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
- Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
- Repetition (01 = 1st repetition, 02 = 2nd repetition).
- Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).
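
The identifier scheme also accounts for the figure of 60 trials per actor: 8 emotions x 2 intensities x 2 statements x 2 repetitions gives 64 combinations, and dropping the 4 neutral/strong combinations that do not exist leaves 60. The sketch below enumerates the audio-only speech filenames this implies. The function name is hypothetical, and the enumeration is derived from the convention above rather than from a listing of the actual files.

```python
from itertools import product

EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITIES = ("01", "02")   # normal, strong
STATEMENTS = ("01", "02")    # "Kids ..." and "Dogs ..."
REPETITIONS = ("01", "02")   # 1st and 2nd repetition


def speech_filenames_for_actor(actor: int) -> list[str]:
    """List the audio-only (03) speech (01) filenames implied by the naming scheme."""
    names = []
    for emotion, intensity, statement, repetition in product(
        EMOTIONS, INTENSITIES, STATEMENTS, REPETITIONS
    ):
        if emotion == "01" and intensity == "02":
            continue  # there is no strong intensity for the neutral emotion
        names.append(
            f"03-01-{emotion}-{intensity}-{statement}-{repetition}-{actor:02d}.wav"
        )
    return names


all_names = [
    name for actor in range(1, 25) for name in speech_filenames_for_actor(actor)
]
print(len(speech_filenames_for_actor(1)), len(all_names))
```
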
Filename example: 03-01-06-01-02-01-12.wav
- Audio-only (03)
- Speech (01)
- Fearful (06)
- Normal intensity (01)
- Statement "dogs" (02)
- 1st Repetition (01)
- 12th Actor (12)
- Female, as the actor ID number is even.
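
The decoding shown in the example can be expressed as a small parser. This is a minimal sketch that follows the naming convention described above. The class and function names are hypothetical and are not part of the dataset's own loading script.

```python
from dataclasses import dataclass
from pathlib import Path

MODALITIES = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNELS = {"01": "speech", "02": "song"}
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITIES = {"01": "normal", "02": "strong"}
STATEMENTS = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


@dataclass(frozen=True)
class RavdessFile:
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: str
    repetition: int
    actor: int
    gender: str


def parse_ravdess_filename(filename: str) -> RavdessFile:
    """Decode a 7-part RAVDESS filename; unknown codes raise KeyError."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dash-separated parts, got {len(parts)}: {filename!r}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_number = int(actor)
    return RavdessFile(
        modality=MODALITIES[modality],
        vocal_channel=VOCAL_CHANNELS[channel],
        emotion=EMOTIONS[emotion],
        intensity=INTENSITIES[intensity],
        statement=STATEMENTS[statement],
        repetition=int(repetition),
        actor=actor_number,
        # Odd-numbered actors are male, even-numbered actors are female.
        gender="male" if actor_number % 2 else "female",
    )


example = parse_ravdess_filename("03-01-06-01-02-01-12.wav")
print(example)
```
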
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
### Citation Information
How to cite the RAVDESS
Academic citation
If you use the RAVDESS in an academic publication, please use the following citation: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
All other attributions
If you use the RAVDESS in a form other than an academic publication, such as in a blog post, school project, or non-commercial product, please use the following attribution: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NC-SA 4.0.
### Contributions
Thanks to [@narad](https://github.com/narad) for adding this dataset.
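Provider:
narad
Summary of the original information
Dataset overview
Dataset name
- Name: Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)
Basic information
- Language: English
- License: CC BY-NC-SA 4.0
- Multilinguality: monolingual
- Size: 1K<n<10K
- Source data: original
- Task category: audio classification
- Task ID: audio emotion recognition
Dataset contents
- Data instances: audio files, each with an emotion label, a speaker ID, and the speaker's gender.
- Data fields:
  - "audio": audio representation
  - "text": text representation
  - "labels": emotion label
  - "speaker_id": speaker ID
  - "speaker_gender": speaker gender
- Data splits: all data is in the train partition.
Dataset creation
- Source data: taken from the Zenodo release of the RAVDESS; 1440 files recorded by 24 professional actors (12 female, 12 male), covering seven emotions plus a neutral expression.
- File naming convention: seven parts, denoting modality, vocal channel, emotion, emotional intensity, statement, repetition, and actor number.
Licensing information
- License: CC BY-NC-SA 4.0
Citation information
- Academic citation: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
- Other attributions: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NC-SA 4.0.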
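Collected summary
Dataset introduction

Construction
The RAVDESS was built from recordings of professional actors performing emotional expressions across several emotions and intensity levels. A systematic file naming and classification scheme keeps the data consistent and traceable.
Characteristics
The dataset is monolingual and consists of English speech files. It carries emotion labels such as calm, happy, sad, angry, fearful, surprised, and disgust, each recorded at more than one intensity level, which gives emotion recognition research a varied set of samples.
Usage
The repository provides preprocessing scripts, and the data files are downloaded when the dataset is loaded. The built-in fields (audio, text, labels, and so on) can then be used to train and evaluate machine learning models for emotion recognition.
Background and challenges
Background
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) was published by Livingstone and Russo in 2018 as a dynamic, multimodal dataset for research on emotional speech and song. It contains 24 professional actors performing two lexically matched statements in a North American English accent, covering emotions from calm to surprised at different intensity levels. The RAVDESS has been widely used in academia and serves as a resource for emotion recognition, speech synthesis, and psychology research.
Current challenges
Challenges in building and applying the RAVDESS include keeping emotion labels accurate and consistent, handling noise and interference in the audio, and balancing the representation of gender, age, and emotional expression. Its single language and cultural background also limit its use in multilingual and multicultural settings, and the handling of personal and sensitive information and privacy calls for care.
Common scenarios
Typical use
In audio emotion recognition, the RAVDESS is a standard research resource because of its emotion labels and varied speech samples. It is commonly used to train machine learning models that recognize emotional states such as happiness, sadness, and anger.
Derived work
Work building on the RAVDESS includes improved emotion recognition algorithms, cross-lingual emotion recognition models, and emotional speech synthesis, all of which have contributed to the field of affective computing.
Recent research
Latest research directions
Recent work using the RAVDESS for speech emotion recognition focuses on deep learning models, such as combinations of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to capture complex emotional features in the speech signal. Researchers are also exploring cross-modal learning that combines speech with facial expression data to improve recognition accuracy. The dataset is used in affective computing, mental health assessment, and interactive voice systems.
The content above was collected and summarized by 遇见数据集.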



