Mansooreh/sharif-emotional-speech-dataset
Source: Hugging Face. Updated 2021-10-19; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Mansooreh/sharif-emotional-speech-dataset
Resource description:
# <a href='https://arxiv.org/pdf/1906.01155.pdf'>ShEMO: a large-scale validated database for Persian speech emotion detection</a><br>
## Abstract
<div align="justify"> This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including <i>anger</i>, <i>fear</i>, <i>happiness</i>, <i>sadness</i> and <i>surprise</i>, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64% which is interpreted as "substantial agreement". We also present benchmark results based on common classification methods in speech emotion detection task. According to the experiments, support vector machine achieves the best results for both gender-independent (58.2%) and gender-dependent models (female=59.4%, male=57.6%). The ShEMO is available for academic purposes free of charge to provide a baseline for further research on Persian emotional speech.</div>
## Download Dataset
To download female utterances (zip file):
```bash
wget -O female.zip "https://www.dropbox.com/s/42okby6c40w3j2x/female.zip?dl=0"
```
To download male utterances (zip file):
```bash
wget -O male.zip "https://www.dropbox.com/s/5ebs8hq1zm0qkp6/male.zip?dl=0"
```
To download labels & transcripts (json file):
```bash
wget https://github.com/pariajm/sharif-emotional-speech-dataset/raw/master/shemo.json
```
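After downloading, it can be useful to check what an archive contains before extracting it. The following is a minimal Python sketch, assuming each archive holds `.wav` files whose names are the utterance IDs (for example `F03S02.wav`); that layout and the helper name `list_utterance_ids` are assumptions made for illustration, not something stated above.

```python
import zipfile
from pathlib import PurePosixPath


def list_utterance_ids(zip_path):
    """Return the sorted file stems of all .wav members in a zip archive.

    Assumption: each .wav file is named after its utterance ID.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            PurePosixPath(name).stem
            for name in archive.namelist()
            if name.lower().endswith(".wav")
        )
```

For instance, `list_utterance_ids("female.zip")` would list the IDs found in the female archive without unpacking it.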
## Models Trained or Fine-tuned on ShEMO
Credits to [Mehrdad Farahani](https://github.com/m3hrdadfi/soxan)
- [Speech emotion detection in Persian (fa) using wav2vec 2.0](https://huggingface.co/m3hrdadfi/wav2vec2-xlsr-persian-speech-emotion-recognition)
- [Speech emotion detection in Persian (fa) using HuBERT](https://huggingface.co/m3hrdadfi/hubert-base-persian-speech-emotion-recognition)
- [Speech gender detection in Persian (fa) using HuBERT](https://huggingface.co/m3hrdadfi/hubert-base-persian-speech-gender-recognition)
- [Automatic speech recognition in Persian (fa) using XLSR-53](https://huggingface.co/m3hrdadfi/wav2vec2-large-xlsr-persian-shemo)
## Overview of ShEMO
Feature | Status
------------- | ----------
**access** | open source
**language** | Persian (fa)
**modality** | speech
**duration** | 3 hours and 25 minutes
**#utterances** | 3000
**#speakers** | 87 (31 females, 56 males)
**#emotions** | 5 basic emotions (anger, fear, happiness, sadness and surprise) and neutral state
**orthographic transcripts** | available
**phonetic transcripts** | available
Read our paper on <a href='https://link.springer.com/article/10.1007/s10579-018-9427-x'>Springer</a> or [arXiv](https://arxiv.org/pdf/1906.01155.pdf).
## Description of Filenames
The characters used in the filenames and their corresponding meanings:
- **A**: angry
- **F**: female speaker (if used at the beginning of the label, e.g. `F14A09`) or fearful (if used in the middle of the label, e.g. `M02F01`)
- **H**: happy
- **M**: male speaker
- **N**: neutral
- **S**: sad
- **W**: surprised

For example, in `F03S02`, **F** means the speaker is **female**, **03** is the **speaker code**, **S** is the underlying emotion of the utterance (**sadness**), and **02** means this is the **second utterance for this speaker in the sad emotion**.
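The naming scheme can be decoded mechanically. The sketch below is one possible Python helper, not part of the dataset tooling; the function name `parse_utterance_id`, the returned keys, and the choice of noun labels (`anger`, `fear`, and so on) are assumptions made for illustration. It accepts one or more digits for the speaker code and the utterance number, since the fixed width is not stated.

```python
import re

# Letter codes from the filename description.
GENDERS = {"F": "female", "M": "male"}
EMOTIONS = {
    "A": "anger",
    "F": "fear",
    "H": "happiness",
    "N": "neutral",
    "S": "sadness",
    "W": "surprise",
}

# gender letter, speaker code, emotion letter, utterance number
_ID_PATTERN = re.compile(r"([FM])(\d+)([AFHNSW])(\d+)")


def parse_utterance_id(utterance_id):
    """Split an ID such as 'F03S02' into its labeled parts."""
    match = _ID_PATTERN.fullmatch(utterance_id)
    if match is None:
        raise ValueError(f"unrecognized utterance ID: {utterance_id!r}")
    gender_code, speaker_code, emotion_code, number = match.groups()
    return {
        "speaker_id": gender_code + speaker_code,
        "gender": GENDERS[gender_code],
        "emotion": EMOTIONS[emotion_code],
        "utterance_number": int(number),
    }
```

Because the first letter is always matched as the gender and the letter after the speaker code as the emotion, the two meanings of **F** do not collide: `parse_utterance_id("M02F01")` yields a male speaker with the emotion `fear`.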
## Data Instances
Here is a sample of data instances:
```json
{
  "F21N37": {
    "speaker_id": "F21",
    "gender": "female",
    "emotion": "neutral",
    "transcript": "مگه من به تو نگفته بودم که باید راجع به دورانت سکوت کنی؟",
    "ipa": "mӕge mæn be to nægofte budӕm ke bɑyæd rɑdʒeʔ be dorɑnt sokut koni"
  }
}
```
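Records in this shape are straightforward to summarize. The Python sketch below assumes that `shemo.json` is a single JSON object mapping each utterance ID to a record with the fields shown in the sample; the helper names `load_records` and `emotion_counts` are hypothetical.

```python
import json
from collections import Counter


def load_records(path):
    """Load the label file; UTF-8 is required for the Persian transcripts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def emotion_counts(records):
    """Count utterances per value of the 'emotion' field."""
    return Counter(record["emotion"] for record in records.values())


# A one-record stand-in for the real file (transcript fields omitted).
sample = json.loads(
    '{"F21N37": {"speaker_id": "F21", "gender": "female", "emotion": "neutral"}}'
)
print(emotion_counts(sample))
```

With the real file, `emotion_counts(load_records("shemo.json"))` would give the class distribution, which is worth checking before training a classifier.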
## Sharif Emotional Speech Database (ShEMO)
To get the paper, click <a href='https://arxiv.org/pdf/1906.01155.pdf'>here</a>.
## Citation
If you use this dataset, please cite the following paper:
~~~~
@Article{MohamadNezami2019,
author = {Mohamad Nezami, Omid and Jamshid Lou, Paria and Karami, Mansoureh},
title = {ShEMO: a large-scale validated database for Persian speech emotion detection},
journal = {Language Resources and Evaluation},
year = {2019},
volume = {53},
number = {1},
pages = {1--16},
issn = {1574-0218},
doi = {10.1007/s10579-018-9427-x},
url = {https://doi.org/10.1007/s10579-018-9427-x}
}
~~~~
### Contact
Paria Jamshid Lou <paria.jamshid-lou@hdr.mq.edu.au>
Omid Mohamad Nezami <omid.mohamad-nezami@hdr.mq.edu.au>
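Provided by:
Mansooreh
Summary of original information
Dataset overview
Basic information
- Name: ShEMO: Sharif Emotional Speech Database
- Language: Persian (fa)
- Type: speech data
- Duration: 3 hours and 25 minutes
- Utterances: 3000
- Speakers: 87 (31 female, 56 male)
- Emotion categories: 5 basic emotions (anger, fear, happiness, sadness, surprise) and a neutral state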
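Dataset characteristics
- Access: open source
- Modality: speech
- Emotion annotation: labeled by 12 annotators, with final labels decided by majority voting; inter-annotator agreement is 64% (substantial agreement)
- File formats: audio as zip archives; labels and transcripts as JSON
- File naming: specific characters encode emotion and gender; e.g. `F03S02` is the second sad utterance of female speaker 03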
Data download
- Female utterances: `wget -O female.zip "https://www.dropbox.com/s/42okby6c40w3j2x/female.zip?dl=0"`
- Male utterances: `wget -O male.zip "https://www.dropbox.com/s/5ebs8hq1zm0qkp6/male.zip?dl=0"`
- Labels and transcripts: `wget https://github.com/pariajm/sharif-emotional-speech-dataset/raw/master/shemo.json`
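Models and research
- Models trained or fine-tuned on this dataset include Persian emotion recognition models based on wav2vec 2.0 and HuBERT
Citation information
- Citation format:
@Article{MohamadNezami2019,...
Contact information
- Paria Jamshid Lou: paria.jamshid-lou@hdr.mq.edu.au
- Omid Mohamad Nezami: omid.mohamad-nezami@hdr.mq.edu.au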



