Falah/classification_arabic_dialects
Source: Hugging Face. Updated 2023-07-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Falah/classification_arabic_dialects
Resource overview:
---
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: label
    dtype:
      class_label:
        names:
          '0': Algeria
          '1': Egypt
          '2': Iraq
          '3': Jordan
          '4': Morocco
          '5': Saudi_Arabia
          '6': Sudan
          '7': Syria
          '8': Tunisia
          '9': Yemen
  splits:
  - name: train
    num_bytes: 166407297.0
    num_examples: 130
  download_size: 158117904
  dataset_size: 166407297.0
---
# Classification of Arabic Dialects Audio Dataset
This dataset contains audio samples of various Arabic dialects for the task of classification and recognition. The dataset aims to assist researchers and practitioners in developing models and systems for Arabic spoken language analysis and understanding.
## Dataset Details
- Dataset Name: Classification of Arabic Dialects Audio Dataset
- Dataset URL: [Falah/classification_arabic_dialects](https://huggingface.co/datasets/Falah/classification_arabic_dialects)
- Dataset Size: 166,407,297 bytes
- Download Size: 158,117,904 bytes
- Splits:
  - Train: 130 examples
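Because the card ships only a train split, any evaluation needs a hold-out set carved from those 130 examples. The following is a minimal sketch of a stratified split over integer labels, using only the standard library. The function name `stratified_split`, the 20% test fraction, and the evenly spread example labels are assumptions for illustration; the card does not state the real per-class counts.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), holding out a share of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_indices, test_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out at least one example per class, unless that would leave
        # the class with no training examples at all.
        n_test = max(1, round(len(indices) * test_fraction))
        n_test = min(n_test, len(indices) - 1)
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return sorted(train_indices), sorted(test_indices)


# Hypothetical labels: 130 examples spread evenly over 10 classes.
# The real per-class counts are not stated in the card.
example_labels = [i % 10 for i in range(130)]
train_idx, test_idx = stratified_split(example_labels, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))
```

With the real data, the `labels` argument would be the dataset's `label` column, and the returned index lists could then be used to select the corresponding rows.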
## Class Labels and Mapping
The dataset consists of audio samples from the following Arabic dialects, along with their corresponding class labels:
- '0': Algeria
- '1': Egypt
- '2': Iraq
- '3': Jordan
- '4': Morocco
- '5': Saudi Arabia
- '6': Sudan
- '7': Syria
- '8': Tunisia
- '9': Yemen
Please refer to the dataset for the audio samples and their respective class labels.
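Classification code usually needs this mapping in both directions, from integer label to name and back. A minimal sketch using plain dictionaries follows; the names `LABEL_NAMES`, `id2label`, `label2id`, and `display_name` are illustrative choices, not part of the dataset.

```python
# Names as declared in the card's class_label metadata. Note the underscore
# in "Saudi_Arabia", which the prose list above writes as "Saudi Arabia".
LABEL_NAMES = [
    "Algeria", "Egypt", "Iraq", "Jordan", "Morocco",
    "Saudi_Arabia", "Sudan", "Syria", "Tunisia", "Yemen",
]

# Integer label -> name, and name -> integer label.
id2label = dict(enumerate(LABEL_NAMES))
label2id = {name: index for index, name in id2label.items()}


def display_name(label_id):
    """Human-readable country name for an integer label."""
    return id2label[label_id].replace("_", " ")


print(display_name(5))
print(label2id["Egypt"])
```

Keeping the metadata spelling in `LABEL_NAMES` and converting only for display avoids mismatches when the names are compared against the dataset's own label feature.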
## Usage Example
To load the dataset and play an audio sample in a Jupyter notebook, you can use the following code:
```python
from datasets import load_dataset
from IPython.display import Audio, display

# Load the dataset from the Hugging Face Hub (it has a single "train" split)
dataset = load_dataset("Falah/classification_arabic_dialects")
country_names = ['Algeria', 'Egypt', 'Iraq', 'Jordan', 'Morocco', 'Saudi_Arabia', 'Sudan', 'Syria', 'Tunisia', 'Yemen']
index = 0  # Index of the audio example

# Fetch the example once so the audio file is decoded only once
example = dataset["train"][index]
label = example["label"]
country_name = country_names[int(label)]
audio_data = example["audio"]["array"]
sampling_rate = example["audio"]["sampling_rate"]

# Play audio
display(Audio(audio_data, rate=sampling_rate))
print("Class Label:", label)
print("Country Name:", country_name)
```
Set `index` to the example you want to inspect. The code renders an audio player for that example and prints its class label together with the matching country name from the `country_names` list.
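Before training on the clips, it can help to check how long each one is and whether its amplitude looks reasonable. The sketch below summarizes an audio dict with the same `"array"` and `"sampling_rate"` keys used in the example above. The helper name `describe_audio` and the synthetic 16 kHz sine wave are assumptions for illustration; the card does not state the actual sampling rate or clip lengths.

```python
import math


def describe_audio(audio):
    """Summarize a decoded audio dict with "array" and "sampling_rate" keys."""
    samples = audio["array"]
    sampling_rate = audio["sampling_rate"]
    return {
        # Duration in seconds is the sample count divided by samples per second.
        "duration_s": len(samples) / sampling_rate,
        # Largest absolute sample value; 0.0 for an empty clip.
        "peak": max((abs(s) for s in samples), default=0.0),
    }


# Synthetic stand-in for dataset["train"][index]["audio"]: one second of a
# 440 Hz sine wave at 16 kHz with amplitude 0.5.
fake_audio = {
    "array": [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)],
    "sampling_rate": 16000,
}
summary = describe_audio(fake_audio)
print(summary["duration_s"])
```

The helper only relies on `len` and iteration, so it should also accept the array returned for a real example, although a vectorized computation would be faster on long clips.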
## Applications
The Classification of Arabic Dialects Audio Dataset can be utilized in various applications, including but not limited to:
- Arabic dialect classification
- Arabic spoken language recognition
- Speech analysis and understanding for Arabic dialects
- Acoustic modeling for Arabic dialects
- Cross-dialect speech processing and synthesis
Feel free to explore and leverage this dataset for your research and development tasks related to Arabic spoken language analysis and recognition.
## License
The dataset is made available under the terms of the [Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/) license.
## Citation
If you use this dataset in your research or any other work, please consider citing it as:
```
@dataset{classification_arabic_dialects,
author = {Falah.G.Salieh},
title = {Classification of Arabic Dialects Audio Dataset},
year = {2023},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/Falah/classification_arabic_dialects},
}
```
For more information or inquiries about the dataset, please contact the dataset author(s) mentioned in the citation.
Provider:
Falah



