musamagwaza23/NCHLT_Siswati_Speech_Corpus
Source: Hugging Face · updated 2026-04-16 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/musamagwaza23/NCHLT_Siswati_Speech_Corpus
Description:
---
license: cc-by-3.0
tags:
- ASR
- NCHLT
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: speaker_id
    dtype: string
  - name: age
    dtype: int64
  - name: gender
    dtype: string
  - name: location
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: duration_seconds
    dtype: float64
  - name: pdp_score
    dtype: float64
  - name: transcript
    dtype: string
  splits:
  - name: train
    num_bytes: 19343920977
    num_examples: 187170
  download_size: 28420192388
  dataset_size: 19343920977
---
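The `dataset_info` block above declares a single `train` split of 187,170 rows. Each row has `speaker_id`, `age`, `gender`, `location`, `audio` (16 kHz), `duration_seconds`, `pdp_score` and `transcript`.

The following is a minimal sketch, not an official loader. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable. Streaming avoids fetching the roughly 28 GB download up front. The helpers `stream_metadata` and `summarize` are hypothetical. The audio column is dropped before iterating because how `audio` is decoded differs between `datasets` versions, and only the metadata columns are needed here.

```python
from collections import Counter
from itertools import islice


def stream_metadata(limit=1000):
    """Hypothetical helper: stream up to `limit` rows without the audio column.

    Assumes `pip install datasets` and network access to the Hugging Face Hub.
    Pass limit=None to walk the whole train split.
    """
    from datasets import load_dataset  # imported lazily so summarize() works without it

    ds = load_dataset(
        "musamagwaza23/NCHLT_Siswati_Speech_Corpus",
        split="train",
        streaming=True,  # iterate shard by shard instead of downloading everything
    )
    # Dropping "audio" skips audio decoding; only metadata columns remain.
    return islice(ds.remove_columns(["audio"]), limit)


def summarize(rows):
    """Hypothetical helper: aggregate metadata columns declared in dataset_info.

    `rows` is any iterable of dicts with at least the keys
    speaker_id, gender and duration_seconds.
    """
    speakers = set()
    genders = Counter()
    total_seconds = 0.0
    n = 0
    for row in rows:
        n += 1
        speakers.add(row["speaker_id"])
        genders[row["gender"]] += 1
        total_seconds += row["duration_seconds"]
    return {
        "utterances": n,
        "speakers": len(speakers),
        "hours": total_seconds / 3600.0,
        "gender_counts": dict(genders),
    }
```

For example, `summarize(stream_metadata(1000))` reports utterance, speaker and hour counts for the first 1,000 streamed rows only. `summarize(stream_metadata(None))` covers the full split but reads every shard.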
## NCHLT Siswati Speech Corpus
A subset of the NCHLT Speech Corpus containing Siswati (siSwati) audio data, uploaded for use in ASR research and model fine-tuning.
The original data is sourced from the South African Centre for Digital Language Resources (SADiLaR) via sadilar.org. Use of this data is subject to the license conditions on the original product page.
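The card declares only a `train` split, so fine-tuning an ASR model requires carving out a separate evaluation set. Because every row carries a `speaker_id`, one reasonable option is to hold out whole speakers so that no voice appears on both sides. This is an assumption, not something the card prescribes.

The sketch below assigns speakers deterministically by hashing their ID. The function names and the 5% default are hypothetical.

```python
import hashlib


def is_heldout_speaker(speaker_id, fraction=0.05):
    """Hypothetical helper: place roughly `fraction` of speakers in the held-out set.

    Hashing the ID (rather than random sampling) gives the same assignment on
    every run and needs no global speaker list, so it also works on a stream.
    """
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # roughly uniform integer in [0, 2**64)
    return value < fraction * 2**64  # exact int/float comparison in Python


def split_by_speaker(rows, fraction=0.05):
    """Hypothetical helper: return (train_rows, heldout_rows) with no shared speaker."""
    train, heldout = [], []
    for row in rows:
        target = heldout if is_heldout_speaker(row["speaker_id"], fraction) else train
        target.append(row)
    return train, heldout
```

`fraction` controls the share of speakers, not of utterances, so the held-out utterance share will vary with how many recordings each speaker contributed. With `datasets`, the same predicate can drive `ds.filter(lambda ex: is_heldout_speaker(ex["speaker_id"]))` and its negation.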
## About
Uploaded by Musa, an Electronics Honours student, as part of ASR research on South African languages.
---
## Attribution and Credits
Davel, M., Barnard, E., Badenhorst, J., van Heerden, C., de Waal, A.
NCHLT Siswati Speech Corpus.
CSIR / North-West University, 2014.
**Reference paper:**
> N.J. de Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. de Wet, E. Barnard and A. de Waal, "A smartphone-based ASR data collection tool for under-resourced languages", *Speech Communication*, Volume 56, January 2014, pp. 119–131.
---
## License
This dataset is distributed under the **Creative Commons Attribution 3.0 Unported (CC BY 3.0)** license.
You are free to use, share, and adapt this dataset for any purpose, including commercial use, as long as you give appropriate credit to the original creators listed above.
Full license text: [https://creativecommons.org/licenses/by/3.0/](https://creativecommons.org/licenses/by/3.0/)
---
Provided by:
musamagwaza23



