ANV-SOT-Sample-1: Sesotho Sample Dataset - Next Voices-ZA (South Africa)
Indexed in NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14336303
Description:
## Sesotho Sample Dataset - Next Voices-ZA (South Africa) - Multilingual Speech Dataset

This dataset includes **scripted and unscripted speech** across domains such as agriculture, health, finance, sports, transport, culture, society and general topics. It is designed primarily for automatic speech recognition (ASR).

## Folder structure
The dataset is organised hierarchically as follows:
- `ANV-ZA-SOT-1h/`
  - `sot/`: folder for Sesotho
    - `recorder_uuid/`: contains all audio files
      - `recording-1731053452.wav`
      - ...
    - `transcripts.csv`: transcripts of all audio recordings
    - `meta.csv`: additional metadata
  - `README.md`: description of the dataset
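A minimal Python sketch of walking this layout, assuming the archive has been extracted locally and that each recorder has its own folder under `sot/` named by its `recorder_uuid`. The function name `list_recordings` is hypothetical; it uses only the standard library `pathlib`.

```python
from pathlib import Path


def list_recordings(dataset_root: str, language_dir: str = "sot") -> dict[str, list[Path]]:
    """Group WAV files by the recorder folder they sit in.

    Assumes the layout <dataset_root>/<language_dir>/<recorder_uuid>/*.wav
    (an assumption about how the placeholder `recorder_uuid/` expands).
    """
    lang_path = Path(dataset_root) / language_dir
    recordings: dict[str, list[Path]] = {}
    # Only subdirectories are scanned, so transcripts.csv and meta.csv are skipped.
    for recorder_dir in sorted(p for p in lang_path.iterdir() if p.is_dir()):
        wavs = sorted(recorder_dir.glob("*.wav"))
        if wavs:  # ignore recorder folders that contain no audio
            recordings[recorder_dir.name] = wavs
    return recordings
```

For example, `list_recordings("ANV-ZA-SOT-1h")` would return a mapping from each recorder folder name to its sorted WAV paths.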
## Data Details
### Audio
- Format: **16-bit PCM WAV**
- Sample rate: **48 kHz**
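The stated audio format can be checked with Python's standard `wave` module, which reads PCM WAV headers. This sketch (the helper names `wav_info` and `matches_spec` are hypothetical) also derives each file's length as frame count divided by sample rate, which could be compared against the `duration` column of the transcripts file. The channel count is not specified above, so it is reported but not checked.

```python
import wave


def wav_info(path: str) -> dict:
    """Read basic header fields from a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "bits_per_sample": wf.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate": rate,
            "duration_s": frames / rate,  # length in seconds
        }


def matches_spec(info: dict) -> bool:
    """True if the header matches the stated format: 16-bit PCM at 48 kHz."""
    return info["bits_per_sample"] == 16 and info["sample_rate"] == 48_000
```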
### Transcriptions
- Provided in `transcript.csv` with fields:
  - `file_name`: Name of the audio file.
  - `transcript`: Text transcription of the audio.
  - `duration`: Duration of the recording in seconds.
  - `type`: Scripted or unscripted.
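A minimal sketch for loading the transcript table with Python's `csv` module, assuming a UTF-8, comma-separated file with a header row that uses the field names listed above. The folder tree names the file `transcripts.csv` while this section says `transcript.csv`, so the path is passed in rather than hard-coded. The exact spelling of the `type` values is not documented, so they are lower-cased before totalling; `load_transcripts` and `hours_by_type` are hypothetical helpers.

```python
import csv
from collections import defaultdict


def load_transcripts(csv_path: str) -> list[dict]:
    """Read transcript rows, converting `duration` to float seconds."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # assumes a header row with the documented fields
    for row in rows:
        row["duration"] = float(row["duration"])
    return rows


def hours_by_type(rows: list[dict]) -> dict[str, float]:
    """Total audio hours per `type` value (e.g. scripted vs unscripted)."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        # Normalise labels such as "Scripted" and "scripted" to one key.
        totals[row["type"].strip().lower()] += row["duration"] / 3600
    return dict(totals)
```

`hours_by_type(load_transcripts(path))` gives the total hours of scripted and unscripted speech, one way to check the balance between the two before training an ASR model.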
### Metadata
- Provided in `meta.csv` with fields:
  - `recorder_uuid`: Unique speaker identifier.
  - `age_range`
  - `gender`
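A similar sketch for `meta.csv`, counting distinct speakers per `gender` and per `age_range`. It is not stated whether the file has one row per speaker or one per recording, so this assumes either is possible and de-duplicates on `recorder_uuid`; the category labels are whatever strings the file contains. `speaker_summary` is a hypothetical helper.

```python
import csv
from collections import Counter


def speaker_summary(meta_path: str) -> tuple[Counter, Counter]:
    """Count distinct speakers per gender and per age range in meta.csv."""
    seen: set[str] = set()
    genders: Counter = Counter()
    ages: Counter = Counter()
    with open(meta_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            uuid = row["recorder_uuid"]
            if uuid in seen:  # count each speaker once, even if listed on several rows
                continue
            seen.add(uuid)
            genders[row["gender"]] += 1
            ages[row["age_range"]] += 1
    return genders, ages
```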
## Contact Person
Please contact vukosi.marivate@cs.up.ac.za if you have any questions.

## Citation
TBA
## Funding
Funding for this project was generously made possible through a grant from the Bill & Melinda Gates Foundation and a gift from Meta.
Created:
2025-02-26



