
ANV-SOT-Sample-1: Sesotho Sample Dataset - Next Voices-ZA (South Africa)

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14336303
Resource description:
## Sesotho Sample Dataset - Next Voices-ZA (South Africa) - Multilingual Speech Dataset

This dataset includes **scripted and unscripted speech** across various domains such as agriculture, health, finance, sports, transport, culture, society and general topics. It is primarily designed for automatic speech recognition (ASR).

## Folder Structure

The dataset is organised hierarchically as follows:

    ANV-ZA-SOT-1h/
    ├── sot/                  # Folder for Sesotho
    │   ├── recorder_uuid/    # Contains all audio files
    │   │   ├── recording-1731053452.wav
    │   │   ├── ...
    │   ├── transcripts.csv   # Contains transcripts of all audio recordings
    │   ├── meta.csv          # Contains additional metadata
    ├── README.md             # Description of the dataset

## Data Details

### Audio

- Format: **16-bit PCM WAV**
- Sample rate: **48kHz**

### Transcriptions

- Provided in `transcript.csv` with fields:
  - `file_name`: Name of the audio file.
  - `transcript`: Text transcription of the audio.
  - `duration`: Duration of the recording in seconds.
  - `type`: Scripted or unscripted.

### Metadata

- Provided in `meta.csv` with fields:
  - `recorder_uuid`: Unique speaker identifier.
  - `age_range`
  - `gender`

## Contact Person

Please contact vukosi.marivate@cs.up.ac.za if you have any questions.

## Citation

TBA

## Funding

Funding for this project was generously made possible through a grant from the Bill & Melinda Gates Foundation and a gift from Meta.
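## Example: Reading the Files

The fields and audio format listed under Data Details can be checked with a few lines of standard-library Python. This is a minimal sketch, not an official loader. It assumes the transcript file is a comma-separated, UTF-8 CSV with a header row, and that the archive was extracted to `ANV-ZA-SOT-1h/` in the working directory. The helper names `total_duration_by_type` and `matches_documented_format` are hypothetical. Because the description spells the transcript file both as `transcripts.csv` and `transcript.csv`, the sketch tries both names.

```python
import csv
import wave
from collections import defaultdict
from pathlib import Path


def total_duration_by_type(transcripts_csv):
    """Sum the `duration` column (seconds) for each `type` value."""
    totals = defaultdict(float)
    with open(transcripts_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Lower-case the label; the exact casing in the file is an assumption.
            label = row["type"].strip().lower()
            totals[label] += float(row["duration"])
    return dict(totals)


def matches_documented_format(wav_path):
    """Return True if the file is 16-bit PCM sampled at 48 kHz."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getsampwidth() == 2 and wav.getframerate() == 48000


if __name__ == "__main__":
    root = Path("ANV-ZA-SOT-1h/sot")  # assumed extraction location

    # Try both spellings of the transcript file name.
    candidates = [root / "transcripts.csv", root / "transcript.csv"]
    transcripts = next((p for p in candidates if p.exists()), None)
    if transcripts is not None:
        print(total_duration_by_type(transcripts))

    # Spot-check a few recordings inside the recorder_uuid folders.
    for wav_path in sorted(root.glob("*/*.wav"))[:5]:
        print(wav_path.name, matches_documented_format(wav_path))
```

Summing `duration` per `type` gives a quick view of how the recordings split between scripted and unscripted speech. The description does not state how rows in `meta.csv` map to rows in the transcript file, so the sketch does not attempt to join them.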
Created:
2025-02-26