ANV-SOT-Sample-1: Sesotho Sample Dataset - Next Voices-ZA (South Africa)

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/14336303

Resource description:

## Sesotho Sample Dataset - Next Voices-ZA (South Africa) - Multilingual Speech Dataset

This dataset includes **scripted and unscripted speech** across various domains such as agriculture, health, finance, sports, transport, culture, society and general topics. It is primarily designed for automatic speech recognition (ASR).

## Folder Structure

The dataset is organised hierarchically as follows:

    ANV-ZA-SOT-1h/
    ├── sot/                              # Folder for Sesotho
    │   ├── recorder_uuid/                # Contains all audio files
    │   │   ├── recording-1731053452.wav
    │   │   ├── ...
    │   ├── transcripts.csv               # Contains transcripts of all audio recordings
    │   ├── meta.csv                      # Contains additional metadata
    ├── README.md                         # Description of the dataset

## Data Details

### Audio

- Format: **16-bit PCM WAV**
- Sample rate: **48kHz**

### Transcriptions

- Provided in `transcripts.csv` with fields:
  - `file_name`: Name of the audio file.
  - `transcript`: Text transcription of the audio.
  - `duration`: Duration of the recording in seconds.
  - `type`: Scripted or unscripted.

### Metadata

- Provided in `meta.csv` with fields:
  - `recorder_uuid`: Unique speaker identifier.
  - `age_range`
  - `gender`

## Contact Person

Please contact vukosi.marivate@cs.up.ac.za if you have any questions.

## Citation

TBA

## Funding

Funding for this project was generously made possible through a grant from the Bill & Melinda Gates Foundation and a gift from Meta.
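
The audio format and transcript fields described under Data Details can be sanity-checked with a few lines of standard-library Python. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the transcripts file is comma-delimited with a header row using exactly the field names listed above, and that `type` values are spelled "Scripted" or "Unscripted"; the function names and the sample rows are hypothetical.

```python
import csv
import io
import wave
from collections import defaultdict

EXPECTED_RATE_HZ = 48_000    # stated sample rate: 48kHz
EXPECTED_WIDTH_BYTES = 2     # stated format: 16-bit PCM


def matches_stated_format(wav_file):
    """Return True if a WAV file has the stated sample rate and bit depth.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE_HZ
                and wav.getsampwidth() == EXPECTED_WIDTH_BYTES)


def duration_by_type(csv_file):
    """Sum the `duration` column (seconds) for each `type` value.

    Assumes a header row with `duration` and `type` columns.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["type"].strip().lower()] += float(row["duration"])
    return dict(totals)


# Hypothetical rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "file_name,transcript,duration,type\n"
    "recording-1.wav,example one,4.5,Scripted\n"
    "recording-2.wav,example two,7.25,Unscripted\n"
    "recording-3.wav,example three,3.0,Scripted\n"
)
totals = duration_by_type(sample)
print(totals)
```

Against the real data, the same functions could be called on a recording inside a `recorder_uuid/` folder and on the transcripts file under `ANV-ZA-SOT-1h/sot/`, opened with `open(path, newline="", encoding="utf-8")`. The UTF-8 encoding is also an assumption, since the description does not state one.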

Created:

2025-02-26