LocalDoc/fleurs-azerbaijani-asr
Hugging Face, updated 2026-03-24, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/LocalDoc/fleurs-azerbaijani-asr
Resource summary:
---
language:
- az
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
tags:
- azerbaijani
- asr
- speech
- fleurs
- benchmark
pretty_name: FLEURS Azerbaijani ASR Benchmark
---
# FLEURS Azerbaijani ASR Benchmark
Azerbaijani (az_az) subset of [FLEURS](https://huggingface.co/datasets/google/fleurs),
reformatted for ASR benchmarking and fine-tuning.
## Source
Based on the **FLEURS** dataset by Google ([Conneau et al., 2022](https://arxiv.org/abs/2205.12446)).
Licensed under **CC-BY-4.0**.
## Structure
| Split | Samples | Duration |
|-------|---------|----------|
| train | 2656 | 9.28h |
| dev | 400 | 1.35h |
| test | 921 | 3.23h |
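
The Duration column can be checked against the per-clip `duration_seconds` field. As a worked example from the table itself, 9.28 h × 3600 s/h ÷ 2656 clips ≈ 12.6 s per training clip, and the dev (about 12.2 s) and test (about 12.6 s) splits are similar. The helper below is a hypothetical sketch (the name `summarize_split` is not part of the dataset or any library) that recomputes these figures from any iterable of clip durations in seconds.

```python
from typing import Iterable


def summarize_split(durations_s: Iterable[float]) -> tuple[int, float, float]:
    """Hypothetical helper: return (num_clips, total_hours, mean_clip_seconds)."""
    durations = [float(d) for d in durations_s]
    if not durations:
        # An empty split has no clips and no audio.
        return 0, 0.0, 0.0
    total_s = sum(durations)
    # Convert total seconds to hours and compute the mean clip length.
    return len(durations), total_s / 3600.0, total_s / len(durations)
```

For example, call `summarize_split(ds["train"]["duration_seconds"])` after the `load_dataset` call shown under Usage. If the table was computed from this field (an assumption), the result should be close to 2656 clips and 9.28 h.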
## Fields
- `audio` — 16kHz mono WAV
- `sentence` — transcription (original casing and punctuation)
- `sentence_normalized` — normalized (lowercase, no punctuation)
- `gender` — male / female
- `duration_seconds` — audio duration
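
The card does not say exactly how `sentence_normalized` was produced. The sketch below is one plausible normalizer, an assumption rather than the dataset's actual pipeline, useful for bringing model outputs into the same lowercase, punctuation-free form before comparing them with `sentence_normalized`. It handles Azerbaijani's Turkic casing, which Python's locale-independent `str.lower()` gets wrong (`"I".lower()` gives `"i"` rather than dotless `"ı"`, and `"İ".lower()` yields `"i"` plus a combining dot). The function name `normalize_az` is hypothetical.

```python
import re
import unicodedata


def normalize_az(text: str) -> str:
    """Hypothetical normalizer: lowercase with Turkic casing, strip punctuation."""
    # Compose decomposed sequences first, so "I" + combining dot becomes "İ".
    text = unicodedata.normalize("NFC", text)
    # Turkic casing: dotted capital İ -> i, plain capital I -> dotless ı.
    text = text.replace("İ", "i").replace("I", "ı").lower()
    # Replace every Unicode punctuation character (categories P*) with a space,
    # so words joined only by punctuation stay separate.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

Replacing punctuation with a space (instead of deleting it) is a design choice: it keeps hyphenated or comma-joined words apart, but it also splits apostrophe suffixes, so compare against a few `sentence_normalized` rows before relying on it.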
## Usage
```python
from datasets import load_dataset
ds = load_dataset("LocalDoc/fleurs-azerbaijani-asr")
# Benchmark
test = ds["test"]
# Training
train = ds["train"]
```
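
The card marks the test split for benchmarking but does not name a metric. Word error rate (WER) is the usual ASR metric; below is a dependency-free sketch of corpus-level WER (total word edits divided by total reference words), assuming references come from `sentence_normalized` and hypotheses are normalized the same way. The function names are hypothetical; libraries such as `jiwer` provide more complete implementations.

```python
from collections.abc import Sequence


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    # Keep only the previous DP row; prev[j] = distance(ref[:i-1], hyp[:j]).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref word
                curr[j - 1] + 1,         # insertion of hyp word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level WER: summed word edits divided by summed reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

For example, `corpus_wer(list(test["sentence_normalized"]), predictions)`, where `predictions` is a hypothetical list of normalized transcripts from the model under test, in the same order as the test split. Corpus-level WER weights long utterances more heavily than averaging per-utterance WER would.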
## Citation
```bibtex
@article{fleurs2022arxiv,
title={FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech},
author={Conneau, Alexis and Ma, Min and Khanuja, Simran and Zhang, Yu and Axelrod, Vera and Dalmia, Siddharth and Riesa, Jason and Rivera, Clara and Bapna, Ankur},
journal={arXiv preprint arXiv:2205.12446},
year={2022},
}
```
Provided by:
LocalDoc



