ZLSCompLing/LOD_Claude
Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/ZLSCompLing/LOD_Claude
Resource description:
---
license: cc0-1.0
task_categories:
- automatic-speech-recognition
language:
- lb
tags:
- audio
- speech
- luxembourgish
- synthetic
size_categories:
- 10K<n<100K
---
# LOD_Claude Dataset
## Dataset Description
LOD_Claude is a Luxembourgish speech dataset of audio recordings paired with their transcriptions. The recordings feature a synthetic voice named Claude reading example sentences from the LOD (Lëtzebuerger Online Dictionnaire), which is available at lod.lu.
## Dataset Statistics
- **Total samples**: 39,034
- **Training samples**: 37,084
- **Validation samples**: 1,950
- **Language**: Luxembourgish (Lëtzebuergesch)
- **Audio format**: WAV files
- **Sample rate**: 24,000 Hz
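As a quick sanity check on the figures above, the short sketch below confirms that the two splits add up to the total and works out their shares (about 95% train, 5% validation). It also shows how a clip's length follows from the 24,000 Hz sample rate. The helper names are illustrative only and are not part of the dataset or of any library.

```python
# Split sizes and sample rate as listed in the Dataset Statistics section
TRAIN = 37_084
VAL = 1_950
TOTAL = 39_034
SAMPLE_RATE = 24_000  # Hz

def split_shares(train: int, val: int) -> tuple[float, float]:
    """Return the fraction of all samples that falls in each split."""
    total = train + val
    return train / total, val / total

def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a clip in seconds, given its number of audio samples."""
    return num_samples / sample_rate

assert TRAIN + VAL == TOTAL  # the two splits account for every sample
train_share, val_share = split_shares(TRAIN, VAL)
print(f"train: {train_share:.1%}, val: {val_share:.1%}")  # train: 95.0%, val: 5.0%
print(duration_seconds(72_000))  # 72,000 samples at 24 kHz -> 3.0 seconds
```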
## Dataset Structure
Each sample contains:
- `audio`: Audio file in WAV format
- `text`: Transcription text
- `split`: Indicates whether the sample belongs to the "train" or the "val" set
- `filename`: Original filename identifier
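Because every record carries its own `split` field, records can also be partitioned by that value directly. The sketch below does this on plain Python dictionaries with the fields listed above (the `audio` field is left out for brevity). The filenames and texts are hypothetical placeholders, not real entries from the dataset.

```python
from collections import defaultdict

def group_by_split(records: list[dict]) -> dict[str, list[str]]:
    """Map each value of the `split` field ("train" or "val") to the filenames in it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        groups[record["split"]].append(record["filename"])
    return dict(groups)

# Hypothetical records: the filenames and texts are made up for illustration
records = [
    {"text": "placeholder A", "split": "train", "filename": "sample_0001"},
    {"text": "placeholder B", "split": "val", "filename": "sample_0002"},
    {"text": "placeholder C", "split": "train", "filename": "sample_0003"},
]
print(group_by_split(records))
# {'train': ['sample_0001', 'sample_0003'], 'val': ['sample_0002']}
```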
## Example
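```python
from datasets import load_dataset

# load_dataset returns a DatasetDict keyed by split name
dataset = load_dataset("ZLSCompLing/LOD_Claude")
print(dataset)  # shows the available splits and their sizes

# Access a sample (index into a split, e.g. "train", not the DatasetDict itself)
sample = dataset["train"][0]
print(f"Text: {sample['text']}")
# Audio can be accessed via sample['audio']
```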
## License
This dataset is released under the CC0 1.0 license, a public domain dedication: it may be used freely, and no attribution is required.
## Contact
For questions or issues regarding this dataset, please contact the repository maintainers.
Provided by:
ZLSCompLing



