ZLSCompLing/LOD_Claude

Source: Hugging Face | Updated: 2026-04-16 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/ZLSCompLing/LOD_Claude
Resource summary:
---
license: cc0-1.0
task_categories:
- automatic-speech-recognition
language:
- lb
tags:
- audio
- speech
- luxembourgish
- synthetic
size_categories:
- 10K<n<100K
---

# LOD_Claude Dataset

## Dataset Description

LOD_Claude is a Luxembourgish speech dataset of audio recordings paired with their transcriptions. The audio is a synthetic voice named Claude reading the example sentences from the LOD (Lëtzebuerger Online Dictionnaire), available at lod.lu.

## Dataset Statistics

- **Total samples**: 39,034
- **Training samples**: 37,084
- **Validation samples**: 1,950
- **Language**: Luxembourgish (Lëtzebuergesch)
- **Audio format**: WAV
- **Sample rate**: 24,000 Hz

## Dataset Structure

Each sample contains:

- `audio`: the audio recording (WAV)
- `text`: the transcription
- `split`: whether the sample belongs to the "train" or "val" set
- `filename`: the original filename identifier

## Example

```python
from datasets import load_dataset

# load_dataset returns a DatasetDict keyed by split name,
# so select a split before indexing a sample.
dataset = load_dataset("ZLSCompLing/LOD_Claude")
print(dataset)  # lists the available splits and their sizes

# Access a sample (assumes the Hub exposes a "train" split)
sample = dataset["train"][0]
print(f"Text: {sample['text']}")
# The decoded audio is available via sample['audio']
```

## License

This dataset is released under CC0 1.0, a public-domain dedication: it may be used freely, with no attribution required.

## Contact

For questions or issues regarding this dataset, please contact the repository maintainers.
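The statistics and fields listed in the card can be cross-checked with a short, dependency-free sketch. It confirms that the two splits add up to the stated total, reports the validation share (about 5%), computes a clip's duration from its sample count and the 24,000 Hz rate, and tallies rows by the `split` field. The helper names (`split_fraction`, `clip_duration_seconds`, `count_by_split`) are hypothetical, and the shape of a decoded `audio` value (a dict with `array` and `sampling_rate` keys) is an assumption based on older versions of Hugging Face `datasets`; it may differ in the version you use.

```python
from collections import Counter

# Figures copied from the LOD_Claude dataset card.
TRAIN_SAMPLES = 37_084
VAL_SAMPLES = 1_950
TOTAL_SAMPLES = 39_034
SAMPLE_RATE_HZ = 24_000

# The stated splits should add up to the stated total.
assert TRAIN_SAMPLES + VAL_SAMPLES == TOTAL_SAMPLES


def split_fraction(part: int, total: int) -> float:
    """Share of `total` represented by `part` (e.g. the validation split)."""
    return part / total


def clip_duration_seconds(audio: dict) -> float:
    """Duration of one decoded clip: number of samples / samples per second.

    Assumption: `audio` is a dict with "array" and "sampling_rate" keys,
    like the decoded Audio feature in Hugging Face `datasets` before 4.0.
    """
    return len(audio["array"]) / audio["sampling_rate"]


def count_by_split(rows) -> Counter:
    """Count rows per value of the card's `split` field ("train" or "val")."""
    return Counter(row["split"] for row in rows)


if __name__ == "__main__":
    print(f"Validation share: {split_fraction(VAL_SAMPLES, TOTAL_SAMPLES):.1%}")
    # A hypothetical 48,000-sample clip at 24 kHz lasts 2 seconds.
    toy_clip = {"array": [0.0] * 48_000, "sampling_rate": SAMPLE_RATE_HZ}
    print(f"Toy clip duration: {clip_duration_seconds(toy_clip)} s")
```

Keeping the check free of third-party imports means it runs without downloading the dataset; once the data is loaded, the same helpers can be applied to real rows.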
Provided by:
ZLSCompLing