ZLSCompLing/LOD_Claude

Name: ZLSCompLing/LOD_Claude
Creator: ZLSCompLing
Published: 2026-04-16 09:27:00
License: No description available

Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/ZLSCompLing/LOD_Claude


Resource description:

---
license: cc0-1.0
task_categories:
- automatic-speech-recognition
language:
- lb
tags:
- audio
- speech
- luxembourgish
- synthetic
size_categories:
- 10K<n<100K
---

# LOD_Claude Dataset

## Dataset Description

LOD_Claude is a Luxembourgish speech dataset of audio recordings paired with transcriptions. The audio features a synthetic voice named Claude reading example sentences from the LOD (Lëtzebuerger Online Dictionnaire), available at lod.lu.

## Dataset Statistics

- **Total samples**: 39,034
- **Training samples**: 37,084
- **Validation samples**: 1,950
- **Language**: Luxembourgish (Lëtzebuergesch)
- **Audio format**: WAV files
- **Sample rate**: 24,000 Hz

## Dataset Structure

Each sample contains:

- `audio`: Audio file in WAV format
- `text`: Transcription text
- `split`: Indicates whether the sample belongs to the "train" or "val" set
- `filename`: Original filename identifier

## Example

```python
from datasets import load_dataset

# load_dataset without a split returns a DatasetDict keyed by split name,
# so select a split before indexing by position. The split name "train"
# is an assumption; inspect the DatasetDict keys if it differs.
dataset = load_dataset("ZLSCompLing/LOD_Claude", split="train")

# Access a sample
sample = dataset[0]
print(f"Text: {sample['text']}")
# Audio can be accessed via sample['audio']
```

## License

This dataset is released under the CC0 license: fully public domain, free to use, with no attribution required.

## Contact

For questions or issues regarding this dataset, please contact the repository maintainers.
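
The `split` field and the counts listed under Dataset Statistics can be sanity-checked with a few lines of plain Python. The following is a minimal sketch, not part of the dataset's own tooling: `group_by_split`, `duration_seconds`, and the placeholder records are hypothetical names introduced for illustration, and the records only mimic the documented fields (`text`, `split`, `filename`) without the `audio` payload.

```python
from collections import defaultdict

# Counts and sample rate copied from the Dataset Statistics section.
TOTAL, TRAIN, VAL = 39_034, 37_084, 1_950
SAMPLE_RATE_HZ = 24_000


def group_by_split(records):
    """Group sample dicts by their `split` field ("train" or "val")."""
    groups = defaultdict(list)
    for record in records:
        groups[record["split"]].append(record)
    return dict(groups)


def duration_seconds(num_audio_samples, sample_rate=SAMPLE_RATE_HZ):
    """Convert a count of audio samples into a clip length in seconds."""
    return num_audio_samples / sample_rate


# Hypothetical placeholder records; real samples also carry an `audio` field.
records = [
    {"text": "placeholder sentence 1", "split": "train", "filename": "example_1"},
    {"text": "placeholder sentence 2", "split": "val", "filename": "example_2"},
    {"text": "placeholder sentence 3", "split": "train", "filename": "example_3"},
]

groups = group_by_split(records)
val_fraction = VAL / TOTAL

print({name: len(items) for name, items in groups.items()})
print(f"Validation share of all samples: {val_fraction:.1%}")
print(f"48,000 audio samples last {duration_seconds(48_000)} s")
```

The same grouping could presumably be applied to rows loaded with the `datasets` library, since each row is exposed as a dictionary with these keys; that depends on how the repository's files are actually split, which the card does not spell out.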

Provided by:

ZLSCompLing
