KBayoud/ASR_TEDx_Tunisie
Source: Hugging Face | Updated: 2025-12-15 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/KBayoud/ASR_TEDx_Tunisie
Description:
TEDxTN is a three-way speech translation corpus for code-switched Tunisian Arabic-English speech, consisting primarily of Tunisian Arabic dialect. The dataset, curated by Fethi Bougares et al., provides 16 kHz WAV audio segments, their transcriptions, the URLs of the original YouTube videos, and timing information for each segment (start time, end time, and duration). It is divided into train (15,703 samples), validation (731 samples), and test (842 samples) splits, 17,276 samples in total, with a total size of approximately 2.8 GB. It is intended for automatic speech recognition (ASR) and speech translation tasks, with a particular focus on code-switching in Tunisian Arabic.
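As a rough illustration of the split sizes and timing fields described above, the following Python sketch computes each split's share of the corpus (train comes to roughly 91%) and checks that a segment's duration matches its start and end times. The helper names `split_fractions`, `timing_is_consistent`, and `load_tedxtn` are hypothetical and not part of the dataset. The loader assumes the Hugging Face `datasets` package is installed and that the repository loads through `load_dataset` without extra configuration; the dataset's actual column names are not stated in the description, so none are assumed here.

```python
# Split sizes exactly as stated in the dataset description.
SPLIT_SIZES = {"train": 15703, "validation": 731, "test": 842}


def split_fractions(sizes):
    """Return each split's share of the total sample count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def timing_is_consistent(start, end, duration, tol=0.01):
    """Check that duration equals end - start (in seconds) within a tolerance.

    The argument names are illustrative; map them to whatever the
    dataset's timing columns are actually called.
    """
    return end >= start and abs((end - start) - duration) <= tol


def load_tedxtn(split="train"):
    """Hypothetical loader (assumption: the repo is loadable as-is).

    The import is kept local so the helpers above work without
    the `datasets` package or network access.
    """
    from datasets import load_dataset

    return load_dataset("KBayoud/ASR_TEDx_Tunisie", split=split)


if __name__ == "__main__":
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {SPLIT_SIZES[name]} samples ({share:.1%})")
```

Calling `load_tedxtn("validation")` would download that split on first use, so it is best tried on the smaller splits before fetching the full corpus of about 2.8 GB.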
Provider:
KBayoud



