PHBJT/cml-tts-filtered
Hugging Face | Updated 2024-10-30 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/PHBJT/cml-tts-filtered
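Resource overview:
This dataset is a filtered version of CML-TTS containing audio and text data in Dutch, German, French, Italian, Polish, Portuguese, and Spanish, sampled at 24 kHz. It is intended mainly for text-to-speech (TTS) tasks. Samples with incomplete or incorrect transcriptions have been removed, specifically those with a Levenshtein similarity below 0.9. The README also covers usage examples and the dataset's sources.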
The Filtered CML-TTS dataset is a filtered version of CML-TTS, a multilingual Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goiás (UFG). The original CML-TTS dataset consists of audiobooks of public domain books from Project Gutenberg, read by volunteers from the LibriVox project. It covers Dutch, German, French, Italian, Polish, Portuguese, and Spanish, with all audio at a sampling rate of 24 kHz. The filtered version removes problematic samples, specifically those with incomplete or incorrect transcriptions, by dropping rows whose Levenshtein similarity ratio is below 0.9. The dataset is licensed under CC BY 4.0 and has been used together with other datasets to train the Parler-TTS Multilingual model.
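The filtering rule described above can be illustrated with a minimal sketch. The listing does not say which two strings are compared or which exact ratio formula was used, so everything here is an assumption: the ratio is taken as 1 minus the edit distance divided by the longer string's length, and the field names `text` and `check_text` (for example, the original transcript and a second transcription of the same audio) are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed definition: 1 - distance / max length (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def filter_rows(rows, threshold=0.9):
    """Keep rows whose two transcripts agree at or above the threshold.

    `text` and `check_text` are hypothetical column names.
    """
    return [
        row for row in rows
        if similarity_ratio(row["text"], row["check_text"]) >= threshold
    ]


rows = [
    {"text": "hello world", "check_text": "hello world"},  # identical, kept
    {"text": "hello world", "check_text": "hello"},        # truncated, dropped
]
kept = filter_rows(rows)
```

Other libraries define the ratio differently (for instance, by weighting substitutions more heavily), so the same 0.9 threshold may keep or drop slightly different rows depending on the formula.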
Provider:
PHBJT



