ymoslem/Tatoeba-Speech-Irish

Name: ymoslem/Tatoeba-Speech-Irish
Creator: ymoslem
Published: 2024-07-02 05:22:00
License: not specified

Source: Hugging Face. Updated: 2024-07-02. Indexed: 2024-05-25.

Download link:

https://hf-mirror.com/datasets/ymoslem/Tatoeba-Speech-Irish

Resource summary:

This dataset is a synthetic audio corpus generated with Azure's Text-to-Speech service. The bilingual text comes from a subset of the Tatoeba dataset and comprises 1,983 text segments. Each segment is synthesized with two voices, one female (OrlaNeural) and one male (ColmNeural), giving 3,966 utterances with a total duration of approximately 2 hours and 39 minutes. Each record has three features: audio, Irish text, and English text. The main use cases are automatic speech recognition (ASR), text-to-speech (TTS), and machine translation.
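The figures in the summary are mutually consistent if each text segment is spoken once per voice. A quick sanity check using only the numbers quoted above (the variable names are illustrative, and the mean length is rough because the total duration is itself approximate):

```python
# Figures quoted in the dataset summary.
TEXT_SEGMENTS = 1_983
VOICES = ("OrlaNeural", "ColmNeural")
TOTAL_SECONDS = 2 * 3600 + 39 * 60  # "approximately 2 hours and 39 minutes"

# One utterance per text segment per voice.
utterances = TEXT_SEGMENTS * len(VOICES)

# Rough mean utterance length in seconds.
mean_seconds = TOTAL_SECONDS / utterances

print(utterances)              # 3966
print(round(mean_seconds, 1))  # 2.4
```

An average of roughly 2.4 seconds per utterance fits the short, single-sentence style typical of Tatoeba entries.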

Provider:

ymoslem

Original information summary

Dataset overview

Dataset features

audio: audio data type
text_ga: string data type, Irish (Gaelic) text
text_en: string data type, English text

Dataset splits

train: training set
- Size: 306559968.44649196 bytes
- Number of examples: 3966

Dataset size

Download size: 200660391 bytes
Total dataset size: 306559968.44649196 bytes
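The raw byte counts are easier to read in megabytes. The sketch below converts the listed values and derives a mean size per example. The download size is smaller than the dataset size, most likely because the former counts the compressed files on the Hub and the latter the decoded tables, though the listing does not say so explicitly.

```python
# Sizes as listed, in bytes. The fractional byte count is kept as published.
DOWNLOAD_BYTES = 200_660_391
DATASET_BYTES = 306_559_968.44649196
NUM_EXAMPLES = 3_966


def to_megabytes(n_bytes: float) -> float:
    """Convert bytes to decimal megabytes (1 MB = 10**6 bytes)."""
    return n_bytes / 1_000_000


download_mb = to_megabytes(DOWNLOAD_BYTES)
dataset_mb = to_megabytes(DATASET_BYTES)
bytes_per_example = DATASET_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"download: {download_mb:.1f} MB")                     # 200.7 MB
print(f"dataset: {dataset_mb:.1f} MB")                       # 306.6 MB
print(f"mean per example: {bytes_per_example / 1000:.1f} kB")  # 77.3 kB
print(f"download / dataset: {download_ratio:.2f}")           # 0.65
```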

Dataset configuration

config_name: default
data_files:
- split: train
  path: data/train-*
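The configuration maps the train split to every repository file matching data/train-*. A minimal sketch of how this might be used, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. The helper names are hypothetical, and `fnmatch` only approximates the glob matching applied on the Hub.

```python
from fnmatch import fnmatch

REPO_ID = "ymoslem/Tatoeba-Speech-Irish"
TRAIN_PATTERN = "data/train-*"  # pattern from the `default` config


def belongs_to_train(path: str) -> bool:
    """Return True if a repository file path matches the train-split pattern."""
    return fnmatch(path, TRAIN_PATTERN)


def load_train_split():
    """Load the train split from the Hub (needs network access).

    The import is local so that belongs_to_train works even when
    `datasets` is not installed.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")
```

For instance, `belongs_to_train("data/train-00000-of-00001.parquet")` is true for that hypothetical shard name, while `belongs_to_train("README.md")` is false. Calling `load_train_split()` should return a dataset whose length matches the 3,966 examples listed for the split.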

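Dataset description

The dataset consists of synthetic audio created with the Azure text-to-speech service.
The text comes from bilingual text in the Tatoeba dataset.
It contains two sets of audio data: a female voice (OrlaNeural) and a male voice (ColmNeural).

Dataset structure

Features: [audio, text_ga, text_en]
Rows: 3966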
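The three features can be paired in different ways for the use cases the listing names. The sketch below shows one possible mapping, assuming the audio is the spoken form of text_ga (which the use of Irish voices suggests but the listing does not state). The function names and the example row are illustrative and not taken from the dataset.

```python
from typing import Any, Dict, Tuple

Row = Dict[str, Any]


def asr_pair(row: Row) -> Tuple[Any, str]:
    """Speech recognition: audio in, Irish transcript out."""
    return row["audio"], row["text_ga"]


def tts_pair(row: Row) -> Tuple[str, Any]:
    """Speech synthesis: Irish text in, audio out."""
    return row["text_ga"], row["audio"]


def translation_pair(row: Row) -> Tuple[str, str]:
    """Text translation: Irish text in, English text out."""
    return row["text_ga"], row["text_en"]


# Made-up row with the same three keys; the audio value is a placeholder.
example_row: Row = {
    "audio": "<decoded audio placeholder>",
    "text_ga": "Dia duit.",
    "text_en": "Hello.",
}

print(translation_pair(example_row))  # ('Dia duit.', 'Hello.')
```

Keeping the pairing in small functions means the same rows can feed all three tasks without copying the data.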
