ymoslem/Living-Audio-Irish-GA-EN-MTed

Name: ymoslem/Living-Audio-Irish-GA-EN-MTed
Creator: ymoslem
Published: 2024-04-09 14:50:54
License: Not specified

Source: Hugging Face. Updated 2024-04-09; indexed 2024-06-22.

Download link:

https://hf-mirror.com/datasets/ymoslem/Living-Audio-Irish-GA-EN-MTed

Resource summary:

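This is a version of the Living Audio Irish speech corpus with added English machine translations. The Irish sentences were automatically translated into English with the Google Translation API. The dataset has three features (sentence, audio, and translation) and is provided as a single train split. The original dataset is available on Kaggle and GitHub and is part of the Idlak project.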

Provider:

ymoslem

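Summary of original information

Dataset details

Dataset overview

This dataset is an augmented version of the Living Audio Irish speech corpus that adds English machine translations of its Irish sentences. The translations were produced automatically with the Google Translation API.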
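The augmentation described above can be pictured as a per-example function that reads the Irish `sentence` field and adds an English `translation` field, in the style of a function passed to a dataset `map` call. This is a hypothetical sketch: `add_translation`, `translate_all`, and `fake_translate` are illustrative names, and `fake_translate` is a stand-in lookup table, not the Google Translation API. The card does not document the exact pipeline used to build this dataset.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def add_translation(example: Record, translate: Callable[[str], str]) -> Record:
    """Hypothetical per-example step: copy the record and add a "translation" field."""
    out = dict(example)  # copy so the input record is not mutated
    out["translation"] = translate(str(example["sentence"]))
    return out


def translate_all(rows: Iterable[Record], translate: Callable[[str], str]) -> List[Record]:
    """Apply add_translation to every record."""
    return [add_translation(row, translate) for row in rows]


def fake_translate(text: str) -> str:
    """Stand-in translator for illustration only; a real pipeline would call an MT service."""
    lookup = {"Dia duit.": "Hello."}
    return lookup.get(text, f"<untranslated: {text}>")


# Synthetic Irish sentences, not taken from the dataset.
rows = [{"sentence": "Dia duit."}, {"sentence": "Conas atá tú?"}]
result = translate_all(rows, fake_translate)
for r in result:
    print(r["sentence"], "->", r["translation"])
```

Injecting the translator as a parameter keeps the sketch testable offline and makes the MT backend swappable.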

Dataset structure

Features

sentence: string; the Irish source sentence.
audio: audio; sampled at 48,000 Hz.
translation: string; the English machine translation of the sentence.
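Because the audio is sampled at 48,000 Hz, a clip's duration is its number of samples divided by 48,000. The sketch below assumes the audio has been decoded into a dict with `"array"` and `"sampling_rate"` keys, as many versions of the Hugging Face `datasets` library return for an audio column (newer versions may return a decoder object instead). `clip_duration_seconds` and the sample data are hypothetical.

```python
from typing import Any, Dict

EXPECTED_SAMPLING_RATE = 48_000  # from the dataset card: audio is sampled at 48000 Hz


def clip_duration_seconds(audio: Dict[str, Any]) -> float:
    """Return clip length in seconds from a decoded audio dict.

    Assumes the dict holds an "array" of samples and a "sampling_rate" in Hz.
    Raises ValueError if the rate differs from the one stated on the card.
    """
    rate = int(audio["sampling_rate"])
    if rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(f"unexpected sampling rate: {rate} Hz")
    return len(audio["array"]) / rate


# Synthetic example: 1.5 seconds of silence at 48 kHz (not real dataset content).
fake_audio = {"array": [0.0] * 72_000, "sampling_rate": 48_000}
print(clip_duration_seconds(fake_audio))  # 1.5
```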

Splits

train: the training split, with 1,121 examples and a total size of 356,980,798 bytes (about 357 MB).
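The split figures above imply an average of roughly 318 KB per example. The arithmetic below uses only the two numbers from the card; it says nothing about clip durations, since the audio encoding and storage overhead are not stated.

```python
TOTAL_BYTES = 356_980_798  # train split size from the dataset card
NUM_EXAMPLES = 1_121       # train split example count from the dataset card

avg_bytes = TOTAL_BYTES / NUM_EXAMPLES
total_mb = TOTAL_BYTES / 1_000_000  # decimal megabytes
total_mib = TOTAL_BYTES / 2**20     # binary mebibytes

print(f"average per example: {avg_bytes:,.0f} bytes")  # average per example: 318,449 bytes
print(f"total: {total_mb:.2f} MB ({total_mib:.2f} MiB)")  # total: 356.98 MB (340.44 MiB)
```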

Configurations

default: the default configuration; its data files match the path pattern data/train-*.
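The `data/train-*` pattern selects every file in the `data/` folder whose name starts with `train-`. As a rough illustration, the sketch applies the pattern with Python's `fnmatch`; the repository listing and `select_data_files` helper are hypothetical. Note that `fnmatch`'s `*` also matches `/`, unlike a typical filesystem glob, so this only approximates how the Hub resolves the pattern.

```python
from fnmatch import fnmatch
from typing import List

DATA_FILES_PATTERN = "data/train-*"  # pattern from the default configuration


def select_data_files(paths: List[str], pattern: str = DATA_FILES_PATTERN) -> List[str]:
    """Return the paths matching the pattern, sorted for a stable order."""
    return sorted(p for p in paths if fnmatch(p, pattern))


# Hypothetical repository listing; actual shard names may differ.
repo_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",  # hypothetical, included only for contrast
]
print(select_data_files(repo_files))
```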

Loading the dataset

```python
from datasets import load_dataset

# Download and load the train split (1,121 examples)
living_audio_dataset = load_dataset(
    "ymoslem/Living-Audio-Irish-GA-EN-MTed",
    split="train",
    trust_remote_code=True,
)
```
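For Irish-to-English machine translation work, the `sentence` and `translation` fields can be reshaped into source/target pairs. The nested `{"translation": {"ga": ..., "en": ...}}` layout below is a common convention for MT datasets, assumed here rather than required by this dataset; `to_translation_pairs` and the sample rows are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def to_translation_pairs(rows: Iterable[Dict[str, Any]],
                         src: str = "ga", tgt: str = "en") -> List[Dict[str, Dict[str, str]]]:
    """Convert rows with "sentence" and "translation" fields into nested pairs.

    Rows where either side is empty after stripping whitespace are skipped.
    """
    pairs = []
    for row in rows:
        source = str(row["sentence"]).strip()
        target = str(row["translation"]).strip()
        if source and target:
            pairs.append({"translation": {src: source, tgt: target}})
    return pairs


# Synthetic rows standing in for examples loaded with load_dataset(...).
sample_rows = [
    {"sentence": "Dia duit.", "translation": "Hello.", "audio": None},
    {"sentence": "", "translation": "", "audio": None},
]
print(to_translation_pairs(sample_rows))
```

With the real dataset, passing `living_audio_dataset.remove_columns("audio")` instead of the full dataset should avoid decoding audio that this text-only step does not need.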

Citation

@inproceedings{braude19_interspeech,
  author={David A. Braude and Matthew P. Aylett and Caoimhín Laoide-Kemp and Simone Ashby and Kristen M. Scott and Brian Ó Raghallaigh and Anna Braudo and Alex Brouwer and Adriana Stan},
  title={{All Together Now: The Living Audio Dataset}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1521--1525},
  doi={10.21437/Interspeech.2019-2448}
}