
Marianoleiras/voxpopuli_es-ja

Source: Hugging Face. Updated 2024-12-09, indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Marianoleiras/voxpopuli_es-ja
Overview:
This dataset is designed for automatic speech recognition (ASR) and speech translation tasks that convert Spanish speech into Japanese text. It consists of high-quality audio recordings sampled at 16 kHz, each paired with a Spanish transcription (`es`) and its Japanese translation (`ja`). It is divided into train, validation, and test splits with 10,081, 1,456, and 1,366 examples respectively. The download size is 4.85 GB and the total dataset size is 5.66 GB. The dataset was built from the facebook/voxpopuli dataset: the Spanish transcriptions were machine-translated into English and then into Japanese, and the results were filtered for translation quality.
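The split sizes quoted above can be checked after download. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the splits are named `train`, `validation`, and `test` as in the description; the helper names `split_mismatches` and `load_and_check` are hypothetical and not part of the dataset or the library.

```python
from typing import Dict, Mapping, Optional, Tuple

DATASET_ID = "Marianoleiras/voxpopuli_es-ja"

# Split sizes as stated in the dataset description.
EXPECTED_SPLITS: Dict[str, int] = {
    "train": 10_081,
    "validation": 1_456,
    "test": 1_366,
}


def split_mismatches(
    actual: Mapping[str, int],
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, actual)} for each split whose size differs.

    A split missing from `actual` is reported with an actual size of None.
    """
    return {
        name: (expected, actual.get(name))
        for name, expected in EXPECTED_SPLITS.items()
        if actual.get(name) != expected
    }


def load_and_check():
    """Download the dataset and compare its split sizes to the description.

    Needs network access and the third-party `datasets` package, which is
    imported lazily so the helper above stays usable without it.
    """
    from datasets import load_dataset

    ds = load_dataset(DATASET_ID)
    sizes = {name: split.num_rows for name, split in ds.items()}
    return ds, split_mismatches(sizes)
```

Calling `load_and_check()` returns the loaded splits together with a dictionary of mismatches, which is empty when the sizes agree with the description. The text columns are assumed to be named `es` and `ja`, following the description; reading only those columns, for example with `ds["train"].select_columns(["es", "ja"])[0]`, avoids decoding the audio.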
Provider:
Marianoleiras