SEACrowd/cvss

Name: SEACrowd/cvss
Creator: SEACrowd
Published: 2024-06-24 13:28:42
License: CC-BY 4.0

Source: Hugging Face. Updated 2024-06-24. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/SEACrowd/cvss


Description:

CVSS is a massively multilingual-to-English speech-to-speech translation corpus. It covers sentence-level parallel speech-to-speech translation pairs from 21 languages into English. This SEACrowd listing supports the speech-to-speech translation task and covers Indonesian (ind) and English (eng). The dataset can be loaded with either the `datasets` library or the `seacrowd` library. It is licensed under CC-BY 4.0, and academic work that uses it should cite the references given in the citation section.
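As a rough illustration of the `datasets` route mentioned above, the sketch below wraps the load call in a small helper. It rests on several assumptions that the listing does not confirm: that the repository ships a loading script (hence `trust_remote_code=True`), that the installed `datasets` version still runs such scripts, and that the default configuration is the one you want. The names `REPO_ID`, `MIRROR`, `dataset_page_url` and `load_cvss` are hypothetical helpers introduced for this example only. The `seacrowd` route is not shown.

```python
REPO_ID = "SEACrowd/cvss"         # repository id taken from the download link above
MIRROR = "https://hf-mirror.com"  # host used in the download link above


def dataset_page_url(repo_id: str, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a repository on a Hub-style host."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_cvss(**kwargs):
    """Load the dataset with the `datasets` library (requires network access)."""
    # Imported lazily so the URL helper works even without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the repository relies on a loading script, so remote code
    # must be trusted explicitly. Review that script before enabling this.
    return load_dataset(REPO_ID, trust_remote_code=True, **kwargs)
```

Calling `dataset_page_url(REPO_ID)` reproduces the download link listed above, and `load_cvss()` would return whatever splits the repository defines. Any extra keyword arguments, such as a configuration name, are passed through to `load_dataset` unchanged.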

Provided by:

SEACrowd

Summary of original information

Dataset overview

Name

CVSS

Languages

Indonesian (ind)
English (eng)

Task category

Speech-to-speech translation

Supported tasks

Speech-to-speech translation

Dataset versions

Source version: 1.0.0
SEACrowd version: 2024.06.20

Dataset license

CC-BY 4.0

Citation

If you use the CVSS dataset, please cite the following:

@inproceedings{jia2022cvss,
  title={{CVSS} Corpus and Massively Multilingual Speech-to-Speech Translation},
  author={Jia, Ye and Tadmor Ramanovich, Michelle and Wang, Quan and Zen, Heiga},
  booktitle={Proceedings of Language Resources and Evaluation Conference (LREC)},
  pages={6691--6703},
  year={2022}
}

@article{lovenia2024seacrowd,
  title={SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages},
  author={Holy Lovenia and Rahmad Mahendra and Salsabil Maulana Akbar and Lester James V. Miranda and Jennifer Santoso and Elyanah Aco and Akhdilah and others},
  year={2024},
  eprint={2406.10118},
  journal={arXiv preprint arXiv:2406.10118}
}
