cvss
ModelScope community. Updated 2025-12-05; indexed 2025-04-26.
Download link:
https://modelscope.cn/datasets/google/cvss
# CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
*CVSS* is a massively multilingual-to-English speech-to-speech translation corpus, covering sentence-level parallel speech-to-speech translation pairs from 21 languages into English. CVSS is derived from the [Common Voice](https://commonvoice.mozilla.org/) speech corpus and the [CoVoST 2](https://github.com/facebookresearch/covost) speech-to-text translation corpus. The translation speech in CVSS is synthesized with two state-of-the-art TTS models trained on the [LibriTTS](http://www.openslr.org/60/) corpus.
CVSS includes two versions of spoken translation for all 21 x-en language pairs from CoVoST 2, each with its own strengths:
- *CVSS-C*: All the translation speech is in a single canonical speaker's voice. Despite being synthetic, the speech is highly natural and clean, with a consistent speaking style. These properties ease the modeling of the target speech and enable models to produce high-quality translation speech suitable for user-facing applications.
- *CVSS-T*: The translation speech is in voices transferred from the corresponding source speech. Each translation pair has similar voices on the two sides despite being in different languages, making this dataset suitable for building models that preserve speakers' voices when translating speech into different languages.
Together with the source speech originating from Common Voice, they form two multilingual speech-to-speech translation datasets, each with about 1,900 hours of speech.
In addition to translation speech, CVSS also provides normalized translation text matching the pronunciation in the translation speech (e.g. for numbers, currencies, and acronyms), which can be used both for model training and for standardizing evaluation.
Please check out [our paper](https://arxiv.org/abs/2201.03713) for the detailed description of this corpus, as well as the baseline models we trained on both datasets.
# Load the data
The following example loads the translation speech (i.e. target speech) and the normalized translation text (i.e. target text) released in the CVSS corpus. You'll need to load the source speech, and optionally the source text, from [Common Voice v4.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_4_0) separately, and join them with the CVSS data by file name.
```py
from datasets import load_dataset
# Load only ar-en and ja-en language pairs. Omitting the `languages` argument
# would load all the language pairs.
cvss_c = load_dataset('google/cvss', 'cvss_c', languages=['ar', 'ja'])
# Print the structure of the dataset.
print(cvss_c)
```
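The join by file name described above could look roughly like the sketch below. It is not part of the official loader: the field names `path` (Common Voice side) and `file` (CVSS side), the helper names `clip_key` and `join_by_file_name`, and the toy file names are all assumptions for illustration. Check the actual column names and file naming in the datasets you load before relying on it.

```py
import os


def clip_key(path):
    """Reduce a file path to a bare clip name: drop directories and all extensions."""
    return os.path.basename(path).split(".", 1)[0]


def join_by_file_name(source_rows, target_rows,
                      source_path_field="path", target_path_field="file"):
    """Pair source rows with target rows that share the same clip name.

    The field names are assumptions; pass the real column names if they differ.
    Source rows without a matching target are skipped.
    """
    targets = {clip_key(row[target_path_field]): row for row in target_rows}
    pairs = []
    for src in source_rows:
        tgt = targets.get(clip_key(src[source_path_field]))
        if tgt is not None:
            pairs.append({"source": src, "target": tgt})
    return pairs


# Toy rows standing in for Common Voice (source) and CVSS (target) examples.
source_rows = [
    {"path": "clips/common_voice_ar_0001.mp3", "sentence": "source sentence 1"},
    {"path": "clips/common_voice_ar_0002.mp3", "sentence": "source sentence 2"},
]
target_rows = [
    {"file": "train/common_voice_ar_0001.mp3.wav", "text": "hello"},
]

pairs = join_by_file_name(source_rows, target_rows)
print(len(pairs))
```

Building a dictionary over the target rows keeps the join linear in the number of clips, which matters when all 21 language pairs are loaded at once.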
# License
CVSS is released under the very permissive [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) license.
## Citation
Please cite this paper when referencing the CVSS corpus:
```
@inproceedings{jia2022cvss,
title={{CVSS} Corpus and Massively Multilingual Speech-to-Speech Translation},
author={Jia, Ye and Tadmor Ramanovich, Michelle and Wang, Quan and Zen, Heiga},
booktitle={Proceedings of Language Resources and Evaluation Conference (LREC)},
pages={6691--6703},
year={2022}
}
```
Provider: maas
Created: 2025-04-21



