CoVoST2-Instructions
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/mesolitica/CoVoST2-Instructions
Description:
# CoVoST2 Instruction
This dataset is derived from https://huggingface.co/datasets/facebook/covost2 and converted to a speech instruction format. A test split is also provided.
**We strongly recommend not including the test set in the training set, to prevent contamination. The test set is intended to serve as a speech translation benchmark.**
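One way to act on this recommendation is to check for overlap before training. The following is a minimal sketch, assuming each example is a dictionary with a field that uniquely identifies the utterance. The field name `audio_filename` and both function names are hypothetical and chosen only for illustration; the actual schema of this dataset may differ.

```python
def find_contaminated(train_rows, test_rows, key="audio_filename"):
    """Return the sorted key values that appear in both splits.

    `key` is a hypothetical field name used for illustration; replace it
    with whatever uniquely identifies an utterance in your copy of the data.
    """
    test_keys = {row[key] for row in test_rows}
    return sorted({row[key] for row in train_rows if row[key] in test_keys})


def drop_contaminated(train_rows, test_rows, key="audio_filename"):
    """Return the training rows whose key does not occur in the test split."""
    test_keys = {row[key] for row in test_rows}
    return [row for row in train_rows if row[key] not in test_keys]
```

If `find_contaminated` returns an empty list, the two splits share no examples under the chosen key.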
## How to prepare the dataset
```bash
# Download only the zip archives from the dataset repository into the current directory.
huggingface-cli download \
  mesolitica/CoVoST2-Instructions \
  --include "*.zip" \
  --repo-type "dataset" \
  --local-dir './'

# Fetch the unzip helper script and run it.
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
```
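If fetching the helper script is not an option, the extraction step can be approximated with the Python standard library. This is a rough sketch, assuming the downloaded files are ordinary zip archives sitting in the download directory. It is not the contents of `unzip.py`, which may behave differently (for example, by extracting in parallel), and the function name `extract_all_zips` is made up for this example.

```python
import zipfile
from pathlib import Path


def extract_all_zips(src_dir=".", dest_dir="."):
    """Extract every *.zip archive found directly in src_dir into dest_dir.

    Assumption: the downloaded files are ordinary zip archives. This is a
    stand-in for illustration, not the contents of the unzip.py script.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Process archives in a stable order so repeated runs behave the same.
    for archive in sorted(Path(src_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_all_zips()` with no arguments processes the current directory, matching the `--local-dir './'` setting used in the download command.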
## Acknowledgement
Special thanks to https://www.sns.com.my and Nvidia for the 8x H100 node!
Provider:
maas
Created:
2025-10-03



