
CoVoST2-Instructions

ModelScope Community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/mesolitica/CoVoST2-Instructions
Description:
# CoVoST2 Instructions

This dataset is derived from https://huggingface.co/datasets/facebook/covost2 and converted to a speech instruction format. A test split is also provided.

**We strongly recommend not including the test set in your training data, to prevent contamination. The test set is intended to serve as a speech translation benchmark.**

## How to prepare the dataset

```bash
# Download only the zip archives from the dataset repository
huggingface-cli download \
  mesolitica/CoVoST2-Instructions \
  --include "*.zip" \
  --repo-type "dataset" \
  --local-dir './'

# Fetch the extraction helper script and run it
wget https://gist.githubusercontent.com/huseinzol05/2e26de4f3b29d99e993b349864ab6c10/raw/9b2251f3ff958770215d70c8d82d311f82791b78/unzip.py
python3 unzip.py
```

## Acknowledgement

Special thanks to https://www.sns.com.my and Nvidia for the 8x H100 node!
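For readers who prefer to script the preparation steps in Python, the following is a minimal sketch of the same download-then-extract flow. It is not the contents of the `unzip.py` gist, which are not reproduced here. The helper names `download_zips` and `extract_all_zips` are hypothetical, and the choice to extract every archive into the download directory is an assumption; the actual script may lay files out differently.

```python
import zipfile
from pathlib import Path


def download_zips(local_dir: str = "./") -> str:
    """Download only the *.zip files of the dataset repo (needs network access)."""
    # Imported lazily so the extraction helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="mesolitica/CoVoST2-Instructions",
        repo_type="dataset",
        allow_patterns=["*.zip"],
        local_dir=local_dir,
    )


def extract_all_zips(directory: str = "./") -> list[str]:
    """Extract every .zip found under `directory` and return the archive names.

    Assumption: archives are unpacked into `directory` itself; the original
    unzip.py may use a different target layout.
    """
    root = Path(directory)
    processed = []
    for archive in sorted(root.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        processed.append(archive.name)
    return processed


# Typical usage (commented out because the download requires network access):
# download_zips("./")
# extract_all_zips("./")
```

Keeping the download and the extraction in separate functions lets the extraction step be rerun without downloading the archives again.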

Provider:
maas
Created:
2025-10-03