ymoslem/CoVoST2-EN-AR-Text
Hugging Face · Updated 2024-07-18 · Indexed 2024-07-22
Download link:
https://hf-mirror.com/datasets/ymoslem/CoVoST2-EN-AR-Text
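A minimal sketch of loading the sentence pairs in Python. It assumes the third-party `datasets` package is installed and the Hub is reachable. `iter_pairs` is a hypothetical helper name, while `load_dataset(..., split=...)` is the standard `datasets` entry point. If you go through the hf-mirror.com mirror linked above, the usual approach is to set the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before `datasets` is imported. This has not been verified for this particular dataset.

```python
from typing import Iterator, Tuple

# Hub repository ID, taken from the download link above.
DATASET_ID = "ymoslem/CoVoST2-EN-AR-Text"


def iter_pairs(split: str = "train") -> Iterator[Tuple[str, str]]:
    """Yield (English, Arabic) sentence pairs from one split of the dataset.

    Hypothetical helper: needs `pip install datasets` and network access.
    The import is deferred so this file can be imported without `datasets`.
    """
    from datasets import load_dataset

    ds = load_dataset(DATASET_ID, split=split)
    for row in ds:
        # Each row is a dict keyed by the two features, text_en and text_ar.
        yield row["text_en"], row["text_ar"]
```

For example, `en, ar = next(iter_pairs())` fetches the first pair. Keeping the import inside the function lets the module load even where `datasets` is absent.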
Summary:
This is a text-only, filtered version of the English-Arabic speech translation dataset CoVoST2-EN-AR. It has two features, text_en (English text) and text_ar (Arabic text), and a single training split of 269,380 samples, which places it in the 100K to 1M size range. It is licensed under CC0 1.0.
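Hugging Face dataset cards tag each dataset with a size bucket based on its number of rows. The sketch below maps a row count to the Hub's `size_categories` tag strings. `size_category` is a hypothetical helper, and assigning an exact power of ten to the higher bucket is an assumption of this sketch.

```python
# Hub `size_categories` tags, as (exclusive upper bound, tag) pairs in ascending order.
_BUCKETS = [
    (10**3, "n<1K"),
    (10**4, "1K<n<10K"),
    (10**5, "10K<n<100K"),
    (10**6, "100K<n<1M"),
    (10**7, "1M<n<10M"),
    (10**8, "10M<n<100M"),
    (10**9, "100M<n<1B"),
    (10**10, "1B<n<10B"),
    (10**11, "10B<n<100B"),
    (10**12, "100B<n<1T"),
]


def size_category(n_rows: int) -> str:
    """Return the size bucket tag for a dataset with `n_rows` rows."""
    for upper, tag in _BUCKETS:
        if n_rows < upper:
            return tag
    return "n>1T"


print(size_category(269_380))  # 100K<n<1M
```

Applied to the 269,380-row training split, it returns `100K<n<1M`.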
Provided by:
ymoslem
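Summary of the original metadata
Dataset overview
Basic information
- Dataset name: CoVoST2-EN-AR
- Configuration: default
- Dataset size: 43,878,549 bytes
- Download size: 28,782,159 bytes
- Languages:
  - Arabic (ar)
  - English (en)
- Task category: translation
- License: CC0 1.0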
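The two byte counts above are easier to read in mebibytes. This sketch assumes, as is usual for Hub dataset cards, that the dataset size is the size of the prepared data and the download size is the size of the files actually fetched, so their ratio is only a rough indication of compression.

```python
# Sizes as reported in the card above, in bytes.
DATASET_BYTES = 43_878_549   # "Dataset size"
DOWNLOAD_BYTES = 28_782_159  # "Download size"

MIB = 2**20  # bytes per mebibyte


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


dataset_mib = to_mib(DATASET_BYTES)     # about 41.85 MiB
download_mib = to_mib(DOWNLOAD_BYTES)   # about 27.45 MiB
ratio = DOWNLOAD_BYTES / DATASET_BYTES  # about 0.656

print(f"dataset {dataset_mib:.2f} MiB, download {download_mib:.2f} MiB, ratio {ratio:.3f}")
```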
Data structure
- Features:
  - text_en: English text (string)
  - text_ar: Arabic text (string)
- Splits:
  - train: 269,380 samples, 43,878,549 bytes
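For code that consumes the split, the two-feature schema can be written down as a `TypedDict` together with a simple row check. `Example`, `FEATURES`, and `is_valid` are hypothetical names, and the check (exactly these two keys, both non-blank strings) is an illustrative assumption, not the filtering the dataset author applied.

```python
from typing import Any, Mapping, TypedDict


class Example(TypedDict):
    """One row of the train split: an English sentence and its Arabic translation."""

    text_en: str
    text_ar: str


FEATURES = ("text_en", "text_ar")


def is_valid(row: Mapping[str, Any]) -> bool:
    """Return True if `row` has exactly the two features and both are non-blank strings."""
    if set(row) != set(FEATURES):
        return False
    return all(isinstance(row[k], str) and bool(row[k].strip()) for k in FEATURES)


# Illustrative rows, not taken from the dataset.
good: Example = {"text_en": "Good morning.", "text_ar": "صباح الخير."}
print(is_valid(good))                                            # True
print(is_valid({"text_en": "Good morning.", "text_ar": "   "}))  # False
```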
Citation
@misc{wang2020covost,
  title={CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus},
  author={Changhan Wang and Anne Wu and Juan Pino},
  year={2020},
  eprint={2007.10310},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}