speedcell4/ec40-nllb
Source: Hugging Face. Updated: 2024-10-15. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/speedcell4/ec40-nllb
Description:
This dataset contains multilingual text pairs and is likely intended for machine translation or text alignment tasks. It is divided into four splits: training (56,094,848 samples), development (75,627 samples), test (80,360 samples), and zero-shot (1,567,020 samples). Each sample holds two texts (text1 and text2), their language labels (lang1 and lang2), and the sizes of the two texts (size1 and size2).
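The record layout and split sizes described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the split keys (`train`, `dev`, `test`, `zero_shot`), the `PairSample` class, the helper functions, and the integer type of `size1` and `size2` are hypothetical and not confirmed by the listing. Only the six field names and the four sample counts come from the description.

```python
from dataclasses import dataclass

# Sample counts as stated in the description. The dictionary keys are
# assumed names, not necessarily the split identifiers used on the Hub.
SPLIT_SIZES = {
    "train": 56_094_848,
    "dev": 75_627,
    "test": 80_360,
    "zero_shot": 1_567_020,
}

# Field names as stated in the description.
REQUIRED_FIELDS = ("text1", "text2", "lang1", "lang2", "size1", "size2")


@dataclass(frozen=True)
class PairSample:
    """One text pair. Field types are assumptions, not taken from the listing."""
    text1: str
    text2: str
    lang1: str
    lang2: str
    size1: int  # assumed to be an integer; the unit is not specified
    size2: int


def parse_sample(record: dict) -> PairSample:
    """Check that a raw record has all six fields and wrap it in PairSample."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return PairSample(**{name: record[name] for name in REQUIRED_FIELDS})


def split_fractions(sizes: dict) -> dict:
    """Return each split's share of the total sample count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


TOTAL_SAMPLES = sum(SPLIT_SIZES.values())

if __name__ == "__main__":
    # Hypothetical record; the values are invented for illustration only.
    example = {
        "text1": "Hello world",
        "text2": "Hallo Welt",
        "lang1": "en",
        "lang2": "de",
        "size1": 11,
        "size2": 10,
    }
    print(parse_sample(example))
    print(TOTAL_SAMPLES)
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {share:.4%}")
```

The counts make clear that the training split accounts for the vast majority of the data, so streaming or sharded reads are probably more practical than loading it fully into memory.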
Provider:
speedcell4



