swap-uniba/xMMEB-train
Source: Hugging Face | Updated: 2025-03-13 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/swap-uniba/xMMEB-train
Description:
xMMEB-train is a multilingual dataset derived from the MMEB-train dataset by machine translation with the MADLAD 3B model. It includes four language versions: French, German, Italian, and Spanish. Each dataset file contains the first 10,000 instances of the original dataset and is intended for training and evaluating multimodal embedding tasks. A parallel shuffled file is also provided for parallel-corpus training.
Provided by:
swap-uniba



