multi-30K
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/bytedance/1d-tokenizer
Description:
This is a multimodal dataset for instruction-based machine translation in which images are converted into tokens and serve as the task instructions. Each instruction provides a specific task description. The dataset contains 29,000 training samples for multimodal machine translation.



