syntaxsynth/mmevol-zh-hant
收藏Hugging Face2024-12-01 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/syntaxsynth/mmevol-zh-hant
下载链接
链接失效反馈官方服务:
资源简介:
MMEvol - Translated Chinese Traditional数据集是从Tongyi-ConvAI/MMEvol翻译而来的一个子集,使用Llama-3-Taiwan-70B-Instruct模型将英文翻译成繁体中文。该数据集包含文本生成和图像到文本的任务类别,语言为中文,标签包括繁体中文、视觉理解和多语言。数据集的特征包括id、messages(包含content、index、text、type和role)和images。数据集分为训练集和验证集,分别包含21000和1149个样本。图像来源包括coco、Q-Instruct-DB、clevr等多个数据集。需要注意的是,数据集可能存在一些错误,因为它没有经过人工监督,主要用于对齐(SFT)LLMs以从现有的视觉语言模型输出繁体中文。
A subset translated from English to Traditional Chinese using a specific translation model. The dataset contains text and images, primarily for text generation and image-to-text tasks. The image sources are distributed across multiple datasets. Note that the original images contain only English OCR tasks, but the responses are in Traditional Chinese, which may affect the alignment of vision-language. Additionally, the translation process was unsupervised, so there may be some errors in the dataset.
提供机构:
syntaxsynth



