oumi-ai/multimodal-open-r1-8192-filtered-mid-ic
收藏Hugging Face2025-07-09 更新2025-08-09 收录
下载链接:
https://hf-mirror.com/datasets/oumi-ai/multimodal-open-r1-8192-filtered-mid-ic
下载链接
链接失效反馈官方服务:
资源简介:
这是一个用于视觉语言模型训练的数据集,保留了原始数据集的结构,并通过令牌长度和图像质量进行了筛选。数据集包含2085个样本,每个样本都经过特定的预处理,包括使用Qwen/Qwen2.5-7B-Instruct模型进行序列长度为16384的令牌化处理。数据集的特征包括:令牌化的输入序列(input_ids)、序列的注意力掩码(attention_mask)、语言模型标签(labels)、PIL图像对象(images)、原始对话消息(messages)以及处理元数据(metadata)。
This dataset is for vision-language model training, preserving the original structure of the dataset while being filtered by token length and image quality. The dataset contains 2085 samples, each of which has been preprocessed using the Qwen/Qwen2.5-7B-Instruct model with a sequence length of 16384. The features of the dataset include: tokenized input sequences (input_ids), attention masks for the sequences (attention_mask), language modeling labels (labels), PIL Image objects (images), original conversation messages (messages), and processing metadata.
提供机构:
oumi-ai



