zhibinlan/UME-sft-train
Source: Hugging Face. Updated 2025-11-10. Indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/zhibinlan/UME-sft-train
Description:
The UME-sft-train dataset is built from the MMEB-V2 training set. The GLM-4.1V-Thinking model is used to generate chain-of-thought (CoT) rationales for both the query and the target in each pair. To ensure data quality, samples are filtered out if they contain extensive contiguous token repetition, have excessively long reasoning traces, or do not conform to the <think>...</think><answer> response format. The resulting dataset contains 1.46 million cold-start SFT pairs.
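The three filtering criteria described above could be sketched roughly as follows. This is an illustrative sketch only, not the dataset authors' pipeline: the function names, the whitespace tokenization, and the thresholds (`max_repeat`, `max_think_tokens`) are all assumptions, since the listing does not state the actual values or the tokenizer used. It also checks only for runs of a single repeated token, not repeated multi-token phrases.

```python
import re

# Assumed format check: a <think>...</think> block followed by an <answer> tag
# with some non-empty content after it.
FORMAT_RE = re.compile(r"^\s*<think>(.+?)</think>\s*<answer>(.+)$", re.DOTALL)


def max_contiguous_repeat(tokens):
    """Return the length of the longest run of identical adjacent tokens."""
    longest = 0
    run = 0
    prev = None
    for tok in tokens:
        run = run + 1 if tok == prev else 1
        longest = max(longest, run)
        prev = tok
    return longest


def keep_sample(response, max_repeat=8, max_think_tokens=2048):
    """Return True if a generated response passes all three checks.

    Both thresholds are hypothetical placeholders, not values from the source.
    """
    match = FORMAT_RE.match(response)
    if match is None:
        return False  # response does not follow the expected tag format

    # Whitespace splitting stands in for a real tokenizer here.
    think_tokens = match.group(1).split()
    if len(think_tokens) > max_think_tokens:
        return False  # reasoning trace is too long

    if max_contiguous_repeat(response.split()) > max_repeat:
        return False  # degenerate repetition of one token

    return True
```

A filter of this shape would be applied to each generated rationale before the query and target pair is written to the SFT set, for example `kept = [r for r in responses if keep_sample(r)]`.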
Provider:
zhibinlan



