404-not-founds/CoMa_7B_SFT

Name: 404-not-founds/CoMa_7B_SFT
Creator: 404-not-founds
Published: 2026-04-22 13:58:51
License: not specified

Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/404-not-founds/CoMa_7B_SFT
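
As a minimal sketch of fetching the repository programmatically, the snippet below rebuilds the link above from its parts and wraps a download through the `huggingface_hub` client. The helper names `dataset_url` and `download` and the default `local_dir` are illustrative assumptions, not part of the dataset card. The file layout inside the repository is not described here, so the sketch downloads the whole snapshot rather than naming individual files.

```python
REPO_ID = "404-not-founds/CoMa_7B_SFT"   # dataset id as listed on this page
MIRROR = "https://hf-mirror.com"         # mirror host taken from the download link


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL on the given endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def download(local_dir: str = "CoMa_7B_SFT") -> str:
    """Download every file in the dataset repo and return the local path.

    Requires the third-party `huggingface_hub` package and network access.
    The import is kept local so the URL helper works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",   # the repo is a dataset, not a model
        local_dir=local_dir,
        endpoint=MIRROR,       # route requests through the mirror
    )


if __name__ == "__main__":
    print(dataset_url())
```

Calling `download()` would place the files under `CoMa_7B_SFT/` in the working directory. Dropping the `endpoint` argument should fall back to the client's default Hugging Face endpoint.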

Resource overview:

This repository contains the Supervised Fine-Tuning (SFT) data for **CoMa**, as presented in the paper [Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding](https://huggingface.co/papers/2511.08480). **CoMa** (Compression then Matching) is an efficient pre-training paradigm designed to transform Multimodal Large Language Models (MLLMs) into high-performance multimodal embedding models. It introduces a compressed pre-training phase that serves as a warm-up stage for contrastive learning, enabling MLLMs to become competitive embedding models with a small amount of data.

Provider:

404-not-founds
