404-not-founds/CoMa_7B_SFT
Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/404-not-founds/CoMa_7B_SFT
Summary:
This repository contains the Supervised Fine-Tuning (SFT) data for **CoMa**, as presented in the paper [Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding](https://huggingface.co/papers/2511.08480). **CoMa** (Compressing then Matching) is an efficient pre-training paradigm designed to turn Multimodal Large Language Models (MLLMs) into high-performance multimodal embedding models. It introduces a compressed pre-training phase that serves as a warm-up stage for contrastive learning, enabling MLLMs to become competitive embedding models with only a small amount of data.
Provider:
404-not-founds
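
The summary above mentions contrastive learning as the stage that the compressed pre-training warms up. As a rough illustration of what a contrastive objective over embeddings typically looks like, here is a minimal sketch of a generic in-batch InfoNCE loss in plain Python. It is not taken from the CoMa paper or this repository: the function names, the toy vectors, and the temperature of 0.05 are all assumptions chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, targets, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    targets[i] is treated as the positive for queries[i]; every other
    target in the batch acts as a negative. The temperature value is an
    illustrative assumption, not a setting from the paper.
    """
    losses = []
    for i, query in enumerate(queries):
        logits = [cosine(query, t) / temperature for t in targets]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): each query is close to its own target.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [aligned[1], aligned[0]]  # positives deliberately mismatched

loss_aligned = info_nce(queries, aligned)
loss_swapped = info_nce(queries, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each query embedding is most similar to its paired target and grows when the pairs are mismatched, which is the signal a contrastive stage uses to shape the embedding space.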



