
404-not-founds/CoMa_7B_SFT

Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/404-not-founds/CoMa_7B_SFT
Description:

This repository contains the Supervised Fine-Tuning (SFT) data for **CoMa**, as presented in the paper [Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding](https://huggingface.co/papers/2511.08480). **CoMa** (Compression then Matching) is an efficient pre-training paradigm designed to transform Multimodal Large Language Models (MLLMs) into high-performance multimodal embedding models. It introduces a compressed pre-training phase that serves as a warm-up stage for contrastive learning, enabling MLLMs to become competitive embedding models with a small amount of data.
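For readers who want the files locally, the following is a minimal sketch of fetching the repository with the `huggingface_hub` library. It assumes the package is installed and that the mirror in the download link accepts the standard Hub API. The helper names `dataset_page_url` and `download_dataset` and the local directory `./CoMa_7B_SFT` are illustrative choices, not part of the dataset card.

```python
from typing import Optional

REPO_ID = "404-not-founds/CoMa_7B_SFT"


def dataset_page_url(repo_id: str, endpoint: str = "https://hf-mirror.com") -> str:
    """Build the dataset page URL on a Hugging Face-compatible endpoint."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def download_dataset(repo_id: str, local_dir: str, endpoint: Optional[str] = None) -> str:
    """Download all files of a dataset repo and return the local folder path."""
    # Imported lazily so the URL helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # the repo is a dataset, not a model
        local_dir=local_dir,
        endpoint=endpoint,    # None falls back to the library default
    )


if __name__ == "__main__":
    print(dataset_page_url(REPO_ID))
    # Assumption: the mirror endpoint is reachable from your network.
    path = download_dataset(REPO_ID, "./CoMa_7B_SFT", endpoint="https://hf-mirror.com")
    print("Files saved under:", path)
```

Passing `endpoint=None` uses the library's default endpoint instead of the mirror. The file layout inside the repository is not described here, so inspect the downloaded folder before wiring it into a training pipeline.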
Provider:
404-not-founds