Improving image-text alignment with an optimal feature sub-space-aware similarity learning framework

Source: 中国科学数据 (China Scientific Data). Updated 2026-04-17; indexed 2026-04-25.
Download link:
https://www.sciengine.com/AA/doi/10.1007/s11432-024-4845-2
Resource description:
Image-text alignment is a fundamental cross-modal research topic that bridges vision and language. Its key challenge lies in accurately measuring the similarity between these two heterogeneous modalities. Most existing methods measure the similarity between visual and textual features with cosine or Euclidean distance, examining the modality features directly in the whole representation space. However, we find that subsets of local dimensions, which form sub-spaces with a potential semantic representation tendency, carry more important information for semantic measurement. We therefore argue that existing methods fail to focus on the finer alignment of critical sub-spaces composed of partial dimensions, which leads to limited and inaccurate similarity learning. To address this problem, we propose a novel optimal feature sub-space-aware similarity learning framework (OPEN), which takes a step forward by focusing on the sub-spaces composed of local dimensions within modality representations, enabling more subtle semantic alignment and similarity measurement. Specifically, we first construct hierarchical sub-space-aware patterns for learning similarity, i.e., sub-spaces comprising different numbers of local dimensions. The optimality of OPEN then has two new aspects: (1) optimal sub-space-aware patterns, where we reveal which size level of local dimensions in the sub-space pattern achieves the optimal performance gains with maximum probability; (2) optimal combined sub-space-aware patterns, in which we mine the optimal complementarities of different size-level patterns. OPEN is plug-and-play, and we experiment with it extensively on typical cross-modal alignment paradigms and datasets. It offers consistent and significant performance improvements across different settings, verifying its simplicity, generality, and effectiveness.
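
To make the idea of sub-space-aware similarity concrete, the following is a minimal Python sketch and not the OPEN implementation. The description above does not say how sub-spaces are selected or how their scores are aggregated, so the sketch assumes contiguous, non-overlapping chunks of dimensions, cosine similarity inside each chunk, and a plain average across chunks and across size levels. All function names (`cosine`, `subspace_similarity`, `hierarchical_similarity`, `combined_similarity`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def subspace_similarity(img, txt, size):
    """Average cosine similarity over chunks of `size` dimensions.

    Assumption: sub-spaces are contiguous, non-overlapping chunks and their
    scores are averaged. The source does not specify either choice.
    """
    if len(img) != len(txt):
        raise ValueError("image and text features must have the same length")
    if size <= 0 or len(img) % size != 0:
        raise ValueError("size must be positive and divide the feature length")
    scores = [
        cosine(img[i:i + size], txt[i:i + size])
        for i in range(0, len(img), size)
    ]
    return sum(scores) / len(scores)


def hierarchical_similarity(img, txt, sizes):
    """Similarity at several sub-space size levels, returned as {size: score}."""
    return {size: subspace_similarity(img, txt, size) for size in sizes}


def combined_similarity(img, txt, sizes):
    """Assumed combination rule: unweighted mean over the size levels."""
    per_level = hierarchical_similarity(img, txt, sizes)
    return sum(per_level.values()) / len(per_level)


if __name__ == "__main__":
    image_feature = [1.0, 0.0, 1.0, 1.0]   # toy 4-dimensional features
    text_feature = [1.0, 0.0, 1.0, -1.0]
    # size 4 is the usual whole-space cosine; smaller sizes look at sub-spaces
    print(hierarchical_similarity(image_feature, text_feature, [4, 2]))
    print(combined_similarity(image_feature, text_feature, [4, 2]))
```

With `size` equal to the full feature length, `subspace_similarity` reduces to ordinary whole-space cosine similarity, so the smaller sizes can be read as the finer-grained views the description refers to. Which size level or combination works best is an empirical question that this sketch does not answer.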
Created: 2026-03-10