
DCI-CN

Source: ModelScope community · updated 2026-01-02 · indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/360zhinao/DCI-CN
Description:
# FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model

Code: https://github.com/360CVGroup/FG-CLIP

FG-CLIP 2 is a foundation model for fine-grained vision-language understanding in both English and Chinese. Across 29 datasets and 8 diverse tasks, it consistently surpasses recent strong baselines such as SigLIP 2 and MetaCLIP 2, achieving the best reported performance to date in both languages.

**[FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model](https://arxiv.org/abs/2510.10921)**
<br>
Chunyu Xie*, Bin Wang*, Fanjing Kong, Jincheng Li, Dawei Liang, Ji Ao, Dawei Leng†, Yuhui Yin (*Equal Contribution, †Corresponding Author)
<br>
[![arXiv](https://img.shields.io/badge/arXiv-2510.10921-b31b1b.svg)](https://arxiv.org/abs/2510.10921)
[![HF-model](https://img.shields.io/badge/Model-Collection🤗-yellow.svg)](https://huggingface.co/collections/qihoo360/fg-clip-2-68ecbf9c548623bb78bc7913)
[![HF-data](https://img.shields.io/badge/Benchmark-Collection🤗-yellow.svg)](https://huggingface.co/collections/qihoo360/fg-clip-2-68ecbf9c548623bb78bc7913)
[![API+MCP](https://img.shields.io/badge/API/MCP-FG--CLIPv2-green.svg)](https://research.360.cn/sass/index)

**[FG-CLIP: Fine-Grained Visual and Textual Alignment](https://arxiv.org/abs/2505.05071)** ([code branch: v1.0](https://github.com/360CVGroup/FG-CLIP/tree/v1.0))
<br>
Chunyu Xie*, Bin Wang*, Fanjing Kong, Jincheng Li, Dawei Liang, Gengshen Zhang, Dawei Leng†, Yuhui Yin (*Equal Contribution, †Corresponding Author)
<br>
[![arXiv](https://img.shields.io/badge/arXiv-2505.05071-b31b1b.svg)](https://arxiv.org/abs/2505.05071)
[![ICML](https://img.shields.io/badge/ICML-2025-blue.svg)](https://icml.cc/Conferences/2025)
[![HF-model](https://img.shields.io/badge/Model-Collection🤗-yellow.svg)](https://huggingface.co/collections/qihoo360/fg-clip-681da45d4acfb65c240a6d08)
[![HF-data](https://img.shields.io/badge/Data-FineHARD🤗-yellow.svg)](https://huggingface.co/datasets/qihoo360/FineHARD)
[![DeepWiki](https://img.shields.io/badge/DeepWiki-FG--CLIP-blue.svg)](https://deepwiki.com/360CVGroup/FG-CLIP)

## Data Preparation

To run the inference code for FG-CLIP 2, follow the steps below.
### Step 1: Download the model

#### Model Zoo

| Models | ViT | Model Weights | Demo |
|:-----------|:-----------------------:|:---------------------------------------------------------:|:--------------------------------------------------------:|
| FG-CLIP-Base | vit-base-patch16-224 | [🤗Hugging Face](https://huggingface.co/qihoo360/fg-clip-base) | [Retrieval](https://huggingface.co/spaces/qihoo360/FG-CLIP-Retrieval-demo) & [Dense Feature](https://huggingface.co/spaces/qihoo360/FG-CLIP-Densefeature-demo) |
| FG-CLIP-Large | vit-large-patch14-336 | [🤗Hugging Face](https://huggingface.co/qihoo360/fg-clip-large) | |
| FG-CLIP2-Base | vit-base-patch16 | [🤗Hugging Face](https://huggingface.co/qihoo360/fg-clip2-base) | [Retrieval](https://huggingface.co/spaces/qihoo360/FG-CLIP2-Retrieval-demo) & [Dense Feature](https://huggingface.co/spaces/qihoo360/FG-CLIP2-Densefeature-demo) |
| FG-CLIP2-Large | vit-large-patch16 | [🤗Hugging Face](https://huggingface.co/qihoo360/fg-clip2-large) | |
| FG-CLIP2-So400m | vit-so400m-patch16 | [🤗Hugging Face](https://huggingface.co/qihoo360/fg-clip2-so400m) | |

### Step 2: Prepare DCI-CN Dataset

First, download the dataset from [🤗DCI-CN](https://huggingface.co/datasets/qihoo360/DCI-CN). After downloading, unzip all compressed files. You will obtain the following file structure:

```none
DCI-CN
├── txtfile
│   ├── image_caption.txt
├── images
│   ├── sa_1543972.jpg
│   ├── sa_1543973.jpg
│   ├── sa_1543974.jpg
│   ├── ...
│   ├── sa_1554261.jpg
```

## Benchmarks

| Model | Backbone | I2T | T2I |
| ---- | ---- | ---- | ---- |
| R2D2 | ViT-B/16 | 25.9 | 27.3 |
| Chinese-CLIP | ViT-B/16 | 30.1 | 27.9 |
| SigLIP 2 | ViT-B/16 | 5.0 | 4.0 |
| **FG-CLIP 2 (ours)** | ViT-B/16 | **53.9** | **55.7** |
| R2D2 | ViT-L/14 | 35.6 | 34.2 |
| Chinese-CLIP | ViT-L/14 | 31.4 | 32.7 |
| SigLIP 2 | ViT-L/16 | 13.6 | 14.4 |
| **FG-CLIP 2 (ours)** | ViT-L/16 | **60.4** | **62.2** |
| SigLIP 2 | ViT-So/16 | 13.4 | 12.0 |
| MetaCLIP 2 | ViT-H/14 | 53.8 | 52.1 |
| **FG-CLIP 2 (ours)** | ViT-So/16 | **62.7** | **65.1** |

## Citation

If you find DCI-CN useful for your research and applications, please cite using this BibTeX:

```
@article{xie2025fg2,
  title={FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model},
  author={Xie, Chunyu and Wang, Bin and Kong, Fanjing and Li, Jincheng and Liang, Dawei and Ao, Ji and Leng, Dawei and Yin, Yuhui},
  journal={arXiv preprint arXiv:2510.10921},
  year={2025}
}
```

```
@article{xie2025fg,
  title={FG-CLIP: Fine-Grained Visual and Textual Alignment},
  author={Xie, Chunyu and Wang, Bin and Kong, Fanjing and Li, Jincheng and Liang, Dawei and Zhang, Gengshen and Leng, Dawei and Yin, Yuhui},
  journal={arXiv preprint arXiv:2505.05071},
  year={2025}
}
```

## License

This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses. The content of this project itself is licensed under the [Apache license 2.0](./LICENSE).
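
As a quick sanity check on the Step 2 layout described earlier, the sketch below verifies that an unzipped copy contains `txtfile/image_caption.txt` and at least one `images/sa_*.jpg` file. It is not part of the FG-CLIP repository: the function name `check_dci_cn_layout` and the local path `DCI-CN` are assumptions made for illustration, and the script does not try to parse `image_caption.txt`, since its format is not described here.

```python
from pathlib import Path


def check_dci_cn_layout(root):
    """Return (ok, problems, image_count) for an unzipped DCI-CN directory.

    Expected layout, taken from the file tree in Step 2:
        <root>/txtfile/image_caption.txt
        <root>/images/sa_*.jpg
    """
    root = Path(root)
    problems = []

    # The caption file lives in the txtfile/ subdirectory.
    caption_file = root / "txtfile" / "image_caption.txt"
    if not caption_file.is_file():
        problems.append(f"missing caption file: {caption_file}")

    # Images are named sa_<number>.jpg inside images/.
    image_dir = root / "images"
    images = []
    if image_dir.is_dir():
        images = sorted(image_dir.glob("sa_*.jpg"))
        if not images:
            problems.append(f"no sa_*.jpg files found in {image_dir}")
    else:
        problems.append(f"missing image directory: {image_dir}")

    return (not problems, problems, len(images))


if __name__ == "__main__":
    # "DCI-CN" is a hypothetical local path; point it at your unzipped copy.
    ok, problems, n_images = check_dci_cn_layout("DCI-CN")
    print(f"layout ok: {ok}, images found: {n_images}")
    for problem in problems:
        print(" -", problem)
```

Running the script from the directory that contains `DCI-CN` lists each missing piece, which can help catch a partially extracted archive before starting inference.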

Provider: maas
Created: 2025-10-16