FineHARD

Name: FineHARD
Creator: maas
Published: 2026-01-02 16:53:18
License: No description available

ModelScope community. Updated: 2026-01-02. Indexed: 2025-11-03.

Download link:

https://modelscope.cn/datasets/360zhinao/FineHARD

Resource description:

# FG-CLIP: Fine-Grained Visual and Textual Alignment

**[FG-CLIP: Fine-Grained Visual and Textual Alignment](https://arxiv.org/abs/2505.05071)**

Chunyu Xie*, Bin Wang*, Fanjing Kong, Jincheng Li, Dawei Liang, Gengshen Zhang, Dawei Leng†, Yuhui Yin (*Equal Contribution, †Corresponding Author)

[![arXiv](https://img.shields.io/badge/arXiv-2505.05071-b31b1b.svg)](https://arxiv.org/abs/2505.05071)
[![ICML](https://img.shields.io/badge/ICML-2025-blue.svg)](https://icml.cc/Conferences/2025)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/360CVGroup/FG-CLIP)

<img src="https://huggingface.co/qihoo360/fg-clip-large/resolve/main/radar_chart_methods.png" width="500" height="440"/>

## Model Framework

FG-CLIP's training proceeds in two stages. The first stage uses global-level caption-image pairs to achieve initial fine-grained alignment. The second stage supplements these with region-level captions, including detailed region captions and positive/negative region descriptions, to further refine the alignment.

<img src="https://huggingface.co/qihoo360/fg-clip-large/resolve/main/fgclip_strc.png" width=80%/>

## Data Preparation

To run the training code for FG-CLIP, follow the steps below.

### Step 1: Download the model

Download the FG-CLIP model from [🤗Vit-L@336px](https://huggingface.co/qihoo360/fg-clip-large), or download the OpenAI CLIP model from [🤗Vit-L@336px](https://huggingface.co/openai/clip-vit-large-patch14-336).

### Step 2: Prepare the FineHARD (Fine-Grained Visual Grounding + Recaption + Hard Negative) dataset

First, pull the dataset from [🤗FineHARD](https://huggingface.co/datasets/qihoo360/FineHARD). After downloading, unzip all compressed files. You will obtain the following file structure:

```none
FineHARD
├── url2key_jsons
│   ├── url2key_coyo_image_0.json
│   ├── ...
│   ├── url2key_coyo_image_20.json
├── jsonfiles
│   ├── 2024-12-06_18-32-53_results_10_218_126_44_1025.json
│   ├── 2024-12-06_18-33-17_results_llama70b-shcdt-h100-4gpus-no-2.json
│   ├── ...
├── coyo_image_0
│   ├── 00000.parquet
│   ├── 00001.parquet
│   ├── ...
│   ├── 00099.parquet
├── coyo_image_1
│   ├── 00000.parquet
│   ├── 00001.parquet
│   ├── ...
│   ├── 00099.parquet
├── ...
├── coyo_image_20
│   ├── 00000.parquet
│   ├── 00001.parquet
│   ├── ...
│   ├── 00050.parquet
├── ...
```

Next, install the `img2dataset` package:

```bash
pip install img2dataset
```

In the script `data/get_data.sh`, set the `file_in` parameter to the path where you downloaded the data, and set the directories where the files should be saved (`pre_dir`, `dir_save`). Then run the script:

```bash
bash data/get_data.sh
```

Because of randomness in the download process, the image names assigned to the downloaded URLs do not match the image names used by FG-CLIP, so a conversion step is needed. This step uses the `url2key_jsons/*.json` files included in the FineHARD dataset. You can also use these files to check the download links of all the images we used.

```bash
python -m data.convert_image_name \
    --url2key_json FineHARD/url2key_jsons \
    --down_file_root data/down-grit-12m/ \
    --num_parent_folders 21 \
    --num_subfolders_per_parent 100 \
    --resave_file_root data/grit-12m/

# Remove the original download directory after conversion
rm -r data/down-grit-12m/
```

The final FG-CLIP project directory structure is as follows:

```none
FG-CLIP
├── ...
├── FineHARD
│   ├── jsonfiles
│   │   ├── 2024-12-06_18-32-53_results_10_218_126_44_1025.json
│   │   ├── 2024-12-06_18-33-17_results_llama70b-shcdt-h100-4gpus-no-2.json
│   │   ├── ...
│   ├── ...
├── data
│   ├── grit-12m
│   │   ├── coyo_image_0
│   │   │   ├── 00000
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   ├── 00099
│   │   ├── coyo_image_1
│   │   │   ├── 00000
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   ├── 00099
│   │   ├── ...
│   │   ├── coyo_image_20
│   │   │   ├── 00000
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   ├── 00050
├── ...
```

## Citation

If you find FineHARD useful for your research and applications, please cite using this BibTeX:

```
@article{xie2025fg,
  title={FG-CLIP: Fine-Grained Visual and Textual Alignment},
  author={Xie, Chunyu and Wang, Bin and Kong, Fanjing and Li, Jincheng and Liang, Dawei and Zhang, Gengshen and Leng, Dawei and Yin, Yuhui},
  journal={arXiv preprint arXiv:2505.05071},
  year={2025}
}
```
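As an optional sanity check on the Step 2 conversion, the sketch below compares a local `data/grit-12m` directory against the layout shown in the directory tree: 21 parent folders (`coyo_image_0` to `coyo_image_20`) with up to 100 numbered subfolders each. It is not part of the FG-CLIP repository. The function names are hypothetical, and the assumption that the last parent folder holds exactly 51 subfolders (`00000` to `00050`) is read off the tree, which elides entries with `...`, so adjust it if your download differs.

```python
from pathlib import Path


def expected_subfolders(num_parents=21, per_parent=100, last_parent_count=51):
    """Yield relative paths such as 'coyo_image_0/00000' for the documented layout.

    Assumption: every parent folder holds `per_parent` subfolders except the
    last one, which holds `last_parent_count` (the tree shows it ending at 00050).
    """
    for parent in range(num_parents):
        is_last = parent == num_parents - 1
        count = last_parent_count if is_last else per_parent
        for sub in range(count):
            # Subfolder names are zero-padded to five digits, e.g. 00042.
            yield f"coyo_image_{parent}/{sub:05d}"


def missing_subfolders(root, **layout):
    """Return the expected subfolders that do not exist as directories under `root`."""
    root = Path(root)
    return [rel for rel in expected_subfolders(**layout) if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Path taken from the --resave_file_root argument used in Step 2.
    missing = missing_subfolders("data/grit-12m")
    print(f"{len(missing)} expected subfolders are missing")
    for rel in missing[:10]:
        print("  ", rel)
```

A non-empty result does not necessarily indicate a failed conversion, since some image URLs may no longer be reachable at download time; it only shows which folders differ from the documented tree.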


Provider: maas

Created: 2025-10-16
