
FineHARD

ModelScope Community, updated 2026-01-02, indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/360zhinao/FineHARD
Resource description:
# FG-CLIP: Fine-Grained Visual and Textual Alignment

**[FG-CLIP: Fine-Grained Visual and Textual Alignment](https://arxiv.org/abs/2505.05071)** </br>
Chunyu Xie*, Bin Wang*, Fanjing Kong, Jincheng Li, Dawei Liang, Gengshen Zhang, Dawei Leng†, Yuhui Yin (*Equal Contribution, †Corresponding Author) </br>

[![arXiv](https://img.shields.io/badge/arXiv-2505.05071-b31b1b.svg)](https://arxiv.org/abs/2505.05071) [![ICML](https://img.shields.io/badge/ICML-2025-blue.svg)](https://icml.cc/Conferences/2025) [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/360CVGroup/FG-CLIP)

<p align="center">
<img src="https://huggingface.co/qihoo360/fg-clip-large/resolve/main/radar_chart_methods.png" width="500" height="440"/>
</p>

## Model Framework

FG-CLIP's training proceeds in two stages. The first stage uses global-level caption-image pairs to achieve initial fine-grained alignment. The second stage supplements these with region-level captions, including detailed region captions and positive/negative region descriptions, to further refine the alignment.

<p align="center">
<img src="https://huggingface.co/qihoo360/fg-clip-large/resolve/main/fgclip_strc.png" width=80%/>
</p>

## Data Preparation

To run the FG-CLIP training code, follow the steps below.

### Step 1: Download the model

Download either the FG-CLIP model ([🤗Vit-L@336px](https://huggingface.co/qihoo360/fg-clip-large)) or the OpenAI CLIP model ([🤗Vit-L@336px](https://huggingface.co/openai/clip-vit-large-patch14-336)).

### Step 2: Prepare the FineHARD (Fine-Grained Visual Grounding + Recaption + Hard Negative) dataset

First, pull the dataset from [🤗FineHARD](https://huggingface.co/datasets/qihoo360/FineHARD). After downloading, unzip all compressed files; you will obtain the following file structure:

```none
FineHARD
├── url2key_jsons
│   ├── url2key_coyo_image_0.json
│   ├── ...
│   ├── url2key_coyo_image_20.json
├── jsonfiles
│   ├── 2024-12-06_18-32-53_results_10_218_126_44_1025.json
│   ├── 2024-12-06_18-33-17_results_llama70b-shcdt-h100-4gpus-no-2.json
│   ├── ...
├── coyo_image_0
│   ├── 00000.parquet
│   ├── 00001.parquet
│   ├── ...
│   ├── 00099.parquet
├── coyo_image_1
│   ├── 00000.parquet
│   ├── 00001.parquet
│   ├── ...
│   ├── 00099.parquet
├── ...
├── coyo_image_20
│   ├── 00000.parquet
│   ├── 00001.parquet
│   ├── ...
│   ├── 00050.parquet
├── ...
```

Next, install the `img2dataset` package:

```bash
pip install img2dataset
```

In the script `data/get_data.sh`, set the `file_in` parameter to the path where you downloaded the data, and set the directories where the downloaded files should be saved (`pre_dir`, `dir_save`). Then run:

```bash
bash data/get_data.sh
```

Because downloading is non-deterministic, the file names assigned to the downloaded images do not match the image names we use, so the images must be renamed. This step uses the `url2key_jsons/*.json` files included in the FineHARD dataset. You can also use these files to look up the download links of all the images we used.

```bash
python -m data.convert_image_name \
    --url2key_json FineHARD/url2key_jsons \
    --down_file_root data/down-grit-12m/ \
    --num_parent_folders 21 \
    --num_subfolders_per_parent 100 \
    --resave_file_root data/grit-12m/

rm -r data/down-grit-12m/
```

The final FG-CLIP project directory structure is as follows:

```none
FG-CLIP
├── ...
├── FineHARD
│   ├── jsonfiles
│   │   ├── 2024-12-06_18-32-53_results_10_218_126_44_1025.json
│   │   ├── 2024-12-06_18-33-17_results_llama70b-shcdt-h100-4gpus-no-2.json
│   │   ├── ...
│   ├── ...
├── data
│   ├── grit-12m
│   │   ├── coyo_image_0
│   │   │   ├── 00000
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   ├── 00099
│   │   ├── coyo_image_1
│   │   │   ├── 00000
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   ├── 00099
│   │   ├── ...
│   │   ├── coyo_image_20
│   │   │   ├── 00000
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   ├── 00050
├── ...
```

## Citation

If you find FineHARD useful for your research and applications, please cite it using this BibTeX:

```
@article{xie2025fg,
  title={FG-CLIP: Fine-Grained Visual and Textual Alignment},
  author={Xie, Chunyu and Wang, Bin and Kong, Fanjing and Li, Jincheng and Liang, Dawei and Zhang, Gengshen and Leng, Dawei and Yin, Yuhui},
  journal={arXiv preprint arXiv:2505.05071},
  year={2025}
}
```
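The first training stage described under Model Framework aligns whole images with their global captions. FG-CLIP builds on CLIP, whose standard objective is a symmetric contrastive (InfoNCE) loss; the sketch below assumes stage 1 uses that standard form, which may differ from the paper's exact loss. The helper names (`similarity_logits`, `clip_contrastive_loss`) and the 0.07 temperature (CLIP's initial value) are illustrative assumptions, and plain Python lists stand in for a tensor library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_logits(image_embs: Sequence[Vector], text_embs: Sequence[Vector],
                      temperature: float = 0.07) -> List[List[float]]:
    """logits[i][j] = cos(image_i, caption_j) / temperature (0.07 is an assumption)."""
    return [[cosine(img, txt) / temperature for txt in text_embs] for img in image_embs]


def _log_softmax(row: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(logits: Sequence[Sequence[float]]) -> float:
    """Symmetric InfoNCE over N matched image-caption pairs.

    Caption i is the match for image i, so the targets lie on the diagonal.
    """
    n = len(logits)
    # Image-to-text: softmax over each row (all captions for one image).
    i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: softmax over each column (all images for one caption).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return (i2t + t2i) / 2
```

Typical use is `clip_contrastive_loss(similarity_logits(images, captions))` with equal-length lists of image and caption embeddings. Averaging both directions makes every image compete against all captions in the batch and every caption against all images.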
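The second training stage adds region-level captions together with positive and negative region descriptions. One common way to use such hard negatives, assumed here rather than taken from the paper, is a softmax cross-entropy in which a region's embedding must score its positive description above each of its hard-negative descriptions. The function name `hard_negative_loss` is hypothetical; its inputs would typically be region-text similarities divided by a temperature, as in the standard CLIP setup.

```python
import math
from typing import Sequence


def hard_negative_loss(pos_logit: float, neg_logits: Sequence[float]) -> float:
    """Cross-entropy with the positive description as the target class.

    pos_logit:  similarity of a region to its correct (positive) description.
    neg_logits: similarities of the same region to its hard-negative
                descriptions (assumed setup, e.g. captions with one attribute changed).
    Returns -log softmax([pos, *negs])[0].
    """
    logits = [pos_logit, *neg_logits]
    m = max(logits)
    # Stable log-sum-exp over the positive and all negatives.
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - pos_logit
```

With K negatives, the loss equals log(1 + K) when the region cannot tell its positive description from the negatives, and it approaches zero as the positive logit dominates.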
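The name-conversion step in Step 2 of Data Preparation relies on the `url2key_jsons/*.json` files to map each image URL back to the name used in training. The sketch below illustrates only the idea and does not reproduce `data.convert_image_name`. It assumes, hypothetically, that each JSON file is a flat `{url: key}` object and that a `{url: downloaded_name}` mapping can be recovered from the download step's output; neither format is documented here, and the helper names `load_url2key` and `plan_renames` are illustrative.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def load_url2key(directory: Path) -> Dict[str, str]:
    """Merge every url2key_*.json file into one {url: key} dict.

    Assumes (hypothetically) that each file holds a flat JSON object.
    """
    mapping: Dict[str, str] = {}
    for path in sorted(directory.glob("url2key_*.json")):
        with path.open(encoding="utf-8") as f:
            mapping.update(json.load(f))
    return mapping


def plan_renames(url2key: Dict[str, str],
                 downloaded: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each downloaded file name with the name the training data expects.

    url2key:    {url: expected_key}, e.g. from load_url2key().
    downloaded: {url: name_assigned_during_download} (hypothetical input).
    Returns (renames, missing): (old_name, new_name) pairs that need renaming,
    and URLs listed in url2key that were never downloaded.
    """
    renames: List[Tuple[str, str]] = []
    missing: List[str] = []
    for url, key in url2key.items():
        old = downloaded.get(url)
        if old is None:
            missing.append(url)  # download failed or the link is dead
        elif old != key:
            renames.append((old, key))
    return renames, missing
```

Keeping the plan separate from the actual file moves makes it easy to inspect how many images are missing before anything on disk is changed.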
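After conversion, `data/grit-12m` should follow the final FG-CLIP directory tree: parent folders `coyo_image_0` through `coyo_image_20` (matching `--num_parent_folders 21`), each holding subfolders `00000` to `00099` (matching `--num_subfolders_per_parent 100`), except `coyo_image_20`, which stops at `00050` in the tree. A quick check for missing subfolders might look like this; the function names and the `last_parent_count=51` default are assumptions read off the tree, not part of the FG-CLIP code.

```python
from pathlib import Path
from typing import List


def expected_subfolders(num_parent_folders: int = 21,
                        num_subfolders_per_parent: int = 100,
                        last_parent_count: int = 51) -> List[str]:
    """Relative paths such as 'coyo_image_0/00000' per the README's final tree.

    last_parent_count is an assumption: the tree shows coyo_image_20 ending at 00050.
    """
    paths: List[str] = []
    for p in range(num_parent_folders):
        is_last = p == num_parent_folders - 1
        count = last_parent_count if is_last else num_subfolders_per_parent
        paths.extend(f"coyo_image_{p}/{s:05d}" for s in range(count))
    return paths


def missing_subfolders(root: Path, **layout) -> List[str]:
    """List expected subfolders that do not exist under root (e.g. data/grit-12m)."""
    return [rel for rel in expected_subfolders(**layout) if not (root / rel).is_dir()]
```

For example, `missing_subfolders(Path("data/grit-12m"))` returning an empty list means every expected subfolder is present; it checks folders only, not the images inside them.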

Provider: maas
Created: 2025-10-16