HaHaJun1101/OACIRR
---
license: cc-by-4.0
task_categories:
- image-text-to-image
- feature-extraction
- image-feature-extraction
language:
- en
tags:
- composed-image-retrieval
- object-anchored
- image-retrieval
- vision-language
- multimodal
- cvpr2026
size_categories:
- 100K<n<1M
pretty_name: OACIRR
---
# **🔍 Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval (CVPR 2026)**
[**🌐 Homepage**](https://your-personal-or-project-page.github.io) | [**📖 Paper (arXiv)**](https://arxiv.org) | [**🤗 Model (*AdaFocal* Checkpoints)**](https://huggingface.co/HaHaJun1101/AdaFocal) | [**🐙 Code (GitHub)**](https://github.com/HaHaJun1101) | <a href="#downloading-the-oacirr-dataset" style="color: red;">**🛜 Download Now 👇**</a>
<p align="left">
<img
src="https://huggingface.co/datasets/HaHaJun1101/OACIRR/resolve/main/figures/dataset_benchmark_overview.png"
alt="OACIRR Dataset and Benchmark Overview"
width="66%"
>
</p>
---
## 🔔 News
- **⏳ [Coming Soon]: *AdaFocal* model checkpoints and full Training/Evaluation code will be released!**
- **🔥 [2026-03-25]: The OACIRR Benchmark is officially released and available for use!**
- **🎉 [2026-02-21]: Our paper "Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval" has been accepted to CVPR 2026!**
---
## 💡 Dataset Overview
**OACIRR** (**O**bject-**A**nchored **C**omposed **I**mage **R**etrieval on **R**eal-world images) is the first large-scale, multi-domain benchmark tailored for the **Object-Anchored Composed Image Retrieval (OACIR)** task.
Unlike traditional Composed Image Retrieval (CIR), which prioritizes broad semantic matching, **OACIRR** demands strict **instance-level fidelity**. A bounding box anchors a specific object in the reference image, and the model must retrieve a target image that satisfies the textual modification while **preserving that exact anchored instance**.
**OACIRR** comprises a **unified training set** of **127K quadruples** covering **2,647 instances**, along with an extensive **evaluation benchmark** containing **33.4K queries** across **1,238 instances** from four diverse domains: **<font color=#990000>Fashion</font>, <font color=#CC3300>Car</font>, <font color=#003399>Product</font>, and <font color=#006633>Landmark</font>.** The benchmark is enriched with over **26.6K curated distractor instances** to form challenging galleries.
**Collectively, OACIRR encompasses 160K+ quadruples, providing both a high-quality foundational dataset and a rigorous, comprehensive benchmark for the OACIR task.**
<p align="left">
<img
src="https://huggingface.co/datasets/HaHaJun1101/OACIRR/resolve/main/figures/data_examples.png"
alt="OACIRR Data Examples"
width="98%"
>
</p>
---
## 📊 Dataset Statistics
To highlight the scale and diversity of the **OACIRR** benchmark, we provide detailed statistical breakdowns of both the training set and the evaluation benchmark across four domains.
<p align="left">
<img
src="https://huggingface.co/datasets/HaHaJun1101/OACIRR/resolve/main/figures/instance_distribution.png"
alt="OACIRR Instance Distribution"
width="55%"
>
</p>
#### 📈 Statistics of OACIRR Training Dataset
| **Statistic** | **Number** | **Percentage** |
| :--- | ---: | ---: |
| **Total Annotated Quadruples** | **127,166** | |
| 👗 Fashion | 12,874 | 10.1% |
| 🚗 Car | 12,728 | 10.0% |
| 🛍️ Product | 75,616 | 59.5% |
| ⛰️ Landmark | 25,948 | 20.4% |
| **Total Unique Images** | **39,495** | |
| 👗 Fashion | 1,034 | 2.6% |
| 🚗 Car | 3,111 | 7.9% |
| 🛍️ Product | 27,531 | 69.7% |
| ⛰️ Landmark | 7,819 | 19.8% |
| **Total Unique Instances** | **2,647** | |
| 👗 Fashion | 80 | 3.0% |
| 🚗 Car | 199 | 7.5% |
| 🛍️ Product | 1,419 | 53.6% |
| ⛰️ Landmark | 949 | 35.9% |
| Maximum Modification Text Length | 30.0 | - |
| Average Modification Text Length | 20.2 | - |
#### 📉 Statistics of OACIRR Evaluation Benchmark
| **Statistic** | **Number** | **Percentage** |
| :--- | ---: | ---: |
| **Total Annotated Quadruples** | **33,449** | |
| 👗 Fashion | 3,606 | 10.8% |
| 🚗 Car | 3,586 | 10.7% |
| 🛍️ Product | 21,046 | 62.9% |
| ⛰️ Landmark | 5,211 | 15.6% |
| **Total Unique Images** | **26,595** | |
| *Quadruple Images* | 15,467 | 58.1% |
| *Distractor Images* | 11,134 | 41.9% |
| 👗 Fashion | 5,077 | 19.1% |
| 🚗 Car | 4,717 | 17.7% |
| 🛍️ Product | 11,801 | 44.4% |
| ⛰️ Landmark | 5,000 | 18.8% |
| **Total Unique Instances** | **4,945** | |
| *Quadruple Instances* | 1,238 | 25.0% |
| *Distractor Instances* | 3,707 | 75.0% |
| 👗 Fashion | 1,683 | 34.0% |
| 🚗 Car | 1,089 | 22.0% |
| 🛍️ Product | 799 | 16.2% |
| ⛰️ Landmark | 1,374 | 27.8% |
| Maximum Modification Text Length | 30.0 | - |
| Average Modification Text Length | 19.4 | - |
---
## ⚙️ Dataset Structure
To flexibly support both joint training and domain-specific evaluation, **OACIRR** is organized into two primary components: `OACIRR-Union` and `OACIRR-Subset`.
All images ship as compressed `.zip` archives (`train.zip` and `val.zip`). Unzipping them in place produces the required directory structure.
**Below is the complete dataset structure:**
```text
OACIRR/
│
├── OACIRR-Union/ # 📌 Joint Training Set (Contains all 4 domains)
│ │
│ ├── oacirr-union/ # Unified annotations
│ │ ├── image_bounding_box/
│ │ │ └── bounding_box.train.json
│ │ ├── image_splits/
│ │ │ └── split.train.json
│ │ └── quadruple_captions/
│ │ └── caption_full.train.json
│ │
│ └── train/ # Training Images
│ ├── fashion/
│ │ └── <class_id>/<img_id>.jpg # Images grouped by instance IDs
│ ├── car/
│ ├── product/
│ └── landmark/
│
└── OACIRR-Subset/ # 📌 Domain-specific Subsets (For evaluation & single-domain training)
│
├── OACIRR-Fashion/
│ ├── oacirr-fashion/ # Domain-specific annotations
│ │ ├── image_bounding_box/
│ │ │ ├── bounding_box.train.json
│ │ │ └── bounding_box.val.json
│ │ ├── image_splits/
│ │ │ ├── split.train.json
│ │ │ └── split.val.json
│ │ └── quadruple_captions/
│ │ ├── caption_full.train.json
│ │ └── caption_full.val.json
│ │
│ ├── train/ # Training Images
│ │ └── <class_id>/<img_id>.jpg
│ │
│ └── val/ # Validation Images
│ ├── <class_id>/<img_id>.jpg # Ground-truth targets and references
│ └── candidate_expansion/<img_id>.jpg # Hard-negative distractors
│
├── OACIRR-Car/ # (Same structure as OACIRR-Fashion)
├── OACIRR-Product/ # (Same structure as OACIRR-Fashion)
└── OACIRR-Landmark/ # (Same structure as OACIRR-Fashion)
```
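To confirm that a local copy matches the tree above after extraction, a small check along these lines may help. It is a sketch, not part of the official tooling: the function names are hypothetical, and it assumes the Car, Product, and Landmark annotation folders follow the same `oacirr-<domain>` naming as `oacirr-fashion`.

```python
from pathlib import Path

DOMAINS = ("Fashion", "Car", "Product", "Landmark")


def expected_paths(root):
    """List the annotation files and image folders implied by the tree above."""
    root = Path(root)
    union = root / "OACIRR-Union"
    paths = [
        union / "oacirr-union" / "image_bounding_box" / "bounding_box.train.json",
        union / "oacirr-union" / "image_splits" / "split.train.json",
        union / "oacirr-union" / "quadruple_captions" / "caption_full.train.json",
        union / "train",
    ]
    for domain in DOMAINS:
        base = root / "OACIRR-Subset" / f"OACIRR-{domain}"
        # Assumption: annotation folder is the lowercase domain name.
        ann = base / f"oacirr-{domain.lower()}"
        for split in ("train", "val"):
            paths += [
                ann / "image_bounding_box" / f"bounding_box.{split}.json",
                ann / "image_splits" / f"split.{split}.json",
                ann / "quadruple_captions" / f"caption_full.{split}.json",
                base / split,
            ]
    return paths


def missing_paths(root):
    """Return every expected path that does not exist under root."""
    return [p for p in expected_paths(root) if not p.exists()]
```

Calling `missing_paths("./OACIRR")` returns an empty list when everything is in place.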
---
## 📝 Annotation Format
Annotations are split into three independent JSON files, so custom data-loading logic can read only what it needs.
### 1. Quadruple Captions (`caption_full.[split].json`)
**The core OACIR training/evaluation data. Each object defines a complete retrieval quadruple:**
```json
[
  {
    "reference": "fashion-132866",
    "target": "fashion-132868",
    "modification_text_mllm": "Change from loose pants laid flat indoors to a fitted look outdoors with a blurred background for privacy.",
    "image_similarity": 0.755859375,
    "object_category": "skirt",
    "reference_bounding_box": [51, 168, 309, 467],
    "target_bounding_box": [160, 237, 358, 671]
  }
]
```
*(Note: Bounding boxes are formatted as `[x_min, y_min, x_max, y_max]`. In `OACIRR-Union`, image IDs are prefixed with their domain name, e.g., `"fashion-132866"`. In `OACIRR-Subset`, IDs are plain numeric strings, e.g., `"132866"`.)*
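As an illustration of the fields above, the following sketch parses one entry into a typed record and separates the domain prefix from the image ID. The `Quadruple` class and both helper names are hypothetical conveniences, not part of the released code.

```python
import json
from dataclasses import dataclass


@dataclass
class Quadruple:
    reference: str
    target: str
    modification_text: str
    object_category: str
    reference_box: list
    target_box: list


def parse_entry(entry):
    """Turn one JSON object from caption_full.[split].json into a typed record."""
    return Quadruple(
        reference=entry["reference"],
        target=entry["target"],
        modification_text=entry["modification_text_mllm"],
        object_category=entry["object_category"],
        reference_box=entry["reference_bounding_box"],
        target_box=entry["target_bounding_box"],
    )


def split_domain(image_id):
    """Return (domain, numeric_id); domain is None for unprefixed subset IDs."""
    domain, sep, number = image_id.partition("-")
    return (domain, number) if sep else (None, image_id)


# The sample entry from this section, as it appears in OACIRR-Union.
raw = json.loads("""
[{"reference": "fashion-132866", "target": "fashion-132868",
  "modification_text_mllm": "Change from loose pants laid flat indoors to a fitted look outdoors with a blurred background for privacy.",
  "image_similarity": 0.755859375, "object_category": "skirt",
  "reference_bounding_box": [51, 168, 309, 467],
  "target_bounding_box": [160, 237, 358, 671]}]
""")
quadruples = [parse_entry(e) for e in raw]
print(split_domain(quadruples[0].reference))  # ('fashion', '132866')
```

For a real file, replace the inline string with `json.load(open(path, encoding="utf-8"))`.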
### 2. Image Splits (`split.[split].json`)
**Maps image IDs to their relative file paths, uniformly managing both normal query/target images and hard-negative distractors:**
```json
{
"127479": "./val/10071/127479.jpg",
"085519": "./val/candidate_expansion/085519.jpg"
}
```
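A sketch of how this mapping might be used: resolve an image ID to a file on disk and tell gallery distractors apart from query/target images. It assumes the relative paths are rooted at the subset directory (the folder that contains `train/` and `val/`) and that distractors are exactly the files under `candidate_expansion/`. Both function names are hypothetical.

```python
from pathlib import Path


def resolve_image_path(split_map, image_id, subset_root):
    """Join a relative path from split.[split].json onto the subset directory."""
    return Path(subset_root) / split_map[image_id]


def is_distractor(relative_path):
    """Assumption: distractors are the files stored under candidate_expansion/."""
    return "candidate_expansion" in Path(relative_path).parts


# The two sample entries shown above.
split_map = {
    "127479": "./val/10071/127479.jpg",
    "085519": "./val/candidate_expansion/085519.jpg",
}

path = resolve_image_path(split_map, "127479", "OACIRR/OACIRR-Subset/OACIRR-Fashion")
distractor_ids = [i for i, p in split_map.items() if is_distractor(p)]
```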
### 3. Image Bounding Box (`bounding_box.[split].json`)
**Maps image IDs to their object bounding boxes:**
```json
{
"005603": [58, 235, 467, 570]
}
```
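The `[x_min, y_min, x_max, y_max]` layout can be passed directly to croppers that expect corner coordinates (Pillow's `Image.crop` takes a 4-tuple in this order). Other tools expect `[x, y, width, height]`. The sketch below converts between the two and clamps a box to the image bounds. It assumes the values are pixel coordinates, which the integer sample values suggest but the card does not state; the 400×600 image size is made up for illustration.

```python
def to_xywh(box):
    """Convert [x_min, y_min, x_max, y_max] to [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def clamp_box(box, width, height):
    """Clip a corner-format box to an image of the given size."""
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    if x_min >= x_max or y_min >= y_max:
        raise ValueError(f"box {box} lies outside a {width}x{height} image")
    return [x_min, y_min, x_max, y_max]


# The sample entry shown above.
boxes = {"005603": [58, 235, 467, 570]}

print(to_xywh(boxes["005603"]))              # [58, 235, 409, 335]
print(clamp_box(boxes["005603"], 400, 600))  # [58, 235, 400, 570]
```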
---
## 🚀 How to Use
<a name="downloading-the-oacirr-dataset"></a>
### 1. Downloading the OACIRR Dataset
**Method A: Using Git LFS (⭐️ Recommended)**
Before you begin, ensure that **Git LFS** is installed on your system.
```bash
git lfs install
git clone https://huggingface.co/datasets/HaHaJun1101/OACIRR
```
**Method B: Using Hugging Face Python API**
```python
from huggingface_hub import snapshot_download
# This will download the dataset to your local directory automatically
snapshot_download(repo_id="HaHaJun1101/OACIRR", local_dir="./OACIRR", repo_type="dataset")
```
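If only one domain is needed, `snapshot_download` also accepts an `allow_patterns` argument that restricts which files are fetched. The sketch below assumes the repository layout mirrors the directory tree in the Dataset Structure section; `subset_patterns` and `download_subset` are hypothetical helper names.

```python
def subset_patterns(domain):
    """Glob pattern covering one domain subset, e.g. domain='Fashion'."""
    # Assumption: files live under OACIRR-Subset/OACIRR-<Domain>/ in the repo.
    return [f"OACIRR-Subset/OACIRR-{domain}/*"]


def download_subset(domain, local_dir="./OACIRR"):
    """Download only the files belonging to a single domain subset."""
    # Imported here so subset_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="HaHaJun1101/OACIRR",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=subset_patterns(domain),
    )
```

For example, `download_subset("Car")` would fetch the Car annotations and archives while skipping the other domains and the union set.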
### 2. Decompressing Images
After downloading the dataset, you need to unzip the image archives. Navigate to the dataset directory in your terminal and run the following commands:
**For Joint Training (OACIRR-Union):**
```bash
cd OACIRR/OACIRR-Union
unzip train.zip
```
**For Domain-Specific Subsets (e.g., Fashion):**
```bash
cd OACIRR/OACIRR-Subset/OACIRR-Fashion
unzip train.zip
unzip val.zip
```
***( ⚠️ Please repeat the extraction commands for the `OACIRR-Car`, `OACIRR-Product`, and `OACIRR-Landmark` directories.)***
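To avoid repeating the commands by hand, the archives can also be extracted in one pass from Python. This is a minimal sketch using the standard-library `zipfile` module; `extract_archives` is a hypothetical name, and the code assumes each archive unpacks into a `train/` or `val/` folder next to it, as the `unzip` commands above imply.

```python
import zipfile
from pathlib import Path


def extract_archives(root, names=("train.zip", "val.zip")):
    """Extract every matching archive into the directory where it is stored."""
    extracted = []
    for archive in sorted(Path(root).rglob("*.zip")):
        if archive.name in names:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(archive.parent)
            extracted.append(archive)
    return extracted
```

Running `extract_archives("./OACIRR")` covers `OACIRR-Union` and all four subsets, and returns the list of archives it processed.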
### 3. Dataloader and Evaluation Pipeline (Coming Soon)
We are currently polishing the codebase! A dedicated PyTorch `Dataset` and `DataLoader` implementation, along with evaluation scripts, will be released in our [**GitHub Repository**](https://github.com/HaHaJun1101).
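Until the official code is available, the annotation files can be joined with a few lines of plain Python. The class below is an unofficial sketch, not the authors' implementation: `OACIRRQuadruples` and its argument names are hypothetical, it assumes split paths are relative to the subset directory, and it leaves image decoding and batching to the caller.

```python
import json
from pathlib import Path


class OACIRRQuadruples:
    """Index-able view over one split of a subset (map-style: len + getitem)."""

    def __init__(self, subset_root, annotation_dir, split):
        self.root = Path(subset_root)
        ann = self.root / annotation_dir
        caption_file = ann / "quadruple_captions" / f"caption_full.{split}.json"
        split_file = ann / "image_splits" / f"split.{split}.json"
        with open(caption_file, encoding="utf-8") as f:
            self.quadruples = json.load(f)
        with open(split_file, encoding="utf-8") as f:
            self.paths = json.load(f)

    def __len__(self):
        return len(self.quadruples)

    def __getitem__(self, index):
        q = self.quadruples[index]
        return {
            "reference_path": self.root / self.paths[q["reference"]],
            "target_path": self.root / self.paths[q["target"]],
            "text": q["modification_text_mllm"],
            "reference_box": q["reference_bounding_box"],
            "target_box": q["target_bounding_box"],
        }
```

A call such as `OACIRRQuadruples("OACIRR/OACIRR-Subset/OACIRR-Fashion", "oacirr-fashion", "val")` would then yield one dictionary per query.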
---
## ✒️ Citation
If you find our dataset, models, or code useful in your research, please consider citing our paper.
---