cviu-uarkansas/FaiDPO
Source: Hugging Face. Updated 2026-02-27; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/cviu-uarkansas/FaiDPO
# [CVPR 2026] $\phi$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
[arXiv](https://arxiv.org/abs/2602.22601)
[Hugging Face dataset](https://huggingface.co/datasets/cviu-uarkansas/FaiDPO)
[GitHub](https://github.com/uark-cviu/FaiDPO)
> $\phi$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models<br>
> [Thanh-Dat Truong](https://truongthanhdat.github.io/), [Huu-Thien Tran](https://huuthientran.github.io/), [Jackson Cothren](#), [Bhiksha Raj](http://mlsp.cs.cmu.edu/people/bhiksha/), and [Khoa Luu](http://csce.uark.edu/~khoaluu)<br>
> University of Arkansas, Computer Vision and Image Understanding Lab, CVIU<br>
## Abstract
Fairness in Continual Learning for Large Multimodal Models (LMMs) is an emerging yet underexplored challenge, particularly in the presence of imbalanced data distributions that can lead to biased model updates and suboptimal performance across tasks. While recent continual learning studies have made progress in addressing catastrophic forgetting, the problem of fairness caused by imbalanced data remains largely underexplored. This paper presents a novel Fairness Direct Preference Optimization (FaiDPO or $\phi$-DPO) framework for continual learning in LMMs. In particular, we first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Then, we identify the limitations of conventional DPO under imbalanced data and present a new $\phi$-DPO loss that explicitly addresses distributional biases. We provide a comprehensive theoretical analysis demonstrating that our approach addresses both forgetting and data imbalance. Additionally, to enable $\phi$-DPO-based continual learning, we construct pairwise preference annotations for existing benchmarks in the context of continual learning. Extensive experiments and ablation studies show that the proposed $\phi$-DPO achieves state-of-the-art performance across multiple benchmarks, outperforming prior continual learning methods for LMMs.
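For readers unfamiliar with the conventional DPO objective that the abstract refers to, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is not the $\phi$-DPO loss, whose form is not given on this page. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Conventional DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    NOTE: this is the standard DPO loss, not the phi-DPO loss.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # evaluated in a numerically stable way for either sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is `log(2)`; the loss falls as the policy raises the probability of the chosen response relative to the rejected one.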
## Data Preparation
The dataset used in this work builds on [CoIN](https://github.com/zackschen/CoIN) and [MLLM-CL](https://github.com/bjzhb666/MLLM-CL). First, follow the instructions in those repositories to download and organize the image files. Then download our preference dataset from [HuggingFace](https://huggingface.co/datasets/cviu-uarkansas/FaiDPO). A sample of the dataset is shown below.
```json
[
  { // from Math subset
    "image": "images/MATHV360k/data_images/ScienceQA/images/14471.png",
    "question": "<image>\nHint: Please answer the question and provide the final answer at the end.\nQuestion: What is the shape of the baseball bat?",
    "chosen": "The answer is Cylindrical",
    "rejected": "The answer is Conical"
  },
  { // from AD subset
    "image": "images/drivelm/stitch/c24317a5c0cb4f5c9fd31740eb1152f2_ef6b50af7346427a9e32a9d96d5d56f5.jpg",
    "question": "<image>\nWhat is the relative positioning of the important objects in the current scene? Objects are encoded using <c,CAM,[cx,cy]>, where c is the identifier, CAM indicates the camera where the object\u2019s center point is situated, and x, y represent the horizontal and vertical coordinates of the center point of the 2D bounding box.",
    "chosen": "<c2,CAM_FRONT_RIGHT,[861, 417]> is in front of <c1,CAM_BACK,[535, 792]>.",
    "rejected": "<c3,CAM_LEFT,[600, 300]> is behind <c1,CAM_BACK,[535, 792]>."
  },
  // ...
]
```
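A minimal sketch of reading one preference file into (prompt, chosen, rejected) records is given below. It assumes each annotation file on disk is a plain JSON list whose entries carry the four fields shown in the sample (`image`, `question`, `chosen`, `rejected`), and that `image` is a path relative to an image root directory. The function name `load_preference_pairs` and the output keys are hypothetical, not part of the released code.

```python
import json
from pathlib import Path

# Fields present in the sample records shown above.
REQUIRED_KEYS = ("image", "question", "chosen", "rejected")


def load_preference_pairs(json_path, image_root):
    """Load preference records and resolve image paths.

    Assumes `json_path` points to a JSON list of objects with the
    fields in REQUIRED_KEYS (an assumption based on the sample).
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)

    pairs = []
    for i, rec in enumerate(records):
        # Fail early on malformed entries rather than during training.
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            raise KeyError(f"record {i} is missing fields: {missing}")
        pairs.append({
            "image_path": Path(image_root) / rec["image"],
            "prompt": rec["question"],  # contains the <image> placeholder
            "chosen": rec["chosen"],
            "rejected": rec["rejected"],
        })
    return pairs
```

A call such as `load_preference_pairs("Math/train.json", "DATA_ROOT_DIR/DPO-MLLM-CL")` would return one dictionary per preference pair; both paths here are placeholders, since the actual file names are not listed on this page.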
We recommend the following data organization:
```
DATA_ROOT_DIR
├── DPO-CoIN
│   ├── images/
│   ├── GQA/
│   ├── Grounding/
│   ├── ImageNet/
│   ├── OCRVQA/
│   ├── ScienceQA/
│   ├── TextVQA/
│   ├── VizWiz/
│   └── VQAv2/
└── DPO-MLLM-CL
    ├── images/
    ├── AD/
    ├── APP/
    ├── Fin/
    ├── Math/
    ├── Med/
    ├── OCR/
    ├── RS/
    ├── Sci/
    └── VP/
```
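Before training, it can help to confirm that the folders match the recommended layout. The sketch below lists any expected subdirectory that is absent under a given root. The directory names are copied from the tree above; the helper name `missing_dirs` is a hypothetical choice, and the check looks only at directory existence, not at file contents.

```python
from pathlib import Path

# Directory names taken from the recommended layout above.
EXPECTED_LAYOUT = {
    "DPO-CoIN": ["images", "GQA", "Grounding", "ImageNet", "OCRVQA",
                 "ScienceQA", "TextVQA", "VizWiz", "VQAv2"],
    "DPO-MLLM-CL": ["images", "AD", "APP", "Fin", "Math", "Med",
                    "OCR", "RS", "Sci", "VP"],
}


def missing_dirs(data_root):
    """Return expected subdirectories (as 'benchmark/name') not found under data_root."""
    root = Path(data_root)
    return [
        f"{benchmark}/{name}"
        for benchmark, names in EXPECTED_LAYOUT.items()
        for name in names
        if not (root / benchmark / name).is_dir()
    ]
```

An empty list from `missing_dirs("DATA_ROOT_DIR")` means every expected folder exists; otherwise the returned entries name the folders still to be created or downloaded.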
## Training and Testing
Our implementation follows the training and evaluation scripts of [LLaVA v1.5](https://github.com/haotian-liu/LLaVA).
## Acknowledgments
This work is partly supported by NSF CAREER (No. 2442295), NSF SCH (No. 2501021), NSF E-RISE (No. 2445877), NSF BIO (No. 2524623), and a USDA/NIFA Award. We also acknowledge the Arkansas High-Performance Computing Center (HPC) for providing GPU servers.
## Citation
If you find this code useful for your research, please consider citing:
```
@inproceedings{truong2026faidpo,
title={{$\phi$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models}},
author={Truong, Thanh-Dat and Tran, Huu-Thien and Cothren, Jackson and Raj, Bhiksha and Luu, Khoa},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
year={2026}
}
```
This work adopts publicly available datasets and open-source implementations. We gratefully acknowledge the contributions of the following projects: [zackschen/CoIN](https://github.com/zackschen/CoIN), [bjzhb666/MLLM-CL](https://github.com/bjzhb666/MLLM-CL), and [haotian-liu/LLaVA](https://github.com/haotian-liu/LLaVA).
Provider: cviu-uarkansas



