User Dataset Repository
Source: ModelScope community; updated 2025-12-04; indexed 2025-04-19
Download link:
https://modelscope.cn/datasets/zsyf261405/UserDatasetLibrary
# [[ECCV2024](https://arxiv.org/abs/2404.12467)] Towards Multi-modal Transformers in Federated Learning
Official repository for Towards Multi-modal Transformers in Federated Learning (ECCV2024). Code will be released soon.
# Citation
```
@inproceedings{sun2024towards,
title={Towards Multi-modal Transformers in Federated Learning},
author={Sun, Guangyu and Mendieta, Matias and Dutta, Aritra and Li, Xin and Chen, Chen},
booktitle={European Conference on Computer Vision},
pages={229--246},
year={2024},
organization={Springer}
}
@article{sun2024towards,
title={Towards Multi-modal Transformers in Federated Learning},
author={Sun, Guangyu and Mendieta, Matias and Dutta, Aritra and Li, Xin and Chen, Chen},
journal={arXiv preprint arXiv:2404.12467},
year={2024}
}
```
# Get Started
## Environment
Python version: 3.8.0
```
pip install -r requirements.txt
```
## Prepare Data
Option 1: Download the entire `data` folder directly from [Google Drive](https://drive.google.com/file/d/1MhOE4q2P_D3Y5muyz-fhN6GnVSTgbK16/view?usp=sharing).

Option 2: Download the datasets manually.
1. Download the Flickr30k dataset and put all images into `data/flickr30k/flickr30k_images`.
2. Download MS-COCO 2014 and put all images into `data/coco/all_images` and all annotations into `data/coco/annotations`.
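
Before launching an experiment, it can help to confirm that the folders named above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_dirs` and the default root `data` are assumptions, while the sub-directory names are taken from the instructions above.

```python
from pathlib import Path

# Sub-directories named in the Prepare Data instructions.
# The helper below is a hypothetical convenience, not repository code.
EXPECTED_DIRS = (
    "flickr30k/flickr30k_images",
    "coco/all_images",
    "coco/annotations",
)


def missing_data_dirs(data_root="data"):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

The check only verifies that the directories exist; it does not validate the images or annotation files inside them.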
## Wandb for Logging
Set up `wandb.init()` with your own project name and entity.
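
As an illustration of the step above, the sketch below collects the `project` and `entity` arguments and passes them to `wandb.init()`. The helper names `wandb_init_kwargs` and `start_run` are hypothetical, and where the repository actually calls `wandb.init()` is not shown here; `project`, `entity`, and `name` are standard `wandb.init()` keyword arguments.

```python
def wandb_init_kwargs(project, entity, run_name=None):
    """Collect keyword arguments for wandb.init().

    The values are placeholders: pass your own project name and entity.
    """
    if not project or not entity:
        raise ValueError("both project and entity must be set")
    kwargs = {"project": project, "entity": entity}
    if run_name is not None:
        kwargs["name"] = run_name
    return kwargs


def start_run(project, entity, run_name=None):
    """Start a wandb run (hypothetical wrapper, assumes wandb is installed)."""
    import wandb  # imported lazily; requires `pip install wandb` and `wandb login`

    return wandb.init(**wandb_init_kwargs(project, entity, run_name))
```

Calling `start_run("my-project", "my-team")` would then open a run under that project and entity, assuming you are logged in.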
# Experiments
Use the scripts under `scripts` to run each method and setting and reproduce the results in our paper.
# Model Explanation
We unify the image and text encoders into one model, `ModalityAgnosticTransformer`, for easier aggregation:
- `shared_param`: Parameters shared between encoders of the same modality in different types of clients (e.g., the image encoder in an image client and the image encoder in an image-text client)
- `share_scope`: Scope of sharing during aggregation
  - `dataset`: share parameters only among encoders with the same dataset
  - `modality`: share parameters only among encoders with the same modality
  - `all`: share parameters among all encoders
- `colearn_param`: Parameters shared between the image and text encoders
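
To make the three `share_scope` values concrete, the sketch below groups encoders the way each scope describes. It is an illustration under stated assumptions, not the repository's aggregation code: the `Encoder` record, the function `group_by_scope`, and the example dataset labels are all hypothetical, and only the three scopes described above are handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoder:
    """Hypothetical record describing one client-side encoder."""
    name: str      # e.g. "imgtxt_client/img"
    dataset: str   # label of the dataset the client trains on
    modality: str  # "img" or "txt"


# One grouping key per share_scope value described above.
_SCOPE_KEYS = {
    "dataset": lambda enc: enc.dataset,
    "modality": lambda enc: enc.modality,
    "all": lambda enc: "all",
}


def group_by_scope(encoders, share_scope):
    """Group encoder names whose shared parameters would be aggregated together."""
    if share_scope not in _SCOPE_KEYS:
        raise ValueError(f"unknown share_scope: {share_scope!r}")
    key = _SCOPE_KEYS[share_scope]
    groups = defaultdict(list)
    for enc in encoders:
        groups[key(enc)].append(enc.name)
    return dict(groups)


# Illustrative clients: one image-only, one text-only, one image-text.
EXAMPLE_ENCODERS = [
    Encoder("img_client/img", "img_data", "img"),
    Encoder("txt_client/txt", "txt_data", "txt"),
    Encoder("imgtxt_client/img", "pair_data", "img"),
    Encoder("imgtxt_client/txt", "pair_data", "txt"),
]
```

With `share_scope="modality"`, the image encoders of the image client and the image-text client fall into one group and the text encoders into another; with `"dataset"`, only the two encoders of the image-text client share a group.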
# Method Configurations
To configure each method correctly, follow this table:
| Name | shared_param | share_scope | algorithm | Others |
|----------|--------------|------------------|-----------|--------|
| FedAVG | none | dataset | fedavg | |
| FedIoT | blocks | modality_exact | fediot | |
| FedProx | none | dataset | fedprox | |
| CreamFL | none | dataset | creamfl | |
| **FedCola (ours)** | attn | modality | fedavg | `--aux --aux_trained` |
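
The table can also be kept as a small lookup structure when scripting several runs. The sketch below only transcribes the table values; the dictionary layout and the helper `get_method_config` are hypothetical and do not reflect how the repository parses its options.

```python
# Values transcribed from the configuration table; the layout is an assumption.
METHOD_CONFIGS = {
    "FedAVG": {"shared_param": "none", "share_scope": "dataset",
               "algorithm": "fedavg", "others": []},
    "FedIoT": {"shared_param": "blocks", "share_scope": "modality_exact",
               "algorithm": "fediot", "others": []},
    "FedProx": {"shared_param": "none", "share_scope": "dataset",
                "algorithm": "fedprox", "others": []},
    "CreamFL": {"shared_param": "none", "share_scope": "dataset",
                "algorithm": "creamfl", "others": []},
    "FedCola": {"shared_param": "attn", "share_scope": "modality",
                "algorithm": "fedavg", "others": ["--aux", "--aux_trained"]},
}


def get_method_config(name):
    """Case-insensitive lookup that returns a copy of one table row."""
    for key, cfg in METHOD_CONFIGS.items():
        if key.lower() == name.lower():
            return dict(cfg, others=list(cfg["others"]))
    raise KeyError(f"unknown method: {name!r}")
```

For example, `get_method_config("fedcola")` returns the FedCola row, including the two extra flags listed in the Others column.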
# Acknowledgement
This codebase is based on [Federated Learning in PyTorch](https://github.com/vaseline555/Federated-Learning-in-PyTorch). We extend it to our multi-modal federated learning setting.
For local complementary training, we adapted code from [here](https://github.com/AILab-CVC/M2PT) to add aux weights from the other modality.
Provider: maas
Created: 2025-04-15
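Background overview
This dataset is the official code repository for the paper "Towards Multi-modal Transformers in Federated Learning" (ECCV2024), which studies multi-modal Transformers in federated learning. It is released under the Apache 2.0 license, has a data size of 157.25 GB, and was updated in May 2026.
The summary above was collected and generated by 遇见数据集.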



