five

Octavius Dataset

收藏
OpenDataLab2026-05-17 更新2024-05-09 收录
下载链接:
https://opendatalab.org.cn/LAMM/OctaviusDataset
下载链接
链接失效反馈
官方服务:
资源简介:
Octavius: Mitigating Task Interference in MLLMs via MoE Zeren Chen1,2*  Ziqin Wang1,3*  Zhen Wang2*  Huayang Liu2 Zhenfei Yin1,4  Si Liu3  Lu Sheng2†  Wanli Ouyang1,4  Yu Qiao1  Jing Shao1† 1Shanghai AI Laboratory 2School of Software, Beihang University 3Institute of Artifical Intelligence, Beihang University 4University of Sydney * Equal Contribution † Corresponding Author ArXiv Code Model Zoo Introduction We propose Octavius, a unified, multimodal large language with a novel capability to comprehend various tasks across different modalities, including but not limited to 2D captioning, 2D detection, 3D VQA, and 3D dense captioning. Through combining well-known Mixture-of-Experts (MoE) and one of the representative PEFT techniques, i.e., LoRA, Octavius can efficiently be involved in more downstream tasks and more modalities by learning more LoRA modules, alleviating the potential task interference issuse arise from multimodal learning. Usage Environment installation. Prepare the instruction / benchmark dataset and required pretrained weights for LLMs and visual encoder. Training scripts: Image modality only cd src sh tools/Octavius/train_octavius_slurm.sh <YOUR_PARTITION> <NUM_GPU> \ config/Octavius/octavius_2d_e4_bs64.yaml octavius_2d_e4_bs64 Point cloud modality only cd src sh tools/Octavius/train_octavius_slurm.sh <YOUR_PARTITION> <NUM_GPU> \ config/Octavius/octavius_3d_e3_bs64.yaml octavius_3d_e3_bs64 Image & point cloud modality joint cd src sh tools/Octavius/train_octavius_slurm.sh <YOUR_PARTITION> <NUM_GPU> \ config/Octavius/octavius_2d+3d_e6_bs64.yaml octavius_2d +3d_e6_bs64 We provide pretrained Octavius model here. Evaluation We use ChEF to evaluate Octavius on both image and point cloud modalities, see here for details. Citation @misc{chen2023octavius, title={Octavius: Mitigating Task Interference in MLLMs via MoE}, author={Zeren Chen and Ziqin Wang and Zhen Wang and Huayang Liu and Zhenfei Yin and Si Liu and Lu Sheng and Wanli Ouyang and Yu Qiao and Jing Shao}, year={2023}, eprint={2311.02684}, archivePrefix={arXiv}, primaryClass={cs.CV} } License The project is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes.
提供机构:
LAMM
创建时间:
2023-10-27
搜集汇总
数据集介绍
main_image_url
背景与挑战
背景概述
Octavius Dataset是一个用于多模态大语言模型微调的开源数据集,基于CC BY-NC 4.0许可证,旨在通过结合MoE和LoRA技术处理2D和3D等多种模态任务,以减轻任务干扰。该数据集由上海人工智能实验室于2023年11月发布。
以上内容由遇见数据集搜集并总结生成
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作