MIC

ModelScope community. Updated 2025-12-29; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/BleachNick/MIC
Resource description:
## Dataset Description

Visual Language Models (VLMs) have made significant progress on a range of downstream tasks through large-scale multimodal modeling. However, they sometimes lack reasoning and in-context learning abilities. Large Language Models (LLMs), by contrast, have revolutionized the NLP community with their strong reasoning and in-context learning capabilities: they can quickly adapt to new tasks that involve inference, such as question answering and commonsense reasoning, without fine-tuning the pre-trained model or updating any parameters.

Studying in-context learning helps VLMs generalize to new knowledge in lifelong learning environments, develop learnable capabilities, and advance artificial intelligence skills. We therefore propose the MIC (Multimodality In-Context Learning) dataset. It is a manually constructed instruction-tuning dataset that supports interleaved text-image inputs, inter-related multiple-image inputs, and multimodal in-context learning inputs. Fine-tuning VLMs on MIC gives them multimodal in-context learning capabilities and lets them understand complex relationships between instructions and multiple images.

### Dataset Introduction

MIC draws on multiple data sources, including VQAv2, GQA, COCO, NLVR2, OKVQA, Flickr, STVQA, MSRVTT, MSRVTTQA, FunQA, TextVQA, RefCOCO, Vizwiz_caption, Ln_COCO, Textcap, WikiArt, DiffusionDB, VSR, LLaVa-Instruct, and MiniImagenet. We transform these open-source datasets with our context schema into a unified multimodal in-context format.

The data is stored in jsonl files, in a multi-instruction style that ranges from zero-shot to few-shot examples. Image data is in the **data** folder, stored as `*.zip` files.

The [MIC_tool](https://github.com/HaozheZhao/MIC_tool) repo contains the tool we used to transform the open-source datasets into the MIC data format. You can also save the processed data as Arrow files, which are compatible with Hugging Face datasets.

For further details, please refer to the [GitHub repository](https://github.com/HaozheZhao/MIC) or consult the accompanying paper.

```
@misc{zhao2023mmicl,
  title={MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning},
  author={Haozhe Zhao and Zefan Cai and Shuzheng Si and Xiaojian Ma and Kaikai An and Liang Chen and Zixuan Liu and Sheng Wang and Wenjuan Han and Baobao Chang},
  year={2023},
  eprint={2309.07915},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
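
Because the description says the instruction data is stored in jsonl files and the images in zip archives, a first look at a local copy might go roughly as follows. This is a minimal sketch using only the Python standard library, not the authors' tooling: the helper names `iter_jsonl` and `list_zip_members` are hypothetical, and the record fields are deliberately not assumed, since the description does not list them.

```python
import json
import zipfile
from typing import Iterator, List


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file.

    Hypothetical helper: the field names inside each record are not
    assumed here, because the dataset description does not list them.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err


def list_zip_members(archive: str) -> List[str]:
    """Return the names of the files (not directories) in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]
```

To discover the actual schema, print `sorted(record.keys())` for the first record that `iter_jsonl` yields on one of the downloaded files, rather than relying on guessed field names.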
Provider: maas
Created: 2023-10-28