
lllchenlll/COCO_ARC

Source: Hugging Face. Updated: 2023-11-17. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lllchenlll/COCO_ARC
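Resource description:
This resource is a survey and analysis of vision-language instruction tuning (VLIT), covering existing VLIT datasets and the methods used to generate them. The datasets fall into two categories by generation method: Annotation Adaption and Self-Instruct. Annotation Adaption adjusts and rewrites existing annotation data to fit VLIT data templates, while Self-Instruct relies on large language models (LLMs) to synthesize annotation data from more sources, producing VLIT data with greater diversity and complexity. The datasets are further divided into general instructions and specific instructions; specific instructions include object/task-specific and domain-specific ones.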
Provided by:
lllchenlll
Summary of original information

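Overview of Vision-Language Instruction Tuning Datasets

Dataset overview

This document provides details on datasets related to vision-language instruction tuning (VLIT). These datasets are mainly used to train and evaluate multimodal language models, and they range from general instructions to domain-specific instructions.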

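Dataset categories

Existing VLIT datasets

Existing VLIT generation schemes fall into two categories:

  • Annotation Adaption: relies mainly on directly adjusting and rewriting existing annotation data to fit VLIT data templates.
  • Self-Instruct: relies on large language models (LLMs) to synthesize annotation data from more sources and reorganize it into VLIT data with greater diversity and complexity (which, of course, also introduces more noise and hallucination).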
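
To make the Annotation Adaption idea concrete, here is a minimal sketch that rewrites existing caption and VQA annotations into instruction-response samples. The template strings, field names (`image`, `caption`, `question`, `answer`), and function names are assumptions chosen for illustration; they are not taken from any dataset listed here.

```python
import random

# Hypothetical instruction templates; real VLIT datasets define their own.
CAPTION_TEMPLATES = [
    "Describe the image briefly.",
    "What is shown in this picture?",
]


def adapt_caption(annotation, rng):
    """Turn a caption-style annotation into one instruction-response sample."""
    return {
        "image": annotation["image"],
        "instruction": rng.choice(CAPTION_TEMPLATES),
        "response": annotation["caption"].strip(),
    }


def adapt_vqa(annotation):
    """Turn a VQA-style annotation into one instruction-response sample."""
    return {
        "image": annotation["image"],
        "instruction": annotation["question"].strip(),
        "response": annotation["answer"].strip(),
    }


rng = random.Random(0)  # fixed seed so template choice is reproducible
caption_sample = adapt_caption(
    {"image": "0001.jpg", "caption": " A dog runs on the beach. "}, rng
)
vqa_sample = adapt_vqa(
    {"image": "0002.jpg", "question": "How many cats are there?", "answer": "Two"}
)
print(caption_sample)
print(vqa_sample)
```

A Self-Instruct pipeline would replace the fixed templates with LLM-generated instructions and responses, which is where the extra diversity, and the extra noise, comes from.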

Dataset structure

VLIT Data
├─ General Instruction
│  ├─ Annotation Adaption
│  └─ Self-Instruct
├─ Specific Instruction
│  ├─ Object/Task-Specific
│  │  ├─ Region
│  │  ├─ Video
│  │  └─ Text
│  └─ Domain-Specific
│     ├─ Medicine
│     ├─ Document
│     └─ PointCloud
├─ Construction Tools
└─ Data Mixing
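
The hierarchy above can also be held as a nested mapping, which makes it easy to enumerate categories programmatically. This is only an illustrative sketch: the `TAXONOMY` dictionary and the `leaf_paths` helper are hypothetical names, and the category labels are copied from the tree.

```python
# Nested dict mirroring the category tree; an empty dict marks a leaf.
TAXONOMY = {
    "VLIT Data": {
        "General Instruction": {"Annotation Adaption": {}, "Self-Instruct": {}},
        "Specific Instruction": {
            "Object/Task-Specific": {"Region": {}, "Video": {}, "Text": {}},
            "Domain-Specific": {"Medicine": {}, "Document": {}, "PointCloud": {}},
        },
        "Construction Tools": {},
        "Data Mixing": {},
    }
}


def leaf_paths(tree, prefix=()):
    """Yield the full path (as a tuple of names) to every leaf category."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


paths = [" > ".join(p) for p in leaf_paths(TAXONOMY)]
for p in paths:
    print(p)
```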

List of datasets

| Dataset | MLLM | Paper |
| --- | --- | --- |
| LVIS-INSTRUCT4V | - | To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning |
| GranD | GLaMM | GLaMM: Pixel Grounding Large Multimodal Model |
| ComVint | - | What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning |
| MiniGPT-v2 | MiniGPT-v2 | MiniGPT-v2: Large Language Model As a Unified Interface for Vision-Language Multi-task Learning |
| GRIT | Ferret | Ferret: Refer and Ground Anything Anywhere at Any Granularity |
| SparklesDialogue-VG | SparklesChat | Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models |
| SparklesDialogue-CC | SparklesChat | Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models |
| InternLM-XComposer | InternLM-XComposer | InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition |
| AnyMAL | AnyMAL | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model |
| DreamLLM | DreamLLM | DreamLLM: Synergistic Multimodal Comprehension and Creation |
| TextBind | TextBind | TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild |
| PVIT | PVIT | Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models |
| T2M | NExT-GPT | NExT-GPT: Any-to-Any Multimodal LLM |
| MosIT | NExT-GPT | NExT-GPT: Any-to-Any Multimodal LLM |
| GPTVQA | MLLM-DataEngine | MLLM-DataEngine: An Iterative Refinement Approach for MLLM |
| CIEM | - | CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning |
| PointLLM | PointLLM | PointLLM: Empowering Large Language Models to Understand Point Clouds |
| VIGC | VIGC | VIGC: Visual Instruction Generation and Correction |
| M-HalDetec | - | Detecting and Preventing Hallucinations in Large Vision Language Models |
| StableLLaVA | StableLLaVA | StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data |
| I4 | Cheetor | Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions |
| AS-1B | ASM | The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World |
| Multimodal_id_v1 | LMEye(IPN) | LMEye: An Interactive Perception Network for Large Language Models |
| Lynx | Lynx | What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? |
| MGVLID | ChatSpot | ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning |
| BuboGPT | BuboGPT | BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs |
| GRIT-20M | KOSMOS-2 | KOSMOS-2: Grounding Multimodal Large Language Models to the World |
| SVIT | SVIT(MMLLM) | SVIT: Scaling up Visual Instruction Tuning |
| GPT4RoI | GPT4RoI | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest |
| PF-1M | Clever Flamingo | Visual Instruction Tuning with Polite Flamingo |
| Shikra-RD | Shikra | Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic |
| LLaVAR | LLaVAR | LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding |
| OphGLM | OphGLM | OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue |
| LAMM | LAMM | [LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark](https://github.com/palchenli/VL-Instruction-Tuning/blob/main/assert/paper/ |