lllchenlll/COCO_ARC

Name: lllchenlll/COCO_ARC
Creator: lllchenlll
Published: 2023-11-17 02:10:10
License: 暂无描述

Hugging Face2023-11-17 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/lllchenlll/COCO_ARC

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是关于视觉-语言指令调优（VLIT）的综述和分析，涵盖了现有的VLIT数据集及其生成方法。数据集分为Annotation Adaption和Self-Instruct两大类，Annotation Adaption主要通过调整和重写现有注释数据来适应VLIT数据模板，而Self-Instruct则依赖大型语言模型（LLM）从更多来源合成注释数据，生成更具多样性和复杂性的VLIT数据。数据集进一步细分为通用指令和特定指令，特定指令包括对象/任务特定和领域特定。

提供机构：

lllchenlll

原始信息汇总

视觉-语言指令调优数据集概述

数据集概述

本文档提供了关于视觉-语言指令调优（VLIT）的相关数据集的详细信息。这些数据集主要用于训练和评估多模态语言模型，涵盖了从通用指令到特定领域指令的多种类型。

数据集分类

现有VLIT数据集

现有的VLIT生成方案可以分为两大类：

Annotation Adaption：主要依赖于直接调整和重写现有标注数据以适应VLIT数据模板。
Self-Instruct：依赖于大型语言模型（LLM）从更多来源合成标注数据，并重新组织以生成具有更多多样性和复杂性的VLIT数据（当然，这也带来了更多的噪声和幻觉）。

数据集结构

plaintext VLIT Data ├─ General Instruction │ ├─ Annotation Adaption │ └─ Self-Instruct ├─ Specific Instruction │ ├─ Object/Task-Specific │ │ ├─ Region │ │ ├─ Video │ │ └─ Text │ └─ Domain-Specific │ ├─ Medicine │ ├─ Document │ └─ PointCloud ├─ Construction Tools └─ Data Mixing

具体数据集列表

数据集	MLLM	论文
LVIS-INSTRUCT4V	-	To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
GranD	GLaMM	GLaMM: Pixel Grounding Large Multimodal Model
ComVint	-	What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
MiniGPT-v2	MiniGPT-v2	MiniGPT-v2: Large Language Model As a Unified Interface for Vision-Language Multi-task Learning
GRIT	Ferret	FERRET REFER AND GROUND ANYTHING ANYWHERE AT ANY GRANULARITY
SparklesDialogue-VG	SparklesChat	Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
SparklesDialogue-CC	SparklesChat	Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
InternLM-XComposer	InternLM-XComposer	InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
AnyMAL	AnyMAL	AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
DreamLLM	DreamLLM	DREAMLLM: SYNERGISTIC MULTIMODAL COMPREHENSION AND CREATION
TextBind	TextBind	TEXTBIND: Multi-turn Interleaved Multimodal Instruction-following in the Wild
PVIT	PVIT	Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
T2M	NExT-GPT	NExT-GPT: Any-to-Any Multimodal LLM
MosIT	NExT-GPT	NExT-GPT: Any-to-Any Multimodal LLM
GPTVQA	MLLM-DataEngine	MLLM-DataEngine: An Iterative Refinement Approach for MLLM
CIEM	-	CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
PointLLM	PointLLM	PointLLM: Empowering Large Language Models to Understand Point Clouds
VIGC	VIGC	VIGC: Visual Instruction Generation and Correction
M-HalDetec	-	Detecting and Preventing Hallucinations in Large Vision Language Models
StableLLaVA	StableLLaVA	StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
I4	Cheetor	EMPOWERING VISION-LANGUAGE MODELS TO FOLLOW INTERLEAVED VISION-LANGUAGE INSTRUCTIONS
AS-1B	ASM	The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Multimodal_id_v1	LMEye(IPN)	LMEye: An Interactive Perception Network for Large Language Models
Lynx	Lynx	What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
MGVLID	ChatSpot	ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
BuboGPT	BuboGPT	BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
GRIT-20M	KOSMOS-2	KOSMOS-2: Grounding Multimodal Large Language Models to the World
SVIT	SVIT(MMLLM)	SVIT: Scaling up Visual Instruction Tuning
GPT4RoI	GPT4RoI	GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
PF-1M	Clever Flamingo	Visual Instruction Tuning with Polite Flamingo
Shikra-RD	Shikra	Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic
LLaVAR	LLaVAR	LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
OphGLM	OphGLM	OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue
LAMM	LAMM	[LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark](https://github.com/palchenli/VL-Instruction-Tuning/blob/main/assert/paper/

5,000+

优质数据集

54 个

任务类型

进入经典数据集