PaDT-MLLM/COCO

Name: PaDT-MLLM/COCO
Creator: PaDT-MLLM
Published: 2025-10-10 04:09:11
License: Not specified

Source: Hugging Face. Updated 2025-10-10; indexed 2025-10-25.

Download link:

https://hf-mirror.com/datasets/PaDT-MLLM/COCO
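
A minimal sketch of fetching the repository through the mirror, assuming the `huggingface_hub` package is installed and that the mirror host works as a drop-in Hub endpoint. The repository ID and host come from the link above; the helper names and the `./PaDT-COCO` target directory are hypothetical, chosen only for illustration.

```python
REPO_ID = "PaDT-MLLM/COCO"        # repository name from this listing
MIRROR = "https://hf-mirror.com"  # host taken from the download link above


def download_kwargs(local_dir="./PaDT-COCO"):
    """Build keyword arguments for huggingface_hub.snapshot_download.

    local_dir is a hypothetical target directory, not part of the listing.
    """
    return {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # the link points at a dataset repository
        "endpoint": MIRROR,      # assumption: the mirror serves the Hub API
        "local_dir": local_dir,
    }


def download(local_dir="./PaDT-COCO"):
    """Download the whole repository and return the local path."""
    # Imported lazily so download_kwargs() works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir))
```

Calling `download()` starts the transfer. The listing does not state the repository's size or access terms, so it may be worth checking those on the dataset page first.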

Description:

PaDT is a unified paradigm for multimodal vision tasks in multimodal large language models (MLLMs) that lets the model generate both textual and visual outputs directly. At its core are Visual Reference Tokens (VRTs), which allow the model to reason about visual information within the output sequence in a more natural and direct way. According to the dataset description, PaDT achieves state-of-the-art performance across various visual perception and understanding tasks.

Provider:

PaDT-MLLM
