PaDT-MLLM/COCO
Source: Hugging Face · Updated: 2025-10-10 · Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/PaDT-MLLM/COCO
Description:
PaDT is a unified paradigm for multimodal vision tasks in MLLMs that enables the model to generate both textual and visual outputs directly. At its core are Visual Reference Tokens (VRTs), which let the model reason about visual information more naturally and directly within the output sequence. PaDT achieves state-of-the-art performance across various visual perception and understanding tasks.
Provider:
PaDT-MLLM



