Perceive-Anything/PAM-data
Source: Hugging Face. Updated: 2025-06-06. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/Perceive-Anything/PAM-data
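The download link can also be used programmatically. The following is a minimal sketch, not an official loader: it assumes the `huggingface_hub` package is installed, that the mirror host accepts the client's `endpoint` argument, and that the repository is publicly readable. The helper names `dataset_url` and `download_snapshot` and the `PAM-data` target directory are hypothetical, chosen only for illustration.

```python
MIRROR_ENDPOINT = "https://hf-mirror.com"  # host taken from the download link
REPO_ID = "Perceive-Anything/PAM-data"     # dataset repo id from the listing


def dataset_url(endpoint: str = MIRROR_ENDPOINT, repo_id: str = REPO_ID) -> str:
    """Rebuild the dataset page URL from an endpoint and a repo id."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def download_snapshot(local_dir: str = "PAM-data",
                      endpoint: str = MIRROR_ENDPOINT) -> str:
    """Download the whole dataset repo and return the local path.

    Assumption: requires `pip install huggingface_hub`; the import is lazy
    so that dataset_url() works without the dependency.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",   # the repo is a dataset, not a model
        local_dir=local_dir,   # hypothetical target directory
        endpoint=endpoint,     # route requests through the mirror
    )


if __name__ == "__main__":
    print(dataset_url())
    # download_snapshot()  # uncomment to fetch; the dataset may be large
```

Passing `endpoint="https://huggingface.co"` would target the original hub instead of the mirror.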
Resource description:
The Perceive Anything Model (PAM) is a conceptually simple and efficient framework for comprehensive region-level visual understanding in images and videos. By integrating Large Language Models (LLMs), it performs object segmentation while simultaneously generating diverse region-specific semantic outputs, including categories, label definitions, functional explanations, and detailed captions. The model transforms SAM 2's rich visual features into multi-modal tokens that the LLM can interpret. A dedicated data refinement and augmentation pipeline was used to build a high-quality dataset of region-semantic annotations for images and videos, including novel region-level streaming video caption data.
Provider:
Perceive-Anything



