Perceive-Anything/PAM-data

Name: Perceive-Anything/PAM-data
Creator: Perceive-Anything
Published: 2025-06-06 07:45:27
License: No description available

Hugging Face, updated 2025-06-06, indexed 2025-11-01

Download link:

https://hf-mirror.com/datasets/Perceive-Anything/PAM-data
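
As a minimal sketch of how the dataset might be fetched from the mirror above, the following assumes the third-party `huggingface_hub` package is installed and that the mirror honors the `HF_ENDPOINT` environment variable. The helper names `dataset_url` and `download_dataset` and the `PAM-data` target directory are illustrative, not part of the dataset or of any official tooling.

```python
import os

REPO_ID = "Perceive-Anything/PAM-data"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the dataset page URL for a given endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def download_dataset(local_dir: str = "PAM-data", use_mirror: bool = True) -> str:
    """Download a snapshot of the dataset repository and return its local path.

    Assumption: huggingface_hub is installed (pip install huggingface_hub).
    """
    if use_mirror:
        # huggingface_hub reads HF_ENDPOINT when it is first imported,
        # so the variable is set before the import below.
        os.environ["HF_ENDPOINT"] = MIRROR_ENDPOINT
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repository is a dataset, not a model
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Only prints the URL; call download_dataset() explicitly to fetch files.
    print(dataset_url())
```

Calling `download_dataset()` would fetch the full repository, which may be large; the size is not stated on this page, so checking available disk space first is advisable.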

Overview:

The Perceive Anything Model (PAM) is a conceptually simple and efficient framework for comprehensive region-level visual understanding in images and videos. By integrating Large Language Models (LLMs), it performs object segmentation and, at the same time, generates diverse region-specific semantic outputs: categories, label definitions, functional explanations, and detailed captions. The model transforms SAM 2's rich visual features into multi-modal tokens that the LLM can interpret. A dedicated data refinement and augmentation pipeline was developed to build a high-quality dataset of region-semantic annotations for images and videos, including novel region-level streaming video caption data.

Provided by:

Perceive-Anything
