SACo-VEval

Name: SACo-VEval
Creator: maas
Published: 2026-01-06 16:52:16
License: not specified on this page (each domain has its own license; see the description below)

ModelScope community. Updated: 2026-01-06. Indexed: 2025-11-22.

Download link:

https://modelscope.cn/datasets/facebook/SACo-VEval

Description:

# SA-Co/VEval Dataset

**License**: each domain has its own license.

* SA-Co/VEval - SA-V: CC-BY-NC 4.0
* SA-Co/VEval - YT-Temporal-1B: CC-BY-NC 4.0
* SA-Co/VEval - SmartGlasses: CC-BY 4.0

**SA-Co/VEval** is an evaluation dataset comprising 3 domains; each domain has a val and a test split.

* SA-Co/VEval - SA-V: videos are from the [SA-V dataset](https://ai.meta.com/datasets/segment-anything-video/)
* SA-Co/VEval - YT-Temporal-1B: videos are from [YT-Temporal-1B](https://cove.thecvf.com/datasets/704)
* SA-Co/VEval - SmartGlasses: egocentric videos from [Smart Glasses](https://huggingface.co/datasets/facebook/SACo-VEval/blob/main/media/saco_sg.tar.gz)

This Hugging Face dataset repo contains the following:

```
datasets/facebook/SACo-VEval/tree/main/
├── annotation/
│   ├── saco_veval_sav_test.json
│   ├── saco_veval_sav_val.json
│   ├── saco_veval_smartglasses_test.json
│   ├── saco_veval_smartglasses_val.json
│   ├── saco_veval_yt1b_test.json
│   └── saco_veval_yt1b_val.json
└── media/
    ├── saco_sg.tar.gz
    └── yt1b_start_end_time.json
```

* annotation
    * all the ground-truth (GT) JSON files
* media
    * `saco_sg.tar.gz`: the preprocessed JPEGImages for SA-Co/VEval - SmartGlasses
    * `yt1b_start_end_time.json`: the YouTube video ids and the start and end times used in SA-Co/VEval - YT-Temporal-1B

More details on preparing the complete SA-Co/VEval dataset can be found in the [SAM 3 GitHub](https://github.com/facebookresearch/sam3/tree/main/scripts/eval/veval) repository.

## Annotation Format

The format is similar to the [YTVIS](https://youtube-vos.org/dataset/vis/) format.

Each annotation JSON, e.g. `saco_veval_sav_test.json`, has 5 fields:

* info
    * A dict containing the dataset info
    * E.g. `{'version': 'v1', 'date': '2025-09-24', 'description': 'SA-Co/VEval SA-V Test'}`
* videos
    * A list of the videos used in the current annotation JSON
    * Each entry contains {id, video_name, file_names, height, width, length}
* annotations
    * A list of **positive** masklets and their related info
    * Each entry contains {id, segmentations, bboxes, areas, iscrowd, video_id, height, width, category_id, noun_phrase}
        * video_id should match the `videos - id` field above
        * category_id should match the `categories - id` field below
        * segmentations is a list of [RLE](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py) masks
* categories
    * A **globally** used noun phrase id map, shared across all 3 domains
    * Each entry contains {id, name}
        * name is the noun phrase
* video_np_pairs
    * A list of the video-NP (noun phrase) pairs used in the current annotation JSON, both **positive** and **negative**
    * Each entry contains {id, video_id, category_id, noun_phrase, num_masklets}
        * video_id should match the `videos - id` field above
        * category_id should match the `categories - id` field above
        * when `num_masklets > 0` the pair is positive, and its masklets can be found in the annotations field
        * when `num_masklets = 0` the pair is negative, meaning no masklet is present at all

```
data {
    "info": info
    "videos": [video]
    "annotations": [annotation]
    "categories": [category]
    "video_np_pairs": [video_np_pair]
}
video {
    "id": int
    "video_name": str  # e.g. sav_000000
    "file_names": List[str]
    "height": int
    "width": int
    "length": int
}
annotation {
    "id": int
    "segmentations": List[RLE]
    "bboxes": List[List[int, int, int, int]]
    "areas": List[int]
    "iscrowd": int
    "video_id": str
    "height": int
    "width": int
    "category_id": int
    "noun_phrase": str
}
category {
    "id": int
    "name": str
}
video_np_pair {
    "id": int
    "video_id": str
    "category_id": int
    "noun_phrase": str
    "num_masklets": int
}
```

The notebook [sam3/examples/saco_veval_vis_example.ipynb](https://github.com/facebookresearch/sam3/blob/main/examples/saco_veval_vis_example.ipynb) in the SAM 3 GitHub repository shows some examples of the data format and data visualization.
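As a rough illustration of the annotation format described above, the following Python sketch splits `video_np_pairs` into positive and negative pairs and groups masklets by video and noun phrase. It is not taken from the SAM 3 repository: the helper names (`load_annotation`, `split_video_np_pairs`, `masklets_by_pair`) and the tiny `example` dict are hypothetical, and the data in it is synthetic. Because the schema lists `videos - id` as `int` but `video_id` as `str`, the sketch assumes it is safest to compare ids as strings.

```python
import json
from collections import defaultdict


def load_annotation(path):
    """Read one annotation JSON file, e.g. saco_veval_sav_test.json."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def split_video_np_pairs(data):
    """Split video-NP pairs into positives (num_masklets > 0) and negatives (== 0)."""
    positives = [p for p in data["video_np_pairs"] if p["num_masklets"] > 0]
    negatives = [p for p in data["video_np_pairs"] if p["num_masklets"] == 0]
    return positives, negatives


def masklets_by_pair(data):
    """Group annotations by (video_id, category_id).

    Assumption: ids are normalized to str, since the schema mixes int and str.
    """
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        key = (str(ann["video_id"]), str(ann["category_id"]))
        grouped[key].append(ann)
    return grouped


# Synthetic stand-in for a loaded annotation file (illustrative values only).
example = {
    "info": {"version": "v1", "description": "synthetic example"},
    "videos": [{"id": 0, "video_name": "sav_000000", "file_names": [],
                "height": 4, "width": 4, "length": 0}],
    "annotations": [
        {"id": 1, "video_id": "0", "category_id": 7, "noun_phrase": "a dog",
         "segmentations": [], "bboxes": [], "areas": [], "iscrowd": 0,
         "height": 4, "width": 4},
    ],
    "categories": [{"id": 7, "name": "a dog"}, {"id": 8, "name": "a red car"}],
    "video_np_pairs": [
        {"id": 1, "video_id": "0", "category_id": 7,
         "noun_phrase": "a dog", "num_masklets": 1},
        {"id": 2, "video_id": "0", "category_id": 8,
         "noun_phrase": "a red car", "num_masklets": 0},
    ],
}

positives, negatives = split_video_np_pairs(example)
grouped = masklets_by_pair(example)

# For a positive pair, the number of grouped masklets should equal num_masklets.
for pair in positives:
    key = (str(pair["video_id"]), str(pair["category_id"]))
    print(pair["noun_phrase"], len(grouped[key]), pair["num_masklets"])
```

On a real file, `data = load_annotation("annotation/saco_veval_sav_test.json")` would replace the synthetic dict. Decoding the RLE entries in `segmentations` into binary masks would typically go through `pycocotools.mask.decode`, which is left out here to keep the sketch dependency-free.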


Provider:

maas

Created:

2025-11-20
