SACo-VEval
Source: ModelScope community. Updated 2026-01-06; indexed 2025-11-22.
Download link:
https://modelscope.cn/datasets/facebook/SACo-VEval
Description:
# SA-Co/VEval Dataset
**License:** each domain has its own license.
* SA-Co/VEval - SA-V: CC-BY-NC 4.0
* SA-Co/VEval - YT-Temporal-1B: CC-BY-NC 4.0
* SA-Co/VEval - SmartGlasses: CC-BY 4.0
**SA-Co/VEval** is an evaluation dataset comprising 3 domains; each domain has a val and a test split.
* SA-Co/VEval - SA-V: videos are from the [SA-V dataset](https://ai.meta.com/datasets/segment-anything-video/)
* SA-Co/VEval - YT-Temporal-1B: videos are from [YT-Temporal-1B](https://cove.thecvf.com/datasets/704)
* SA-Co/VEval - SmartGlasses: egocentric videos from [Smart Glasses](https://huggingface.co/datasets/facebook/SACo-VEval/blob/main/media/saco_sg.tar.gz)
This Hugging Face dataset repo contains the following:
```
datasets/facebook/SACo-VEval/tree/main/
├── annotation/
│   ├── saco_veval_sav_test.json
│   ├── saco_veval_sav_val.json
│   ├── saco_veval_smartglasses_test.json
│   ├── saco_veval_smartglasses_val.json
│   ├── saco_veval_yt1b_test.json
│   └── saco_veval_yt1b_val.json
└── media/
    ├── saco_sg.tar.gz
    └── yt1b_start_end_time.json
```
* annotation
  * all the ground-truth (GT) JSON files
* media
  * `saco_sg.tar.gz`: the preprocessed JPEGImages for SA-Co/VEval - SmartGlasses
  * `yt1b_start_end_time.json`: the YouTube video ids and the start and end times used in SA-Co/VEval - YT-Temporal-1B

More details on preparing the complete SA-Co/VEval dataset can be found in the [SAM 3 GitHub](https://github.com/facebookresearch/sam3/tree/main/scripts/eval/veval).
## Annotation Format
The format is similar to the [YTVIS](https://youtube-vos.org/dataset/vis/) format.
Each annotation JSON, e.g. `saco_veval_sav_test.json`, has 5 fields:
* info
  * A dict containing the dataset info
  * E.g. `{'version': 'v1', 'date': '2025-09-24', 'description': 'SA-Co/VEval SA-V Test'}`
* videos
  * A list of the videos used in the current annotation JSON
  * Each entry contains {id, video_name, file_names, height, width, length}
* annotations
  * A list of **positive** masklets and their related info
  * Each entry contains {id, segmentations, bboxes, areas, iscrowd, video_id, height, width, category_id, noun_phrase}
    * video_id should match the `videos - id` field above
    * category_id should match the `categories - id` field below
    * segmentations is a list of [RLE](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py)
* categories
  * A **globally** used noun phrase id map, consistent across all 3 domains
  * Each entry contains {id, name}
    * name is the noun phrase
* video_np_pairs
  * A list of video-np pairs, both **positive** and **negative**, used in the current annotation JSON
  * Each entry contains {id, video_id, category_id, noun_phrase, num_masklets}
    * video_id should match the `videos - id` field above
    * category_id should match the `categories - id` field above
    * when `num_masklets > 0` it is a positive video-np pair, and its masklets can be found in the annotations field
    * when `num_masklets = 0` it is a negative video-np pair, meaning no masklet is present at all
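
The relationships among these fields can be sketched in Python roughly as follows. This is a minimal, unofficial sketch: the helper names (`load_annotation`, `build_indices`, `split_pairs`) are hypothetical, and the `toy` dict is made-up illustration data, not taken from the dataset. Ids are normalized to strings before matching because the schema lists `videos - id` as `int` but `video_id` as `str`; that normalization is an assumption and may be unnecessary for the real files.

```python
import json
from collections import defaultdict


def load_annotation(path):
    """Read one annotation file, e.g. annotation/saco_veval_sav_val.json."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def build_indices(data):
    """Index videos and categories by id; group masklets by (video, noun phrase)."""
    # Assumption: ids are cast to str so that videos.id and video_id compare equal.
    videos = {str(v["id"]): v for v in data["videos"]}
    categories = {c["id"]: c["name"] for c in data["categories"]}
    masklets = defaultdict(list)
    for ann in data["annotations"]:
        masklets[(str(ann["video_id"]), ann["category_id"])].append(ann)
    return videos, categories, masklets


def split_pairs(data):
    """Separate positive (num_masklets > 0) from negative (num_masklets == 0) pairs."""
    positives = [p for p in data["video_np_pairs"] if p["num_masklets"] > 0]
    negatives = [p for p in data["video_np_pairs"] if p["num_masklets"] == 0]
    return positives, negatives


# Made-up miniature record set, only to exercise the helpers above.
toy = {
    "videos": [{"id": 0, "video_name": "sav_000000"}],
    "categories": [{"id": 7, "name": "dog"}, {"id": 9, "name": "red car"}],
    "annotations": [
        {"id": 1, "video_id": "0", "category_id": 7, "noun_phrase": "dog"},
    ],
    "video_np_pairs": [
        {"id": 1, "video_id": "0", "category_id": 7,
         "noun_phrase": "dog", "num_masklets": 1},
        {"id": 2, "video_id": "0", "category_id": 9,
         "noun_phrase": "red car", "num_masklets": 0},
    ],
}

videos, categories, masklets = build_indices(toy)
positives, negatives = split_pairs(toy)
for pair in positives:
    key = (str(pair["video_id"]), pair["category_id"])
    print(videos[key[0]]["video_name"], categories[key[1]], len(masklets[key]))
```

For a real file, `load_annotation(...)` would replace the `toy` dict. Negative pairs have no entry in `annotations`, so they are only visible through `video_np_pairs`.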
```
data {
    "info": info
    "videos": [video]
    "annotations": [annotation]
    "categories": [category]
    "video_np_pairs": [video_np_pair]
}
video {
    "id": int
    "video_name": str  # e.g. sav_000000
    "file_names": List[str]
    "height": int
    "width": int
    "length": int
}
annotation {
    "id": int
    "segmentations": List[RLE]
    "bboxes": List[List[int, int, int, int]]
    "areas": List[int]
    "iscrowd": int
    "video_id": str
    "height": int
    "width": int
    "category_id": int
    "noun_phrase": str
}
category {
    "id": int
    "name": str
}
video_np_pair {
    "id": int
    "video_id": str
    "category_id": int
    "noun_phrase": str
    "num_masklets": int
}
```
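
The schema above lends itself to a quick consistency check before running an evaluation. The sketch below is an unofficial illustration, not part of the SAM 3 tooling: `REQUIRED_KEYS` simply restates the keys listed in the schema, `check_annotation` is a hypothetical helper, and comparing `video_id` to `videos - id` as strings is an assumption.

```python
# Required keys per record type, copied from the schema above.
REQUIRED_KEYS = {
    "videos": {"id", "video_name", "file_names", "height", "width", "length"},
    "annotations": {"id", "segmentations", "bboxes", "areas", "iscrowd",
                    "video_id", "height", "width", "category_id", "noun_phrase"},
    "categories": {"id", "name"},
    "video_np_pairs": {"id", "video_id", "category_id", "noun_phrase",
                       "num_masklets"},
}


def check_annotation(data):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # 1. Every record carries the keys its schema entry lists.
    for field, required in REQUIRED_KEYS.items():
        for i, record in enumerate(data.get(field, [])):
            missing = required - record.keys()
            if missing:
                problems.append(f"{field}[{i}] is missing keys: {sorted(missing)}")

    # 2. video_id and category_id point at existing videos and categories.
    #    Assumption: ids are compared as strings for videos.
    video_ids = {str(v["id"]) for v in data.get("videos", []) if "id" in v}
    category_ids = {c["id"] for c in data.get("categories", []) if "id" in c}
    for field in ("annotations", "video_np_pairs"):
        for i, record in enumerate(data.get(field, [])):
            if str(record.get("video_id")) not in video_ids:
                problems.append(f"{field}[{i}] has unknown video_id")
            if record.get("category_id") not in category_ids:
                problems.append(f"{field}[{i}] has unknown category_id")
    return problems


# Made-up record with a dangling category_id, to show what a report looks like.
broken = {
    "videos": [{"id": 0, "video_name": "sav_000000", "file_names": [],
                "height": 2, "width": 3, "length": 0}],
    "categories": [{"id": 7, "name": "dog"}],
    "annotations": [],
    "video_np_pairs": [{"id": 1, "video_id": "0", "category_id": 8,
                        "noun_phrase": "cat", "num_masklets": 0}],
}
for problem in check_annotation(broken):
    print(problem)
```

The check only covers key presence and cross-references; it does not validate value types or per-frame list lengths.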
The SAM 3 GitHub notebook [sam3/examples/saco_veval_vis_example.ipynb](https://github.com/facebookresearch/sam3/blob/main/examples/saco_veval_vis_example.ipynb) shows examples of the data format and data visualization.
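
To inspect a single mask without the notebook, a per-frame RLE entry can be decoded roughly as shown below. `pycocotools.mask.decode` is the usual tool for this; the dependency-free sketch here only illustrates the idea. It assumes each `segmentations` entry is a COCO-style dict `{"size": [height, width], "counts": ...}` whose `counts` is either a compressed string or a plain list of run lengths, and that a frame without the object may be stored as `None`. Both assumptions should be checked against the actual files. The function names and the example RLE are hypothetical.

```python
def decode_counts(counts):
    """Decode COCO's compressed RLE string into a list of run lengths."""
    if isinstance(counts, bytes):
        counts = counts.decode("ascii")
    runs = []
    pos = 0
    while pos < len(counts):
        value, shift, more = 0, 0, True
        while more:
            chunk = ord(counts[pos]) - 48
            value |= (chunk & 0x1F) << shift   # low 5 bits carry data
            more = bool(chunk & 0x20)          # bit 6 flags a continuation
            pos += 1
            shift += 5
            if not more and (chunk & 0x10):
                value |= -1 << shift           # sign-extend negative deltas
        if len(runs) > 2:
            value += runs[-2]                  # later runs are stored as deltas
        runs.append(value)
    return runs


def decode_rle(rle):
    """Turn one RLE dict into a height x width mask (nested lists of 0/1)."""
    if rle is None:  # assumption: None marks a frame without the object
        return None
    height, width = rle["size"]
    counts = rle["counts"]
    runs = list(counts) if isinstance(counts, list) else decode_counts(counts)
    if sum(runs) != height * width:
        raise ValueError("run lengths do not cover the full mask")
    flat = []
    value = 0  # runs alternate, starting with background (0)
    for run in runs:
        flat.extend([value] * run)
        value = 1 - value
    # COCO RLE is column-major, so pixel (row, col) sits at col * height + row.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


# Hypothetical 2 x 3 mask: 2 background pixels, 3 foreground, 1 background.
example = {"size": [2, 3], "counts": "231"}
mask = decode_rle(example)
area = sum(map(sum, mask))
print(mask, area)
```

The foreground pixel count of a decoded mask can be compared against the matching entry in `areas` as a sanity check.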
Provider:
maas
Created:
2025-11-20
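Dataset introduction
Background overview
SACo-VEval is a multi-domain evaluation dataset containing video data from three domains: SA-V, YT-Temporal-1B, and SmartGlasses. Each domain has a validation and a test split. The dataset provides detailed annotations, including video metadata, object segmentation masks, bounding boxes, and category labels, and is suited to evaluating video understanding and object segmentation tasks.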



