SACo-Gold

Name: SACo-Gold
Creator: maas
Published: 2026-05-13 14:37:48
License: No description provided

ModelScope: updated 2026-05-13, indexed 2025-11-22

Download link:

https://modelscope.cn/datasets/facebook/SACo-Gold

Resource description:

# Dataset Card for SA-Co/Gold

SA-Co/Gold is a benchmark for promptable concept segmentation (PCS) in images. The benchmark contains images paired with text labels (also referred to as noun phrases, or NPs), and each image-NP pair is exhaustively annotated with masks on all object instances that match the label. SA-Co/Gold comprises 7 subsets, each targeting a different annotation domain. For each subset, the annotations are multi-reviewed and agreed upon by 3 human annotators, resulting in a high-quality benchmark.

This dataset covers 2 image sources and 7 annotation domains. The image sources are MetaCLIP and SA-1B. The annotation domains are MetaCLIP captioner NPs, SA-1B captioner NPs, Attributes, Crowded Scenes, Wiki-Common1K, Wiki-Food/Drink, and Wiki-Sports Equipment.

More details on using the SA-Co/Gold dataset, including visualization and evaluation, can be found in the [SAM 3 GitHub](https://github.com/facebookresearch/sam3/blob/main/scripts/eval/gold/).

## Annotation Format

The annotation format is derived from the [COCO format](https://cocodataset.org/#format-data). Notable data fields are:

- `images`: a `list` of `dict` features containing all image-NP pairs. Each entry corresponds to one image-NP pair and has the following items:
  - `id`: a `string` feature, the unique identifier for the image-NP pair
  - `text_input`: a `string` feature, the noun phrase for the image-NP pair
  - `file_name`: a `string` feature, the relative image path in the corresponding data folder
- `annotations`: a `list` of `dict` features containing all annotations, including bounding box, segmentation mask, area, etc.:
  - `image_id`: a `string` feature that maps to the `id` of the image-NP pair in `images`
  - `bbox`: a `list` of `float` features containing the bounding box in `[x, y, w, h]` format
  - `segmentation`: a `dict` feature containing the segmentation mask in RLE format
- `categories`: a `list` of `dict` features containing all categories.
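
The `bbox`, `area`, and `segmentation` fields above can be turned into pixel-space data with a short decoder. The sketch below is a pure-Python re-implementation of the COCO API's compressed-RLE string format (`rleFrString` in the COCO API's `maskApi.c`), which is what the `counts` strings in the Wiki-Food/Drink sample annotation appear to use; in practice `pycocotools.mask.decode` does the same job. Two further assumptions are inferred from the sample rather than stated in this card: `size` is `[height, width]` as in COCO, and `bbox` and `area` are normalized by image width and height (the sample boxes scale to whole pixels at 600 × 600). The helper names (`rle_string_to_counts`, `decode_mask`, `bbox_to_pixels`, `mask_bbox`) are hypothetical. Run on the second sample annotation (`"area": 0.001275`), the decoded mask has 459 foreground pixels (0.001275 × 600 × 600) and a tight box of [307, 343, 37, 22], matching the scaled `bbox`.

```python
# Minimal sketch: decode one SA-Co/Gold annotation in pure Python.
# Assumptions: COCO compressed-RLE "counts", "size" = [height, width], and
# bbox/area normalized by image width/height (inferred from the sample).


def rle_string_to_counts(s: str) -> list[int]:
    """Decode a COCO compressed-RLE string into run lengths (mirrors rleFrString)."""
    counts: list[int] = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48             # each char carries 6 bits
            x |= (c & 0x1F) << (5 * k)     # 5 data bits per char
            more = bool(c & 0x20)          # 0x20: another char follows
            p += 1
            k += 1
            if not more and (c & 0x10):    # 0x10 on the last char: negative value
                x |= -1 << (5 * k)
        if len(counts) > 2:                # from the 4th run on, values are deltas
            x += counts[-2]                # relative to the run two positions back
        counts.append(x)
    return counts


def decode_mask(segmentation: dict) -> list[list[int]]:
    """Expand {"counts": str, "size": [h, w]} into an h x w grid of 0/1 values."""
    h, w = segmentation["size"]
    flat: list[int] = []
    value = 0                              # runs alternate, starting with background
    for run in rle_string_to_counts(segmentation["counts"]):
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != h * w:
        raise ValueError("run lengths do not cover the image")
    # COCO RLE is column-major: pixel (row r, col c) is flat[c * h + r].
    return [[flat[c * h + r] for c in range(w)] for r in range(h)]


def bbox_to_pixels(bbox: list[float], width: int, height: int) -> list[int]:
    """Scale a normalized [x, y, w, h] box to whole pixels (normalization is assumed)."""
    x, y, bw, bh = bbox
    return [round(x * width), round(y * height), round(bw * width), round(bh * height)]


def mask_bbox(mask: list[list[int]]) -> list[int]:
    """Tight [x, y, w, h] pixel box around the mask foreground ([0, 0, 0, 0] if empty)."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return [0, 0, 0, 0]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return [cols[0], rows[0], cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1]


if __name__ == "__main__":
    # Second annotation from the Wiki-Food/Drink sample (image is 600 x 600).
    ann = {
        "area": 0.001275,
        "bbox": [0.5116666555404663, 0.5716666579246521,
                 0.061666667461395264, 0.036666665226221085],
        "segmentation": {
            "counts": "aWd51db05M1O2N100O1O1O1O1O1O010O100O10O10O010O010O01O100O100O1O00100O1O100O1O2MZee4",
            "size": [600, 600],
        },
    }
    mask = decode_mask(ann["segmentation"])
    print(sum(map(sum, mask)))                     # 459
    print(round(ann["area"] * 600 * 600))          # 459
    print(bbox_to_pixels(ann["bbox"], 600, 600))   # [307, 343, 37, 22]
    print(mask_bbox(mask))                         # [307, 343, 37, 22]
```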

Here, we provide the `categories` key for compatibility with the COCO format, but we do not use it in open-vocabulary detection. Instead, the text prompt is stored directly in each `images` entry (`text_input`). Note that in our setting, a unique image (`id` in `images`) actually corresponds to an (image, text prompt) combination.

We refer to an `id` in `images` that has corresponding annotations (i.e., it appears as an `image_id` in `annotations`) as a "positive" NP, and to an `id` in `images` without any annotations (i.e., it does not appear as an `image_id` in `annotations`) as a "negative" NP.

A sample annotation from the Wiki-Food/Drink domain looks as follows:

#### images

```
[
  {
    "id": 10000000,
    "file_name": "1/1001/metaclip_1_1001_c122868928880ae52b33fae1.jpeg",
    "text_input": "chili",
    "width": 600,
    "height": 600,
    "queried_category": "0",
    "is_instance_exhaustive": 1,
    "is_pixel_exhaustive": 1
  },
  {
    "id": 10000001,
    "file_name": "1/1001/metaclip_1_1001_c122868928880ae52b33fae1.jpeg",
    "text_input": "the fish ball",
    "width": 600,
    "height": 600,
    "queried_category": "2001",
    "is_instance_exhaustive": 1,
    "is_pixel_exhaustive": 1
  }
]
```

#### annotations

```
[
  {
    "id": 1,
    "image_id": 10000000,
    "source": "manual",
    "area": 0.002477777777777778,
    "bbox": [
      0.44333332777023315,
      0.0,
      0.10833333432674408,
      0.05833333358168602
    ],
    "segmentation": {
      "counts": "`kk42fb01O1O1O1O001O1O1O001O1O00001O1O001O001O0000000000O1001000O010O02O001N10001N0100000O10O1000O10O010O100O1O1O1O1O0000001O0O2O1N2N2Nobm4",
      "size": [600, 600]
    },
    "category_id": 1,
    "iscrowd": 0
  },
  {
    "id": 2,
    "image_id": 10000000,
    "source": "manual",
    "area": 0.001275,
    "bbox": [
      0.5116666555404663,
      0.5716666579246521,
      0.061666667461395264,
      0.036666665226221085
    ],
    "segmentation": {
      "counts": "aWd51db05M1O2N100O1O1O1O1O1O010O100O10O10O010O010O01O100O100O1O00100O1O100O1O2MZee4",
      "size": [600, 600]
    },
    "category_id": 1,
    "iscrowd": 0
  }
]
```

### Data Stats

Here are the stats for the 7 annotation domains.
The # Image-NPs column is the total number of unique image-NP pairs, including both "positive" and "negative" NPs.

| Domain                 | Image source | # Image-NPs | # Image-NP-Masks |
|------------------------|--------------|-------------|------------------|
| MetaCLIP captioner NPs | MetaCLIP     | 33393       | 20144            |
| SA-1B captioner NPs    | SA-1B        | 13258       | 30306            |
| Attributes             | MetaCLIP     | 9245        | 3663             |
| Crowded Scenes         | MetaCLIP     | 20687       | 50417            |
| Wiki-Common1K          | MetaCLIP     | 65502       | 6448             |
| Wiki-Food/Drink        | MetaCLIP     | 13951       | 9825             |
| Wiki-Sports Equipment  | MetaCLIP     | 12166       | 5075             |
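
The two count columns can be recomputed from a loaded annotation file. The sketch below assumes one COCO-style JSON file per domain (any file path passed to it is hypothetical) and that # Image-NP-Masks counts the entries in `annotations`; both are assumptions rather than statements from this card. Following the definitions under Annotation Format, a pair is a positive NP if its `id` appears as some annotation's `image_id`, and a negative NP otherwise. The function names `summarize` and `summarize_file` are illustrative and not part of the SAM 3 evaluation scripts.

```python
import json
from collections import Counter


def summarize(coco: dict) -> dict:
    """Count image-NP pairs, positive/negative NPs, and masks in one annotation file."""
    # Masks attached to each image-NP pair; Counter returns 0 for ids it never saw.
    masks_per_pair = Counter(ann["image_id"] for ann in coco["annotations"])
    pair_ids = [img["id"] for img in coco["images"]]
    positives = sum(1 for pid in pair_ids if masks_per_pair[pid] > 0)
    return {
        "image_nps": len(pair_ids),                 # positive + negative pairs
        "positive_nps": positives,                  # pairs with >= 1 annotation
        "negative_nps": len(pair_ids) - positives,  # pairs with no annotation
        "masks": len(coco["annotations"]),          # assumed: one mask per entry
    }


def summarize_file(path: str) -> dict:
    """Load a COCO-style JSON annotation file (hypothetical path) and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))


if __name__ == "__main__":
    # Wiki-Food/Drink excerpt trimmed to the fields used here.
    excerpt = {
        "images": [
            {"id": 10000000, "text_input": "chili"},
            {"id": 10000001, "text_input": "the fish ball"},
        ],
        "annotations": [
            {"id": 1, "image_id": 10000000},
            {"id": 2, "image_id": 10000000},
        ],
    }
    print(summarize(excerpt))
    # {'image_nps': 2, 'positive_nps': 1, 'negative_nps': 1, 'masks': 2}
```

In this two-entry excerpt, "the fish ball" (id 10000001) counts as negative only because the excerpt lists no annotations for it; the full annotation file may contain some.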


Provider:

maas

Created:

2025-11-20
