Prisma-Multimodal/segmented-imagenet1k-subset
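Segmented ImageNet-1K Subset
Overview
- Dataset name: Segmented ImageNet-1K Subset
- Dataset size: 12,000 images
  - 10,000 from the ImageNet-1K training set
  - 1,000 from the test set
  - 1,000 from the validation set
- Annotation type: instance segmentation annotations (classes, bounding boxes, and masks)
- Annotation tool: Grounded Segment Anything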
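Data organization
- Image directory structure: images/ contains train_images/, val_images/, and test_images/
- Mask directory structure: masks/ contains train_masks/, val_masks/, and test_masks/
- Annotation files: train.json, val.json, and test.json, each containing image paths, scores, bounding boxes, labels, and mask paths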
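
The layout above implies that an annotation record can be joined to its image and mask files through the relative paths it stores. The following is a minimal sketch of that join, not the dataset's official loader. It assumes the dataset has been downloaded to a local directory, that each annotation file holds a JSON array of records, and that the stored paths are relative to the dataset root. The helper name `load_annotations` is hypothetical.

```python
import json
from pathlib import Path


def load_annotations(root, split):
    """Load `<split>.json` under `root` and resolve its relative paths.

    Assumption: the file is a JSON array of records, each with an "image"
    path and a "masks" list of paths relative to `root`.
    """
    root = Path(root)
    with open(root / f"{split}.json", encoding="utf-8") as f:
        records = json.load(f)

    resolved = []
    for rec in records:
        rec = dict(rec)  # shallow copy so the loaded records stay unchanged
        rec["image"] = root / rec["image"]
        rec["masks"] = [root / m for m in rec["masks"]]
        resolved.append(rec)
    return resolved
```

If the files turn out to be stored in another layout (for example, one JSON object per line), only the `json.load` step would need to change.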
Example annotation file content
```json
{
  "image": "images/val_images/ILSVRC2012_val_00000025_n01616318.JPEG",
  "scores": [0.5, 0.44, 0.43, 0.28],
  "boxes": [
    [149, 117, 400, 347],
    [2, 2, 498, 497],
    [148, 115, 401, 349],
    [2, 2, 498, 497]
  ],
  "labels": ["bird", "dirt field", "vulture", "land"],
  "masks": [
    "masks/val_masks/ILSVRC2012_val_00000025_n01616318_00.png",
    "masks/val_masks/ILSVRC2012_val_00000025_n01616318_01.png",
    "masks/val_masks/ILSVRC2012_val_00000025_n01616318_02.png",
    "masks/val_masks/ILSVRC2012_val_00000025_n01616318_03.png"
  ]
}
```
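
In the record above, the scores, boxes, labels, and masks lists appear to be parallel: index 0 describes the "bird" detection in all four. Several detections overlap (the "bird" and "vulture" boxes are nearly identical) and some have low scores, so a consumer may want to filter them. The sketch below shows one way to do that. The helper name `filter_detections` and the 0.44 threshold are illustrative assumptions, not part of the dataset.

```python
def filter_detections(record, min_score):
    """Keep only detections whose score is at least `min_score`.

    The "scores", "boxes", "labels", and "masks" lists are treated as
    parallel, so one index selects the same detection in each of them.
    """
    keep = [i for i, s in enumerate(record["scores"]) if s >= min_score]
    filtered = {"image": record["image"]}
    for key in ("scores", "boxes", "labels", "masks"):
        filtered[key] = [record[key][i] for i in keep]
    return filtered


# The example record from the annotation file shown above.
record = {
    "image": "images/val_images/ILSVRC2012_val_00000025_n01616318.JPEG",
    "scores": [0.5, 0.44, 0.43, 0.28],
    "boxes": [[149, 117, 400, 347], [2, 2, 498, 497],
              [148, 115, 401, 349], [2, 2, 498, 497]],
    "labels": ["bird", "dirt field", "vulture", "land"],
    "masks": [
        "masks/val_masks/ILSVRC2012_val_00000025_n01616318_00.png",
        "masks/val_masks/ILSVRC2012_val_00000025_n01616318_01.png",
        "masks/val_masks/ILSVRC2012_val_00000025_n01616318_02.png",
        "masks/val_masks/ILSVRC2012_val_00000025_n01616318_03.png",
    ],
}

confident = filter_detections(record, min_score=0.44)
print(confident["labels"])  # ['bird', 'dirt field']
```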
Data loader example
```python
import numpy as np
from torch.utils.data import Dataset
from torchvision import transforms


class PatchDataset(Dataset):
    """Wraps a dataset whose items hold a PIL image, PIL masks, and labels,
    and returns the image tensor plus a per-patch grid of label lists."""

    def __init__(self, dataset, patch_size=16, width=224, height=224):
        self.dataset = dataset
        self.transform = transforms.Compose([
            transforms.Resize((height, width)),  # Resize expects (height, width)
            transforms.ToTensor(),
        ])
        self.patch_size = patch_size
        self.width = width
        self.height = height

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        image = self.transform(item["image"])
        masks = item["masks"]
        labels = item["labels"]
        rows = self.height // self.patch_size
        cols = self.width // self.patch_size
        # label_array[i][j] lists every label whose mask touches patch (i, j).
        label_array = [[[] for _ in range(cols)] for _ in range(rows)]
        for mask, label in zip(masks, labels):
            mask = mask.resize((self.width, self.height))  # PIL expects (width, height)
            mask_array = np.array(mask) > 0
            reduced_mask = self.reduce_mask(mask_array)
            for i in range(rows):
                for j in range(cols):
                    if reduced_mask[i, j]:
                        label_array[i][j].append(label)
        return image, label_array

    def reduce_mask(self, mask):
        """Downsample a pixel mask to one boolean per patch (True if any pixel is set)."""
        new_h = mask.shape[0] // self.patch_size
        new_w = mask.shape[1] // self.patch_size
        reduced_mask = np.zeros((new_h, new_w), dtype=bool)
        for i in range(new_h):
            for j in range(new_w):
                patch = mask[i*self.patch_size:(i+1)*self.patch_size, j*self.patch_size:(j+1)*self.patch_size]
                reduced_mask[i, j] = np.any(patch)
        return reduced_mask
```
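
The nested loops in `reduce_mask` visit every patch in Python, which can become slow for large images or small patch sizes. As an optional alternative, the same per-patch "any pixel set" reduction can be sketched with a NumPy reshape. The function name `reduce_mask_vectorized` is hypothetical, and the sketch assumes a 2-D boolean mask.

```python
import numpy as np


def reduce_mask_vectorized(mask, patch_size):
    """Return a (H // patch_size, W // patch_size) boolean grid that is True
    wherever the corresponding patch contains at least one True pixel."""
    new_h = mask.shape[0] // patch_size
    new_w = mask.shape[1] // patch_size
    # Drop remainder rows/columns that do not fill a whole patch,
    # which matches what the loop version does implicitly.
    cropped = mask[: new_h * patch_size, : new_w * patch_size]
    # Split each axis into (patch index, offset within patch).
    blocks = cropped.reshape(new_h, patch_size, new_w, patch_size)
    # Collapse the two within-patch axes with a logical OR.
    return blocks.any(axis=(1, 3))
```

For a 224x224 mask with a patch size of 16, this yields the same 14x14 grid as the loop version.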
Citation
```bibtex
@misc{segmented_imagenet1k_subset_2024,
  author = {ViT-Prisma Contributors},
  title = {Segmented ImageNet-1k Subset},
  url = {https://huggingface.co/datasets/Prisma-Multimodal/segmented-imagenet1k-subset},
  version = {1.0.0},
  date = {2024-04-02},
}
```




