smithsonian_butterflies_subset
ModelScope community, updated 2025-11-27, indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/huggan/smithsonian_butterflies_subset
Description:
This is a subset of the "ceyda/smithsonian_butterflies" dataset, with additional processing applied to train the "ceyda/butterfly_gan" model.
The preprocessing includes:
- Adding a "sim_score" to each image with a CLIP model, using the text prompts "pretty butterfly", "one butterfly", "butterfly with open wings", and "colorful butterfly"
- Removing butterflies with the same name (species)
- Limiting the set to the top 1000 images
- Removing the background (re-running the sim_score ranking after background removal gave visually worse results, so this was not done)
- Detecting contours
- Cropping to the bounding box of the contour with the largest area
- Converting back to RGB
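The selection steps (scoring, removing same-species duplicates, keeping the top 1000) can be sketched as follows. This is a minimal, hypothetical sketch that assumes the per-prompt CLIP image-text similarities have already been computed. Averaging the four prompt scores into one `sim_score`, keeping the highest-scoring image per species, and ranking the top 1000 by that score are all assumptions, not documented details; `Record`, `sim_score`, and `select_subset` are illustrative names, not part of the dataset or the original code.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, List

# The four CLIP text prompts listed in the dataset description.
PROMPTS = (
    "pretty butterfly",
    "one butterfly",
    "butterfly with open wings",
    "colorful butterfly",
)


@dataclass
class Record:
    """Hypothetical per-image record; field names are illustrative."""
    image_id: str
    name: str                        # species name
    prompt_scores: Dict[str, float]  # CLIP image-text similarity per prompt


def sim_score(record: Record) -> float:
    # Assumption: the per-prompt similarities are averaged into one score.
    return fmean(record.prompt_scores[p] for p in PROMPTS)


def select_subset(records: List[Record], top_k: int = 1000) -> List[Record]:
    # Assumption: for each species name, keep only the highest-scoring image.
    best_per_species: Dict[str, Record] = {}
    for rec in records:
        current = best_per_species.get(rec.name)
        if current is None or sim_score(rec) > sim_score(current):
            best_per_species[rec.name] = rec
    # Rank the surviving (one-per-species) images by score and keep top_k.
    ranked = sorted(best_per_species.values(), key=sim_score, reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    records = [
        Record("a", "Papilio machaon", {p: 0.30 for p in PROMPTS}),
        Record("b", "Papilio machaon", {p: 0.25 for p in PROMPTS}),
        Record("c", "Vanessa cardui", {p: 0.28 for p in PROMPTS}),
    ]
    print([r.image_id for r in select_subset(records, top_k=2)])  # ['a', 'c']
```

Deduplicating before ranking means the top-1000 cut is applied to one image per species, so no species can occupy more than one slot.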
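The contour, cropping, and RGB steps can be approximated without external libraries. This is a hedged sketch, not the original pipeline, whose tooling is not stated. Instead of a contour-finding routine, it labels 8-connected regions of non-transparent pixels and measures area as a pixel count, which can differ from a polygon contour area. It assumes background removal leaves fully transparent (alpha 0) pixels and, for the RGB step, composites onto an assumed white background. The helpers `largest_region_bbox`, `crop`, and `to_rgb` are hypothetical names.

```python
from collections import deque
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int, int]  # (r, g, b, a), each channel 0-255
Image = List[List[Pixel]]          # row-major grid of pixels
BBox = Tuple[int, int, int, int]   # (top, left, bottom, right); bottom/right exclusive


def largest_region_bbox(img: Image) -> Optional[BBox]:
    """Bounding box of the largest 8-connected region of non-transparent pixels."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    best: Optional[BBox] = None
    best_area = 0
    for y in range(h):
        for x in range(w):
            if seen[y][x] or img[y][x][3] == 0:
                continue
            # Flood-fill one foreground region, tracking its area and extent.
            seen[y][x] = True
            queue = deque([(y, x)])
            area, top, left, bottom, right = 0, y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and img[ny][nx][3] > 0):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area > best_area:
                best_area = area
                best = (top, left, bottom + 1, right + 1)
    return best


def crop(img: Image, box: BBox) -> Image:
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def to_rgb(img: Image, bg: Tuple[int, int, int] = (255, 255, 255)) -> List[List[Tuple[int, ...]]]:
    """Alpha-composite each pixel onto a solid background colour (assumed white)."""
    out = []
    for row in img:
        out_row = []
        for r, g, b, a in row:
            t = a / 255
            out_row.append(tuple(round(c * t + k * (1 - t)) for c, k in zip((r, g, b), bg)))
        out.append(out_row)
    return out


if __name__ == "__main__":
    T = (0, 0, 0, 0)        # transparent background pixel
    R = (200, 30, 30, 255)  # opaque foreground pixel
    img = [
        [R, T, T, T],
        [T, T, R, R],
        [T, T, R, R],
    ]
    box = largest_region_bbox(img)
    print(box)  # (1, 2, 3, 4)
    print(to_rgb(crop(img, box)))
```

The isolated pixel in the top-left corner forms its own one-pixel region, so only the four-pixel block survives the crop, mirroring how keeping the largest region discards small leftover specks after background removal.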
Provider:
maas
Created:
2025-10-13



