ZooScanNet: plankton images captured with the ZooScan
Source: DataCite Commons. Updated 2026-03-13; indexed 2025-04-16.
Download link:
https://www.seanoe.org/data/00446/55741/
Description:
Plankton was sampled with various nets, from the bottom or 500 m depth to the surface, in many oceans of the world. Samples were imaged with a ZooScan. The full images were processed with ZooProcess, which generated regions of interest (ROIs) around each individual object and a set of associated features measured on the object (see Gorsky et al. 2010 for more information). The same objects were re-processed to compute features with the scikit-image toolbox (http://scikit-image.org). The 1,451,745 resulting objects were sorted by a limited number of operators, following a common taxonomic guide, into 98 taxa, using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). For the purpose of training machine learning classifiers, the images in each class were split into training, validation, and test sets, with proportions 70%, 15% and 15%.
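A per-class 70% / 15% / 15% split of this kind can be sketched as follows. This is only an illustration, not the procedure used to build the dataset: the function name `split_per_class`, the fixed random seed, and the rounding rule are all assumptions.

```python
import random
from collections import defaultdict


def split_per_class(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign each object to 'train', 'val' or 'test', class by class.

    labels: dict mapping object id -> class name.
    Returns a dict mapping object id -> split name.
    """
    rng = random.Random(seed)

    # Group object ids by class so that each class is split separately.
    by_class = defaultdict(list)
    for objid, taxon in labels.items():
        by_class[taxon].append(objid)

    assignment = {}
    for taxon, objids in by_class.items():
        objids = sorted(objids)  # fixed order before shuffling, for repeatability
        rng.shuffle(objids)
        n_train = round(fractions[0] * len(objids))
        n_val = round(fractions[1] * len(objids))
        # Whatever remains after train and val goes to the test set.
        for i, objid in enumerate(objids):
            if i < n_train:
                assignment[objid] = "train"
            elif i < n_train + n_val:
                assignment[objid] = "val"
            else:
                assignment[objid] = "test"
    return assignment
```

Splitting inside each class keeps rare taxa represented in all three sets, which a single global shuffle would not guarantee. The published assignment is the one recorded in the `set` column of taxa.csv.gz.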
1. The first archive, ZooScanNet_data.tar, contains:
taxa.csv.gz
Table of the classification of each object in the dataset, with columns:
- objid: unique object identifier in EcoTaxa (integer number)
- taxon_level1: taxonomic name corresponding to the level 1 classification
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level2: name of the taxon corresponding to the level 2 classification
- plankton: whether the object is plankton or not (boolean)
- set: subset the image is assigned to (train: training, val: validation, or test)
- img_path: local path of the image corresponding to the taxon (of level 1), named according to the object id
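As a minimal sketch of how this table might be read, the snippet below streams taxa.csv.gz with the Python standard library and counts objects per split and taxon. It assumes the file is comma-separated, UTF-8 encoded, and starts with a header row holding the column names listed above; the helper names `iter_taxa` and `count_by_split` are hypothetical.

```python
import csv
import gzip
from collections import Counter


def iter_taxa(path):
    """Yield one dict per object from taxa.csv.gz without loading it all at once."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def count_by_split(rows):
    """Count objects per (set, taxon_level1) pair."""
    return Counter((row["set"], row["taxon_level1"]) for row in rows)
```

For example, `count_by_split(iter_taxa("taxa.csv.gz"))` would return the number of training, validation, and test images for each level 1 taxon. Streaming matters here because the table has one row for each of the 1,451,745 objects.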
features_native.csv.gz
Table of metadata of each object, including the features computed by ZooProcess. All features are computed on the object only, not the background. All area/length measures are in pixels. All grey levels are encoded in 8 bits (0=black, 255=white). With columns:
- objid: unique object identifier in EcoTaxa (integer number)
And 48 features:
- area
- mean
- stddev
- mode
- min/max
- perim.
- width,height
- major,minor
- circ.
- feret
- intden
- median
- skew,kurt
- %area
- area_exc
- fractal
- skelarea
- slope
- histcum1,2,3
- nb1,2,3
- symetrieh,symetriev
- symetriehc,symetrievc
- convperim,convarea
- fcons
- thickr
- esd
- elongation
- range
- centroids
- sr
- perimareaexc
- feretareaexc
- perimferet/perimmajor
- circex
- cdexc
See the “ZooScan” sheet (OBJECT metadata, annotation and measurements) at https://doi.org/10.5281/zenodo.14704250 for definitions.
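Because all area and length measures are in pixels, comparing objects across scans requires a conversion to physical units. The sketch below shows that arithmetic, together with the usual equivalent-diameter formula. The pixel size is not given in this description, so it is a required argument; the helper names are hypothetical, and whether `esd_from_area` matches the `esd` column exactly should be checked against the definitions sheet.

```python
import math


def pixels_to_mm(length_px, pixel_size_mm):
    """Convert a length in pixels to millimetres."""
    return length_px * pixel_size_mm


def area_to_mm2(area_px, pixel_size_mm):
    """Convert an area in pixels to square millimetres."""
    return area_px * pixel_size_mm ** 2


def esd_from_area(area_px):
    """Diameter, in pixels, of the disc that has the same area as the object."""
    return 2.0 * math.sqrt(area_px / math.pi)
```

As an illustration only, a scan at 2400 dpi would give `pixel_size_mm = 25.4 / 2400`, so that `pixels_to_mm(esd_from_area(area), 25.4 / 2400)` returns a diameter in millimetres. The 2400 dpi figure is an assumption made for the example, not a value stated in this description.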
features_skimage.csv.gz
Table of morphological features recomputed with skimage.measure.regionprops on the ROIs produced by ZooProcess. See http://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops for documentation.
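To give an idea of what `skimage.measure.regionprops` returns, here is a minimal sketch that labels the dark regions of a grey-level image and reads a few properties from each. It is not the pipeline used to produce features_skimage.csv.gz: the function name `measure_objects`, the thresholding step, and the selection of properties are assumptions made for illustration.

```python
import numpy as np
from skimage.measure import label, regionprops


def measure_objects(image, threshold):
    """Measure connected dark regions of a grey-level image.

    Pixels strictly darker than `threshold` are treated as object,
    everything else as background (0=black, 255=white).
    """
    mask = image < threshold
    labelled = label(mask)  # connected components; 0 is the background label
    return [
        {
            "area": region.area,
            "bbox": region.bbox,  # (min_row, min_col, max_row, max_col)
            "centroid": region.centroid,
            "eccentricity": region.eccentricity,
        }
        for region in regionprops(labelled)
    ]
```

Calling `measure_objects(image, threshold)` on a 2-D `numpy` array returns one dictionary per connected region. The threshold value has to be chosen by the caller; none is specified in this description.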
inventory.tsv
Tree view of the taxonomy and number of images in each taxon, displayed as text, with columns:
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level1: name of the taxon corresponding to the level 1 classification
- n: number of objects in each taxon class
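The counts in this file can be used to see how unevenly the taxa are represented. The sketch below parses the table and computes the share of each taxon. It assumes a tab-separated file with a header row holding the three column names above; the helper names are hypothetical. Any indentation used to draw the tree in `taxon_level1` is removed with `strip()`, which is also an assumption about the layout.

```python
import csv


def read_inventory(handle):
    """Parse inventory.tsv rows into (taxon_level1, n) pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["taxon_level1"].strip(), int(row["n"])) for row in reader]


def class_shares(inventory):
    """Return the fraction of all objects that falls in each taxon."""
    total = sum(n for _, n in inventory)
    return {taxon: n / total for taxon, n in inventory}
```

A typical call would be `class_shares(read_inventory(open("inventory.tsv", newline="", encoding="utf-8")))`. The resulting shares can guide choices such as class weighting when training a classifier.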
2. The second archive, ZooScanNet_imgs.tar, contains:
imgs
Directory containing images of each object, named according to the object id objid and sorted in subdirectories according to their taxon.
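Given that layout, an index from object id to taxon can be built by walking the directory, as sketched below. The sketch assumes exactly one level of taxon subdirectories under imgs and one image file per object, with the object id as the file name without its extension; the function name `index_images` is hypothetical. The `img_path` column of taxa.csv.gz remains the recorded location of each image.

```python
from pathlib import Path


def index_images(imgs_dir):
    """Map objid (file name without extension) -> (taxon directory, image path)."""
    index = {}
    for path in sorted(Path(imgs_dir).glob("*/*")):
        if path.is_file():
            index[path.stem] = (path.parent.name, path)
    return index
```

For example, `index_images("imgs")` returns a dictionary whose keys are object ids as strings, so they can be matched against the `objid` column after converting it to text.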
3. In addition:
map.png
Map of the sampling locations, to give an idea of the diversity sampled in this dataset.
The listing also gives a short gloss for each native feature in features_native.csv.gz. These glosses are informal; the “ZooScan” sheet at https://doi.org/10.5281/zenodo.14704250 remains the reference for exact definitions.
- area: area
- mean: mean grey level
- stddev: standard deviation of the grey levels
- mode: modal grey level
- min/max: minimum and maximum grey level
- perim.: perimeter
- width,height: width, height
- major,minor: major and minor axis lengths
- circ.: circularity
- feret: Feret diameter
- intden: integrated density
- median: median grey level
- skew,kurt: skewness, kurtosis
- %area: area percentage
- area_exc: solid area
- fractal: fractal dimension
- skelarea: skeleton area
- slope: grey-level slope
- histcum1,2,3: cumulative histogram 1, 2, 3
- nb1,2,3: region counts 1, 2, 3
- symetrieh,symetriev: horizontal and vertical symmetry
- symetriehc,symetrievc: horizontal and vertical central symmetry
- convperim,convarea: convex hull perimeter, convex hull area
- fcons: compactness parameter
- thickr: thickness ratio
- esd: equivalent spherical diameter
- elongation: elongation
- range: grey-level range
- centroids: centroid coordinates
- sr: shape ratio
- perimareaexc: ratio of perimeter to solid area
- feretareaexc: ratio of Feret diameter to solid area
- perimferet/perimmajor: ratio of perimeter to Feret diameter, ratio of perimeter to major axis
- circex: modified circularity
- cdexc: convex defect parameter
Provider:
SEANOE
Created:
2018-07-05
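Background overview
ZooScanNet is a large-scale plankton image dataset containing 1,451,745 object images collected across the world's oceans between 1995 and 2019, imaged with the ZooScan and manually sorted into 98 taxa. The dataset is designed for machine learning classification tasks: it is already split into training, validation, and test sets, and it provides the images, a classification table, and morphological feature metadata. It is suited to plankton research and computer vision applications.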



