
ZooScanNet: plankton images captured with the ZooScan

Source: DataCite Commons. Updated: 2026-03-13. Indexed: 2025-04-16.
Download link:
https://www.seanoe.org/data/00446/55741/
Description:
Plankton was sampled with various nets, from the bottom or 500 m depth to the surface, in many oceans of the world. Samples were imaged with a ZooScan. The full images were processed with ZooProcess, which generated regions of interest (ROIs) around each individual object and a set of associated features measured on the object (see Gorsky et al. 2010 for more information). The same objects were re-processed to compute features with the scikit-image toolbox (http://scikit-image.org). The 1,451,745 resulting objects were sorted by a limited number of operators, following a common taxonomic guide, into 98 taxa, using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). For the purpose of training machine learning classifiers, the images in each class were split into training, validation, and test sets, with proportions 70%, 15% and 15%.

1. The folder ZooScanNet_data.tar contains:

taxa.csv.gz
Table of the classification of each object in the dataset, with columns:
- objid: unique object identifier in EcoTaxa (integer number)
- taxon_level1: taxonomic name corresponding to the level 1 classification
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level2: name of the taxon corresponding to the level 2 classification
- plankton: whether the object is plankton or not (boolean)
- set: subset the image belongs to (train: training, val: validation, or test)
- img_path: local path of the image, filed under its level 1 taxon and named according to the object id

features_native.csv.gz
Table of metadata of each object, including the different features computed by ZooProcess. All features are computed on the object only, not the background. All area and length measures are in pixels. All grey levels are encoded in 8 bits (0 = black, 255 = white). With columns:
- objid: unique object identifier in EcoTaxa (integer number)
And 48 features:
- area
- mean
- stddev
- mode
- min/max
- perim.
- width,height
- major,minor
- circ.
- feret
- intden
- median
- skew,kurt
- %area
- area_exc
- fractal
- skelarea
- slope
- histcum1,2,3
- nb1,2,3
- symetrieh,symetriev
- symetriehc,symetrievc
- convperim,convarea
- fcons
- thickr
- esd
- elongation
- range
- centroids
- sr
- perimareaexc
- feretareaexc
- perimferet/perimmajor
- circex
- cdexc
See the "ZooScan" sheet (OBJECT metadata, annotation and measurements) at https://doi.org/10.5281/zenodo.14704250 for definitions.

features_skimage.csv.gz
Table of morphological features recomputed with skimage.measure.regionprops on the ROIs produced by ZooProcess. See http://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops for documentation.

inventory.tsv
Tree view of the taxonomy and number of images in each taxon, displayed as text. With columns:
- lineage_level1: taxonomic lineage corresponding to the level 1 classification
- taxon_level1: name of the taxon corresponding to the level 1 classification
- n: number of objects in each taxon

2. The folder ZooScanNet_imgs.tar contains:

imgs
Directory containing the images of each object, named according to the object id (objid) and sorted into subdirectories according to their taxon.

3. And:

map.png
Map of the sampling locations, to give an idea of the diversity sampled in this dataset.
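As a rough illustration of how the tables described above might be used, the following Python sketch reads a gzip-compressed table with the standard library, tallies the share of each subset per taxon, and joins feature rows onto one subset by objid. It assumes the files are comma-separated, UTF-8 encoded, and carry the column names listed above; the function names are hypothetical and not part of the dataset.

```python
import csv
import gzip
from collections import Counter, defaultdict


def read_gz_csv(path):
    """Read a gzip-compressed CSV file into a list of dicts (one per row).

    Assumption: comma-delimited, UTF-8 text with a header row.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def split_fractions(taxa_rows):
    """Return {taxon_level1: {set_name: fraction}} from rows of the taxa table."""
    counts = defaultdict(Counter)
    for row in taxa_rows:
        counts[row["taxon_level1"]][row["set"]] += 1
    fractions = {}
    for taxon, per_set in counts.items():
        total = sum(per_set.values())
        fractions[taxon] = {name: n / total for name, n in per_set.items()}
    return fractions


def join_features(taxa_rows, feature_rows, subset="train"):
    """Attach feature columns to the taxa rows of one subset, matching on objid."""
    features_by_id = {row["objid"]: row for row in feature_rows}
    joined = []
    for row in taxa_rows:
        if row["set"] != subset:
            continue
        features = features_by_id.get(row["objid"])
        if features is not None:
            # Taxa columns win on a name clash; objid is identical in both.
            joined.append({**features, **row})
    return joined
```

For example, split_fractions(read_gz_csv("ZooScanNet_data/taxa.csv.gz")) should return fractions close to 0.70, 0.15 and 0.15 for each taxon if the split matches the description; the path here is an assumed extraction location. All values are read as strings, so numeric features need converting before use. With about 1.45 million rows, a list of dictionaries is memory-hungry, and a dataframe library may be a better fit for the full tables.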

Provider:
SEANOE
Created:
2018-07-05
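Dataset introduction
Background overview
ZooScanNet is a large-scale plankton image dataset containing 1,451,745 object images collected across the world's oceans between 1995 and 2019, imaged with the ZooScan and manually sorted into 98 taxa. It is designed for machine learning classification tasks: it is already split into training, validation, and test sets, and it provides the images, the classification table, and morphological feature metadata, which makes it suitable for plankton research and computer vision applications.
The summary above was collected and generated by 遇见数据集.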