LifeWatch observatory data: phytoplankton annotated trainingset by FlowCam imaging in the Belgian Part of the North Sea
Indexed by: NIAID Data Ecosystem (2026-05-01)
Download link:
https://zenodo.org/record/10554844
Description:
Training dataset
The images were collected within the framework of the Belgian LifeWatch Research Infrastructure. During multidisciplinary campaigns, a number of fixed stations in the Belgian Part of the North Sea (BPNS) are visited monthly (onshore stations) or seasonally (offshore stations). Samples are taken with an Apstein net of 55 µm mesh size and fixed in Lugol's iodine solution. In the lab, the samples are processed with a FlowCam VS-4 at 4X magnification, targeting a particle size range of 55-300 µm. Images are first identified by a convolutional neural network (CNN) and then manually validated. Since May 2017, this dataset has provided micro- and phytoplankton observations for the BPNS, mainly covering diatoms, dinoflagellates and ciliates.
This dataset comprises a training data split of 337,613 images distributed across 95 classes, with each class containing between 100 and 10,000 images. To facilitate model training, the data are organized into a standard split: 80% for training, 10% for validation, and 10% for testing. This structure is intended to ensure a balanced representation of classes and to support scientific rigor in subsequent analyses.
Technical details
Data preprocessing
Raw FlowCam output is processed entirely with in-house data pipelines; the VisualSpreadsheet software is used only for data acquisition during the lab run of a sample. Raw images and binary images are not saved during the FlowCam run, so processing starts from the image collages saved at the end of the run. Single images are cut from these collages with in-house Python code, using each image's coordinates, width and height as listed in the .lst file. The background of the images is not removed. The images are then classified by the model and annotated in-house at VLIZ.
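The cropping step described above can be illustrated with a minimal sketch. This is not the in-house VLIZ code: it assumes the collage is already loaded as a NumPy array and that the per-image coordinates have already been parsed from the .lst file. The function name and the field names x, y, width and height are hypothetical.

```python
import numpy as np


def crop_particle(collage: np.ndarray, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Return a copy of the rectangle whose top-left corner is (x, y)."""
    rows, cols = collage.shape[:2]
    if x < 0 or y < 0 or width <= 0 or height <= 0:
        raise ValueError("coordinates must be non-negative and sizes positive")
    if x + width > cols or y + height > rows:
        raise ValueError("crop rectangle extends beyond the collage")
    # NumPy images are indexed [row, column], i.e. [y, x].
    return collage[y:y + height, x:x + width].copy()


# Synthetic 6x8 grayscale "collage" and one particle record.
# The field names are assumptions, not the actual .lst column names.
collage = np.arange(48, dtype=np.uint8).reshape(6, 8)
record = {"x": 2, "y": 1, "width": 3, "height": 2}
particle = crop_particle(collage, **record)
```

Returning a copy rather than a view keeps each cropped image independent of the collage, which matters if the collage array is later modified or released.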
Data splitting
The dataset is split 80% for training, 10% for validation and 10% for testing (prediction).
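An 80/10/10 split of this kind could be produced as in the sketch below. The description does not say whether the split was drawn per class, so the stratified (class-by-class) approach, the fixed seed and the function name are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test, class by class.

    labels[i] is the class of sample i. Returns a dict mapping
    "train", "val" and "test" to lists of sample indices.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    split = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        split["train"] += indices[:n_train]
        split["val"] += indices[n_train:n_train + n_val]
        split["test"] += indices[n_train + n_val:]  # whatever remains
    return split


# Hypothetical labels for two classes.
labels = ["diatom"] * 100 + ["ciliate"] * 200
split = stratified_split(labels)
```

Splitting within each class keeps the class proportions roughly equal across the three subsets, which helps when class sizes range from 100 to 10,000 images.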
Classes, labels and annotations
The dataset comprises 337,613 images distributed across 95 classes, with each class containing between 100 and 10,000 images. Taxonomically, the dataset consists mainly of diatoms, dinoflagellates and ciliates and, to a lesser extent, zooplankton and other protists.
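After downloading the data, the stated class-size bounds can be checked with a short sketch like the following. It assumes the class label of every image is available as a plain list; the function name and the example labels are hypothetical.

```python
from collections import Counter


def classes_outside_bounds(labels, minimum=100, maximum=10_000):
    """Return {class: count} for classes with fewer than `minimum`
    or more than `maximum` images."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < minimum or n > maximum}


# Hypothetical labels: one class falls just below the lower bound.
labels = ["diatom"] * 150 + ["ciliate"] * 99 + ["dinoflagellate"] * 100
offenders = classes_outside_bounds(labels)
```

An empty result means every class lies within the 100 to 10,000 range given in the description.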
Parameters
The images are read using cv2.imread, and the resulting pixel values are used as the input parameters for training.
Data sources
Images are collected during the monthly monitoring of phytoplankton communities in the Belgian Part of the North Sea, carried out during the LifeWatch multidisciplinary campaigns, using a FlowCam VS-4 benchtop model (Fluid Imaging Technologies, Yarmouth, Maine, U.S.A.).
Data quality
All images are first classified by the model and then manually validated to ensure the quality of the training set.
Image resolution
The imaged size range is 55-300 µm. Images are acquired with a Sony XCD SC90 digital grayscale camera. During CNN training, images are resized to 100 px by 100 px.
Spatial coverage
The data comes from a number of fixed stations in the Belgian Part of the North Sea (BPNS).
Nine onshore stations are visited monthly:
| Station | Longitude | Latitude |
| --- | --- | --- |
| 130 | 2.90535 | 51.27055 |
| 780 | 3.057283 | 51.471367 |
| 330 | 2.809083 | 51.434117 |
| 230 | 2.85035 | 51.308683 |
| 710 | 3.138283 | 51.441217 |
| 215 | 2.61075 | 51.274867 |
| ZG02 | 2.500717 | 51.33515 |
| 120 | 2.702483 | 51.186083 |
| 700 | 3.221017 | 51.377 |
Eight additional offshore stations are visited seasonally:
| Station | Longitude | Latitude |
| --- | --- | --- |
| LW01 | 2.256 | 51.568667 |
| LW02 | 2.556 | 51.8 |
| 435 | 2.790333 | 51.580667 |
| W07bis | 3.012517 | 51.588033 |
| W08 | 2.35 | 51.458333 |
| W09 | 2.7 | 51.75 |
| W10 | 2.416667 | 51.683333 |
| 421 | 2.45 | 51.4805 |
Temporal coverage
The monitoring was initiated in May 2017 and has been running continuously every month.
Contact information
For technical questions about training, you can contact wout.decrop@vliz.be.
For more information on the training dataset and FlowCam, you can contact rune.lagaisse@vliz.be.
Created: 2024-04-03



