Dataset of trash and water segmentations in riverine environments
4TU.ResearchData · Updated 2024-09-25 · Indexed 2026-04-23
Download link:
https://data.4tu.nl/datasets/90d13261-b0fe-444a-b408-c5a63db3d887/1
Description:
**General Introduction**
This dataset contains images of trash patches in riverine environments and corresponding segmentations of the trash, possible barriers, and water. It is made public both as supplementary data for M. Don's conference paper 'Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution' and so that other researchers can use the data in their own work. The data was collected as part of the operations of The Ocean Cleanup between 2020 and 2023. It comes from multiple locations around the world and contains 300 images together with annotations of trash.

**Purpose of the data**
The data was collected and annotated to investigate whether trash loads can be estimated from segmentations of that trash at locations around rivers, with the goal of assessing debris loads and debris load dynamics in these rivers, as well as the efficacy of the barriers and extraction operations.

**Description of the data in this data set**
The data in the dataset has been organized in three folders:
- `images`: contains 6 subfolders, one for each of the 6 locations where the images were collected. Each subfolder contains roughly 50 images in JPG format. The images come from a mix of 5MP and 12MP cameras, at different resolutions, and cover a range of timestamps per location.
  - The name of the folder identifies the location, using identifiers 1-6.
  - The name of each image is set as `<unix timestamp>_<iso timestamp>_<device serial>.jpg`.
- `annotations`: contains the annotations in COCO format. It contains two files:
  - `annotation.json`: all annotations in COCO format.
  - `split_mapping.json`: a file denoting how the dataset is split among the different train/test splits used in the paper by M. Don. The GitHub repository corresponding to the paper contains instructions on generating these splits.
- `pretrained_yolo_models`: contains the resulting Ultralytics YOLO models used in the conference paper.
  - `different_train_sizes`: all models trained with the different training/validation splits. The naming is given as `train_training%.pt`, so `train10.pt` means 10% of the dataset is used for training.
  - `generalization_loc6`: models trained on loc1-loc5 and finetuned to location 6, named `train<nr_images>_epochs<nr_epochs>.pt`.
  - `trained_one_location`: models trained only on data from their respective locations, using an 80/20 split.
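As a rough illustration of working with the COCO-format annotations described above, the following Python sketch counts images and annotations per category. It relies only on the standard COCO top-level keys (`images`, `annotations`, `categories`). The function names `load_coco` and `summarize_coco` are hypothetical helpers, and the category names actually stored in `annotation.json` are not stated in this description, so treat any names below as assumptions.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Read a COCO-format JSON file into a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_coco(coco: dict) -> dict:
    """Count images and annotations per category in a COCO-style dict."""
    # COCO stores category definitions as a list of {"id", "name", ...} dicts.
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    annotations = coco.get("annotations", [])
    # Each annotation refers to its category through "category_id".
    per_category = Counter(
        names.get(a["category_id"], "unknown") for a in annotations
    )
    return {
        "num_images": len(coco.get("images", [])),
        "num_annotations": len(annotations),
        "per_category": dict(per_category),
    }
```

Assuming the archive is unpacked in the working directory, `summarize_coco(load_coco("annotations/annotation.json"))` would give a quick overview of how many segmentations exist per class before any training is attempted.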
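The file naming conventions described above can be unpacked with a few lines of Python. This is a minimal sketch under stated assumptions: the Unix timestamp is taken to be an integer, the device serial is assumed to contain no underscores, and the exact ISO timestamp format is not specified in the description, so it is kept as an opaque string. The helper names (`parse_image_name`, `parse_train_size`, `parse_generalization`) and the example file names in the comments are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class ImageName(NamedTuple):
    unix_timestamp: int
    iso_timestamp: str
    device_serial: str


def parse_image_name(filename: str) -> ImageName:
    """Split '<unix timestamp>_<iso timestamp>_<device serial>.jpg' into fields."""
    name = filename.rsplit("/", 1)[-1]  # drop any leading folder path
    stem = name[:-4] if name.lower().endswith(".jpg") else name
    # Assumption: the first and last fields contain no underscores, so the
    # ISO timestamp is whatever sits between the first and last underscore.
    unix_part, _, rest = stem.partition("_")
    iso_part, _, serial = rest.rpartition("_")
    if not (unix_part.isdigit() and iso_part and serial):
        raise ValueError(f"unexpected image name: {filename!r}")
    return ImageName(int(unix_part), iso_part, serial)


_TRAIN_SIZE = re.compile(r"^train(\d+)\.pt$")
_GENERALIZATION = re.compile(r"^train(\d+)_epochs(\d+)\.pt$")


def parse_train_size(model_name: str) -> int:
    """Return the training percentage from a name such as 'train10.pt'."""
    match = _TRAIN_SIZE.match(model_name)
    if match is None:
        raise ValueError(f"unexpected model name: {model_name!r}")
    return int(match.group(1))


def parse_generalization(model_name: str) -> Tuple[int, int]:
    """Return (nr_images, nr_epochs) from 'train<nr_images>_epochs<nr_epochs>.pt'."""
    match = _GENERALIZATION.match(model_name)
    if match is None:
        raise ValueError(f"unexpected model name: {model_name!r}")
    return int(match.group(1)), int(match.group(2))
```

Splitting on the first and last underscore, rather than on every underscore, keeps the parser tolerant of ISO timestamps that themselves contain underscores. If the real device serials turn out to include underscores, this heuristic would need adjusting.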
Provided by:
Don, Marga; Guillen Cebrian, Blanca
Created:
2024-09-25



