Dataset of trash and water segmentations in riverine environments
Source: DataCite Commons. Updated: 2024-09-25. Indexed: 2024-10-19.
Download link:
https://data.4tu.nl/datasets/90d13261-b0fe-444a-b408-c5a63db3d887
Resource description:
**General Introduction**
This dataset contains images of trash patches in riverine environments, with corresponding segmentations of the trash, possible barriers, and water. It is made public both as supplementary data for M. Don's conference paper 'Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution' and so that other researchers can use the data in their own work. The data was collected as part of the operations of The Ocean Cleanup between 2020 and 2023. It comes from multiple locations around the world and comprises 300 images together with annotations of trash.

**Purpose of the data**
The data was collected and annotated to investigate whether trash loads can be estimated from segmentations of trash at locations around rivers, with the goal of assessing debris loads and debris load dynamics in these rivers, as well as the efficacy of the barriers and extraction operations.

**Description of the data in this dataset**
The data is organized in three folders:
- `images`: contains 6 subfolders, one for each of the 6 locations from which the images were collected. Each subfolder contains roughly 50 images in JPG format. The images are a mix of 5 MP and 12 MP camera images at different resolutions, from a range of timestamps per location.
  - The name of the folder identifies the location, using identifiers 1-6.
  - The name of each image follows the pattern `<unix timestamp>_<iso timestamp>_<device serial>.jpg`.
- `annotations`: contains the annotations in COCO format, in two files:
  - `annotation.json`: all annotations in COCO format.
  - `split_mapping.json`: denotes how the dataset is split among the different train/test splits used in the paper by M. Don. The GitHub repository corresponding to the paper contains instructions on generating these splits.
- `pretrained_yolo_models`: contains the resulting Ultralytics YOLO models used in the conference paper.
  - `different_train_sizes`: all models trained with the different training/validation splits, named after the training percentage (train_training%.pt), so `train10.pt` means 10% of the dataset is used for training.
  - `generalization_loc6`: models trained on loc1-loc5 and finetuned to location 6, named `train<nr_images>_epochs<nr_epochs>.pt`.
  - `trained_one_location`: models trained only on data from their respective locations, in an 80/20 split.
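The naming conventions for images and model files can be unpacked with a few lines of Python. The following is a minimal sketch, not code from the dataset or the paper's repository: the helper names `parse_image_name` and `parse_model_name` are hypothetical, the example file names in the usage note are made up, and it assumes that the Unix timestamp and the device serial contain no underscores and that the numbers in model names are integers.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ImageName(NamedTuple):
    unix_timestamp: str
    iso_timestamp: str
    device_serial: str


def parse_image_name(filename: str) -> ImageName:
    """Split '<unix timestamp>_<iso timestamp>_<device serial>.jpg' into parts.

    Assumption: the first and last fields contain no underscores, so anything
    between the first and the last underscore is treated as the ISO timestamp.
    """
    if not filename.lower().endswith(".jpg"):
        raise ValueError(f"not a .jpg file name: {filename!r}")
    stem = filename[:-4]
    unix_ts, sep1, rest = stem.partition("_")
    iso_ts, sep2, serial = rest.rpartition("_")
    if not (sep1 and sep2 and unix_ts and iso_ts and serial):
        raise ValueError(f"unexpected image file name: {filename!r}")
    return ImageName(unix_ts, iso_ts, serial)


# Matches both 'train10.pt' and 'train<nr_images>_epochs<nr_epochs>.pt'.
# Assumption: all numbers are plain integers.
_MODEL_NAME = re.compile(r"train(\d+)(?:_epochs(\d+))?\.pt")


def parse_model_name(filename: str) -> Tuple[int, Optional[int]]:
    """Return (train value, epochs or None) parsed from a model file name.

    The train value is a percentage in `different_train_sizes` and a number
    of images in `generalization_loc6`; the caller has to know which.
    """
    match = _MODEL_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected model file name: {filename!r}")
    epochs = int(match.group(2)) if match.group(2) else None
    return int(match.group(1)), epochs
```

For instance, a made-up image name such as `1609459200_2021-01-01T00-00-00Z_ABC123.jpg` would be split into its three fields, and `train10.pt` would yield a train value of 10 with no epoch count. The timestamp is kept as a string because the description does not say whether it is in seconds or milliseconds.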
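Because `annotation.json` is described as being in COCO format, a first look at its contents needs only the standard library. The sketch below counts annotations per category, relying on the standard COCO top-level keys `categories` and `annotations`. The function names are hypothetical, the path in the usage comment assumes the folder layout described above, and the category names in the file are not stated in the description, so none are assumed here.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def load_coco(path: str) -> dict:
    """Read a COCO-style JSON file into a plain dict."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def count_annotations_per_category(coco: dict) -> Dict[str, int]:
    """Return {category name: number of annotations} for a COCO-style dict."""
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    counts = Counter(ann["category_id"] for ann in coco.get("annotations", []))
    # Fall back to the numeric id if a category entry is missing.
    return {names.get(cat_id, str(cat_id)): n for cat_id, n in counts.items()}


# Usage, assuming the dataset has been unpacked in the working directory:
#   coco = load_coco("annotations/annotation.json")
#   print(len(coco["images"]), "images")
#   print(count_annotations_per_category(coco))
```

Comparing the per-category counts and the number of entries under `images` with the 300 images stated in the description is a quick sanity check that the download is complete.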
Provider:
4TU.ResearchData
Created:
2024-09-25

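Background and challenges
Background overview
This dataset contains images and segmentation annotations of trash and water in riverine environments: 300 images in total, from multiple locations around the world, annotated in COCO format. It also provides pretrained YOLO models to support research on trash load estimation and segmentation.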



