Detection-and-Tracking of Dolphins of Aerial Videos and Images
Source: Mendeley Data | Updated: 2024-01-31 | Indexed: 2024-06-28
Download link:
https://zenodo.org/record/4775125
Description:
This project consists of two datasets of aerial images and videos of dolphins, captured by drones along the coastlines of Italy and Israel. The aim of the project is to examine automated dolphin detection and tracking from aerial surveys. The project description, details, and results are presented in the paper (link to the paper).
Each dataset was organized for a different phase of the project and is stored in its own zip file:
1. Detection - Detection.zip
2. Tracking - Tracking.zip
Further information about each dataset's content and annotation format is given below.
*To view the contents of each file, use the preview option; a description also appears later in this section.
### Detection Dataset
This dataset contains 1125 aerial images; a single image can contain several dolphins. The detection phase of the project uses RetinaNet, a supervised deep-learning algorithm, through the Keras RetinaNet implementation. The data was therefore divided into three parts (Train, Validation, and Test) in a 70% / 15% / 15% split.
The annotation format follows the format required by that implementation (Keras RetinaNet). Each object, which is a dolphin, is annotated with bounding-box coordinates and a class. Dolphins were not distinguished by species in this project, so every dolphin object is annotated as a bounding box and classified as 'Dolphin'.
The Detection zip file includes:
- A folder of images for each of the Train, Validation, and Test subsets
- An annotations CSV file for each subset
- A class mapping CSV file (one for all the subsets)
*The annotation format is detailed in the Annotations Format section.
Detection zip file content:
Detection
|——————train_set (images)
|——————train_set.csv
|——————validation_set (images)
|——————validation_set.csv
|——————test_set (images)
|——————test_set.csv
└——————class_mapping.csv
### Tracking Dataset
This dataset contains 5 short videos (10-30 seconds each), trimmed from longer aerial videos captured by a drone. The tracking phase of the project uses two methods:
- The VIAME application, using its tracking feature
- Re3: Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects, by Daniel Gordon. This project uses the author's TensorFlow implementation.
Both methods require the video's frame sequence as input, so the frames of each video were extracted. The first frame was annotated manually for initialization, and the algorithms track from there. As in the Detection dataset, each frame can include several objects (dolphins).
For annotation consistency, the frame sequences were annotated in the same way as the Detection dataset above (details can be found in the Annotations Format section). Each video's frames are annotated separately. The Tracking zip file therefore contains one folder per video (5 folders in total), named after the video's file name.
Each video folder contains:
- A frames directory with the extracted frames of the video
- An annotations CSV file
- A class mapping CSV file
- The original video in MP4 format
The examined videos are described in detail in the 'Videos Description.xlsx' file. Use the preview option to display its content.
Tracking zip file content:
Tracking
|——————DJI_0195_trim_0015_0045
| └——————frames (images)
| └——————annotations_DJI_0195_trim_0015_0045.csv
| └——————class_mapping_DJI_0195_trim_0015_0045.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_0010_0025
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0010_0025.csv
| └——————class_mapping_DJI_0395_trim_0010_0025.csv
| └——————DJI_0395_trim_0010_0025.MP4
|——————DJI_0395_trim_00140_00150
| └——————frames (images)
| └——————annotations_DJI_0395_trim_00140_00150.csv
| └——————class_mapping_DJI_0395_trim_00140_00150.csv
| └——————DJI_0395_trim_00140_00150.MP4
|——————DJI_0395_trim_0055_0085
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0055_0085.csv
| └——————class_mapping_DJI_0395_trim_0055_0085.csv
| └——————DJI_0395_trim_0055_0085.MP4
└——————HighToLow_trim_0045_0070
└—————frames (images)
└—————annotations_HighToLow_trim_0045_0070.csv
└—————class_mapping_HighToLow_trim_0045_0070.csv
└—————HighToLow_trim_0045_0070.MP4
### Annotations Format
Both datasets share the annotation format described below. It follows the format required by the Keras RetinaNet implementation, which was used for training in the dolphin detection phase of the project. Each object (dolphin) is annotated with the top-left and bottom-right coordinates of its bounding box and a class. Each image or frame can include several objects.
All data was annotated using the Labelbox application.
For each subset (Train, Validation, and Test of the Detection dataset, and each video of the Tracking dataset) there are two corresponding CSV files:
- Annotations CSV file
- Class mapping CSV file
Each line in the annotations CSV file contains one annotation (bounding box) in an image or frame. The fields of each line are:
- path/to/image.jpg - a path to the image/frame
- x1, y1 - image coordinates of the top-left corner of the bounding box
- x2, y2 - image coordinates of the bottom-right corner of the bounding box
- class_name - class name of the annotated object
The full line format is:
path/to/image.jpg,x1,y1,x2,y2,class_name
An example from `train_set.csv`:
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin
This defines a dataset with 2 images:
- `1146_20170730101_ce1_sc_GOPR3047 103.jpg`, which contains 2 objects classified as 'Dolphin'
- `1147_20170730101_ce1_sc_GOPR3047 104.jpg`, which contains 3 objects classified as 'Dolphin'
Each line in the class mapping CSV file contains one mapping:
class_name,id
An example:
Dolphin,0
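The annotation and class mapping formats described here can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset or of Keras RetinaNet: the function names `read_class_mapping` and `read_annotations` are hypothetical, the sample rows are copied from the `train_set.csv` example, and it assumes that every row carries a bounding box and that paths use Windows separators as in that example.

```python
import csv
import io
from collections import defaultdict
from pathlib import PureWindowsPath

# Rows copied from the train_set.csv example in the dataset description.
SAMPLE_ANNOTATIONS = r""".\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin
"""
SAMPLE_CLASS_MAPPING = "Dolphin,0\n"


def read_class_mapping(handle):
    """Return {class_name: id} from class mapping rows (class_name,id)."""
    return {row[0]: int(row[1]) for row in csv.reader(handle) if row}


def read_annotations(handle, class_ids):
    """Group bounding boxes by image file name.

    Each row is path,x1,y1,x2,y2,class_name. The result maps
    file_name -> [(x1, y1, x2, y2, class_id), ...].
    Assumption: every row describes a box (no empty coordinate fields).
    """
    boxes = defaultdict(list)
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        path, x1, y1, x2, y2, class_name = row
        # The example paths use backslashes, so parse them as Windows paths.
        file_name = PureWindowsPath(path).name
        boxes[file_name].append(
            (int(x1), int(y1), int(x2), int(y2), class_ids[class_name])
        )
    return dict(boxes)


class_ids = read_class_mapping(io.StringIO(SAMPLE_CLASS_MAPPING))
boxes_by_image = read_annotations(io.StringIO(SAMPLE_ANNOTATIONS), class_ids)
for file_name, boxes in boxes_by_image.items():
    print(file_name, len(boxes))
```

To read the real files, open the annotations CSV and the class mapping CSV with `open(path, newline="")` and pass the file handles to the same two functions.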
Created:
2024-01-31



