Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10373153
Resource description:
Introduction
This is a set of metadata describing a large dataset of synchronized sonar and stereo camera recordings that were captured between August 2021 and September 2023 during the project DeeperSense (https://robotik.dfki-bremen.de/en/research/projects/deepersense/) as training data for Sonar-to-RGB image translation. Parts of the sensor data have been published (https://zenodo.org/records/7728089, https://zenodo.org/records/10220989). Due to the size of the sensor data corpus, it is currently impractical to make the entire corpus accessible online. Instead, this metadatabase serves as a relatively compact representation that allows interested researchers to inspect the data and select the portions relevant to their particular use case, which will be made available on demand. This is an effort to comply with FAIR principle A2 (https://www.go-fair.org/fair-principles/), which states that metadata shall be accessible even when the base data is not immediately accessible.
Locations and sensors
The sensor data was captured at four different locations: one laboratory (Maritime Exploration Hall at DFKI RIC Bremen) and three field locations (Chalk Lake Hemmoor, Tank Wash Basin Neu-Ulm, Lake Starnberg). At all locations, a ZED camera and a Blueprint Oculus M1200d sonar were used. Additionally, a SeaVision camera was used at the Maritime Exploration Hall at DFKI RIC Bremen and at Chalk Lake Hemmoor. The examples/ directory holds a typical output image for each sensor at each available location.
Data volume per session
Six data collection sessions were conducted. The table below presents an overview of the amount of data captured in each session:
| Session dates | Location | Number of datasets | Total duration of datasets [h] | Total logfile size [GB] | Number of images | Total image size [GB] |
|---|---|---|---|---|---|---|
| 2021-08-09 - 2021-08-12 | Maritime Exploration Hall at DFKI RIC Bremen | 52 | 10.8 | 28.8 | 389’047 | 88.1 |
| 2022-02-07 - 2022-02-08 | Maritime Exploration Hall at DFKI RIC Bremen | 35 | 4.4 | 54.1 | 629’626 | 62.3 |
| 2022-04-26 - 2022-04-28 | Chalk Lake Hemmoor | 52 | 8.1 | 133.6 | 1’114’281 | 97.8 |
| 2022-06-28 - 2022-06-29 | Tank Wash Basin Neu-Ulm | 42 | 6.7 | 144.2 | 824’969 | 26.9 |
| 2023-04-26 - 2023-04-27 | Maritime Exploration Hall at DFKI RIC Bremen | 55 | 7.4 | 141.9 | 739’613 | 9.6 |
| 2023-09-01 - 2023-09-02 | Lake Starnberg | 19 | 2.9 | 40.1 | 217’385 | 2.3 |
| Total | | 255 | 40.3 | 542.7 | 3’914’921 | 287.0 |
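As a quick consistency check, the totals in the last row can be recomputed from the per-session figures. The following is a minimal sketch in Python; the constant `SESSIONS` and the function `column_totals` are names chosen for illustration only and do not appear in the metadatabase.

```python
# Per-session figures copied from the table, in row order:
# (datasets, duration [h], logfile size [GB], images, image size [GB])
SESSIONS = [
    (52, 10.8, 28.8, 389_047, 88.1),
    (35, 4.4, 54.1, 629_626, 62.3),
    (52, 8.1, 133.6, 1_114_281, 97.8),
    (42, 6.7, 144.2, 824_969, 26.9),
    (55, 7.4, 141.9, 739_613, 9.6),
    (19, 2.9, 40.1, 217_385, 2.3),
]


def column_totals(rows):
    """Sum each column; rounding to one decimal absorbs floating-point noise."""
    return tuple(round(sum(column), 1) for column in zip(*rows))


totals = column_totals(SESSIONS)
print(totals)
```

The sums agree with the totals row of the table (255 datasets, 40.3 h, 542.7 GB of logfiles, 3’914’921 images, 287.0 GB of images).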
Data and metadata structure
Sensor data corpus
The sensor data corpus comprises two processing stages:
- raw data streams stored in ROS bagfiles (aka logfiles),
- camera and sonar images (aka datafiles) extracted from the logfiles.
The files are stored in a file tree hierarchy which groups them by session, dataset, and modality:
${session_key}/
    ${dataset_key}/
        ${logfile_name}
        ${modality_key}/
            ${datafile_name}
A typical logfile path has this form:
2023-09_starnberg_lake/
    2023-09-02-15-06_hydraulic_drill/
        stereo_camera-zed-2023-09-02-15-06-07.bag
A typical datafile path has this form:
2023-09_starnberg_lake/
    2023-09-02-15-06_hydraulic_drill/
        zed_right/
            1693660038_368077993.jpg
All directory and file names, and their components, are designed to serve as identifiers in the metadatabase. Their formatting and the definitions of all terms are documented in the file entities.json.
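Because the path components double as identifiers, a datafile path can be split into its keys with a few lines of code. The sketch below is illustrative only: the names `DatafilePath`, `parse_datafile_path`, and `datafile_timestamp` are hypothetical, and the reading of the file name as `<seconds>_<nanoseconds>` since the UNIX epoch is an assumption based on the example path. The authoritative format is documented in entities.json.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import NamedTuple


class DatafilePath(NamedTuple):
    session_key: str
    dataset_key: str
    modality_key: str
    datafile_name: str


def parse_datafile_path(path: str) -> DatafilePath:
    """Split 'session/dataset/modality/datafile' into its four components."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}: {path!r}")
    return DatafilePath(*parts)


def datafile_timestamp(datafile_name: str) -> datetime:
    """ASSUMPTION: the file stem is '<seconds>_<nanoseconds>' since the UNIX epoch."""
    seconds, nanoseconds = PurePosixPath(datafile_name).stem.split("_")
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    # datetime only resolves microseconds, so the last three digits are dropped.
    return moment.replace(microsecond=int(nanoseconds) // 1000)


example = parse_datafile_path(
    "2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/"
    "zed_right/1693660038_368077993.jpg"
)
captured_at = datafile_timestamp(example.datafile_name)
print(example.modality_key, captured_at.isoformat())
```

Under that assumption, the example file name decodes to 13:07:18 UTC on 2023-09-02, about a minute after the 15:06:07 stamp in the logfile name if that stamp is local time at UTC+2.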
Metadatabase
The metadatabase is provided in two equivalent forms:
- as a standalone SQLite (https://www.sqlite.org/index.html) database file metadata.sqlite for users familiar with SQLite,
- as a collection of CSV files in the csv/ directory for users who prefer other tools.
The database file has been generated from the CSV files, so each database table holds the same information as the corresponding CSV file. In addition, the metadatabase contains a series of convenience views that facilitate access to certain aggregate information.
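A convenient first step is to list the tables and views the database offers. The following minimal sketch uses Python's built-in sqlite3 module; the function name `list_tables_and_views` is chosen for illustration, and the small in-memory database (including the view name `view__dataset_count`) is a made-up stand-in so that the example runs without the real file.

```python
import sqlite3


def list_tables_and_views(connection: sqlite3.Connection) -> dict:
    """Return the names of all user-defined tables and views, grouped by type."""
    rows = connection.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    names = {"table": [], "view": []}
    for object_type, name in rows:
        names[object_type].append(name)
    return names


# Stand-in database (hypothetical content), used only to make the sketch runnable.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE session (session_id INTEGER PRIMARY KEY);
    CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY, session_id INTEGER);
    CREATE VIEW view__dataset_count AS
        SELECT session_id, COUNT(*) AS number_of_datasets
        FROM dataset GROUP BY session_id;
    """
)
overview = list_tables_and_views(demo)
print(overview)
```

To inspect the published file instead, the connection would be opened with `sqlite3.connect("metadata.sqlite")`, or with `sqlite3.connect("file:metadata.sqlite?mode=ro", uri=True)` to rule out accidental writes.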
An entity relationship diagram of the metadatabase tables is stored in the file entity_relationship_diagram.png. Each entity, its attributes, and relations are documented in detail in the file entities.json.
Some general design remarks:
- For convenience, timestamps are always given both in a human-readable form (ISO 8601 formatted datetime strings with an explicit local time zone) and as seconds since the UNIX epoch.
- In practice, each logfile contains a single stream, and each stream is stored in a single logfile. In the database schema, however, the entities stream and logfile are modeled separately, with a “many-streams-to-one-logfile” relationship. This design was chosen to remain compatible with data collections in which a single logfile contains multiple streams.
- A modality is not an attribute of a sensor alone, but of a datafile. This is because a sensor is an attribute of a stream, and a single stream may be the source of multiple modalities (e.g. RGB vs. grayscale images from the same camera, or Cartesian vs. polar projection of the same sonar output). Conversely, the same modality may originate from different sensors.
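Regarding the first remark: since every timestamp is stored in two forms, the two representations can be checked against each other. The sketch below is a hedged illustration; the function name `timestamps_agree` is hypothetical, and the exact ISO 8601 string layout used in the metadatabase (here assumed to be `YYYY-MM-DDTHH:MM:SS+HH:MM`) should be confirmed in entities.json.

```python
from datetime import datetime


def timestamps_agree(iso_string: str, epoch_seconds: float,
                     tolerance: float = 1e-6) -> bool:
    """Check that an ISO 8601 string with UTC offset matches a UNIX timestamp."""
    moment = datetime.fromisoformat(iso_string)
    if moment.tzinfo is None:
        # Without an explicit offset the comparison would depend on the
        # time zone of the machine running the check.
        raise ValueError("expected an ISO 8601 string with an explicit UTC offset")
    return abs(moment.timestamp() - epoch_seconds) <= tolerance


# 15:06:07 at UTC+2 is 13:06:07 UTC, i.e. 1693659967 seconds since the epoch.
ok = timestamps_agree("2023-09-02T15:06:07+02:00", 1693659967)
print(ok)
```

The explicit offset is what makes the human-readable form unambiguous: the same wall-clock string with a different offset refers to a different instant and fails the check.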
As a usage example, the data volume per session, tabulated at the top of this document, can be extracted from the metadatabase with the following SQL query:
-- Per-session overview: dates, location, and data volume.
SELECT
    PRINTF(
        '%s - %s',
        SUBSTR(session_start, 1, 10),
        SUBSTR(session_end, 1, 10)) AS "Session dates",
    location_name_english AS "Location",
    number_of_datasets AS "Number of datasets",
    total_duration_of_datasets_h AS "Total duration of datasets [h]",
    total_logfile_size_gb AS "Total logfile size [GB]",
    number_of_images AS "Number of images",
    total_image_size_gb AS "Total image size [GB]"
FROM
    location
    JOIN session USING (location_id)
    JOIN (
        -- Number, total duration, and total logfile size of the datasets per session.
        SELECT
            session_id,
            COUNT(dataset_id) AS number_of_datasets,
            ROUND(
                SUM(dataset_duration) / 3600,
                1) AS total_duration_of_datasets_h,
            ROUND(
                SUM(total_logfile_size) / 10e9,
                1) AS total_logfile_size_gb
        FROM
            location
            JOIN session USING (location_id)
            JOIN dataset USING (session_id)
            JOIN view__dataset_total_logfile_size USING (dataset_id)
        GROUP BY
            session_id
    ) USING (session_id)
    JOIN (
        -- Number and total size of the extracted images per session.
        SELECT
            session_id,
            COUNT(datafile_id) AS number_of_images,
            ROUND(SUM(datafile_size) / 10e9, 1) AS total_image_size_gb
        FROM
            session
            JOIN dataset USING (session_id)
            JOIN stream USING (dataset_id)
            JOIN datafile USING (stream_id)
        GROUP BY
            session_id
    ) USING (session_id)
ORDER BY session_id;
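The same join chain (dataset, stream, datafile) can be reused at a finer granularity, for instance to count images per dataset when selecting portions of the corpus to request. The following is a minimal sketch in Python with the built-in sqlite3 module. It uses only table and column names that appear in the query above, but the in-memory tables and their rows are invented stand-ins, not excerpts from the real metadatabase, and sizes are reported in whatever unit the `datafile_size` column uses.

```python
import sqlite3

IMAGES_PER_DATASET = """
    SELECT
        dataset_id,
        COUNT(datafile_id) AS number_of_images,
        SUM(datafile_size) AS total_image_size
    FROM
        dataset
        JOIN stream USING (dataset_id)
        JOIN datafile USING (stream_id)
    GROUP BY
        dataset_id
    ORDER BY
        dataset_id;
"""

# Stand-in schema reduced to the columns the query needs (hypothetical rows).
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY, session_id INTEGER);
    CREATE TABLE stream (stream_id INTEGER PRIMARY KEY, dataset_id INTEGER);
    CREATE TABLE datafile (
        datafile_id INTEGER PRIMARY KEY,
        stream_id INTEGER,
        datafile_size INTEGER);
    INSERT INTO dataset VALUES (1, 1), (2, 1);
    INSERT INTO stream VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO datafile VALUES
        (100, 10, 500), (101, 10, 700), (102, 11, 300), (200, 20, 900);
    """
)
result = demo.execute(IMAGES_PER_DATASET).fetchall()
for dataset_id, number_of_images, total_image_size in result:
    print(dataset_id, number_of_images, total_image_size)
```

Against the published file, the same query string could be passed to a connection opened on metadata.sqlite.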
Created:
2024-07-08



