wetdog/TUT-urban-acoustic-scenes-2018-development

Name: wetdog/TUT-urban-acoustic-scenes-2018-development
Creator: wetdog
Published: 2023-08-19 00:08:29
License: No description provided

Source: Hugging Face (updated 2023-08-19, indexed 2024-03-04)

Download link:

https://hf-mirror.com/datasets/wetdog/TUT-urban-acoustic-scenes-2018-development


Resource summary:

---
dataset_info:
  features:
  - name: scene_label
    dtype: string
  - name: identifier
    dtype: string
  - name: source_label
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: train
    num_bytes: 24883936611.28
    num_examples: 8640
  download_size: 24885037396
  dataset_size: 24883936611.28
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: afl-3.0
task_categories:
- audio-classification
size_categories:
- 1K<n<10K
---

# Dataset Card for "TUT-urban-acoustic-scenes-2018-development"

## Dataset Description

- **Homepage:** https://zenodo.org/record/1228142
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:** Toni Heittola (toni.heittola@tut.fi, http://www.cs.tut.fi/~heittolt/)

### Dataset Summary

The TUT Urban Acoustic Scenes 2018 development dataset consists of 10-second audio segments from 10 acoustic scenes:

- Airport: `airport`
- Indoor shopping mall: `shopping_mall`
- Metro station: `metro_station`
- Pedestrian street: `street_pedestrian`
- Public square: `public_square`
- Street with medium level of traffic: `street_traffic`
- Travelling by a tram: `tram`
- Travelling by a bus: `bus`
- Travelling by an underground metro: `metro`
- Urban park: `park`

Each acoustic scene has 864 segments (144 minutes of audio), and the dataset contains 24 hours of audio in total. The dataset was collected by Tampere University of Technology (Finland) between 02/2018 and 03/2018. The data collection received funding from the European Research Council under the ERC Grant Agreement 637422 EVERYSOUND.

### Supported Tasks and Leaderboards

- `audio-classification`: The dataset can be used to train a model for acoustic scene classification, i.e. assigning one of the 10 scene labels to each 10-second audio segment.

## Dataset Structure

### Data Instances

```
{
  'scene_label': 'airport',
  'identifier': 'barcelona-0',
  'source_label': 'a',
  'audio': {'path': '/data/airport-barcelona-0-0-a.wav',
            'array': array([-1.91628933e-04, -1.18494034e-04, -1.87635422e-04, ...,
                            4.90546227e-05, -4.98890877e-05, -4.66108322e-05]),
            'sampling_rate': 48000}
}
```

### Data Fields

- `scene_label`: acoustic scene label from the 10-class set.
- `identifier`: city-location ID, e.g. `'barcelona-0'`.
- `source_label`: device ID; for this dataset it is always `'a'`.
- `audio`: the audio segment, a dict with `path`, the decoded `array`, and `sampling_rate` (48000 in the example above).

Filenames of the dataset follow the pattern `[scene label]-[city]-[location id]-[segment id]-[device id].wav`.

### Data Splits

A suggested training/test partitioning of the development set is provided so that results reported on this dataset are comparable. The partitioning places all segments recorded at the same location in the same subset, either training or testing. It aims for a 70/30 ratio between the number of segments in the training and test subsets while respecting recording locations, selecting the closest available option. Note that the Hugging Face version exposes all 8,640 segments as a single `train` split (see `dataset_info` above), so this partition is not encoded as separate splits.
| Scene class        | Train / Segments | Train / Locations | Test / Segments | Test / Locations |
| ------------------ | ---------------- | ----------------- | --------------- | ---------------- |
| Airport            | 599              | 15                | 265             | 7                |
| Bus                | 622              | 26                | 242             | 10               |
| Metro              | 603              | 20                | 261             | 9                |
| Metro station      | 605              | 28                | 259             | 12               |
| Park               | 622              | 18                | 242             | 7                |
| Public square      | 648              | 18                | 216             | 6                |
| Shopping mall      | 585              | 16                | 279             | 6                |
| Street, pedestrian | 617              | 20                | 247             | 8                |
| Street, traffic    | 618              | 18                | 246             | 7                |
| Tram               | 603              | 24                | 261             | 11               |
| **Total**          | **6122**         | **203**           | **2518**        | **83**           |

## Dataset Creation

### Source Data

#### Initial Data Collection and Normalization

The dataset was recorded in six large European cities: Barcelona, Helsinki, London, Paris, Stockholm, and Vienna. For all acoustic scenes, audio was captured in multiple locations: different streets, different parks, different shopping malls. In each location, multiple 2-3 minute audio recordings were captured in a few (2-4) slightly different positions within the selected location. The collected audio material was cut into 10-second segments.

The recording equipment consisted of a binaural [Soundman OKM II Klassik/studio A3](http://www.soundman.de/en/products/) electret in-ear microphone and a [Zoom F8](https://www.zoom.co.jp/products/handy-recorder/zoom-f8-multitrack-field-recorder) audio recorder, using a 48 kHz sampling rate and 24-bit resolution. During recording, the microphones were worn in the ears of the recording person, and head movement was kept to a minimum.

### Annotations

#### Annotation process

Post-processing of the recorded audio involved aspects related to the privacy of recorded individuals and possible errors in the recording process. Some interference from mobile phones is audible but is considered part of the real-world recording process.

#### Who are the annotators?

- Ronal Bejarano Rodriguez
- Eemi Fagerlund
- Aino Koskimies
- Toni Heittola

### Personal and Sensitive Information

The material was screened for content, and segments containing close-microphone conversation were eliminated.

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

- Toni Heittola (toni.heittola@tut.fi, http://www.cs.tut.fi/~heittolt/)
- Annamaria Mesaros (annamaria.mesaros@tut.fi, http://www.cs.tut.fi/~mesaros/)
- Tuomas Virtanen (tuomas.virtanen@tut.fi, http://www.cs.tut.fi/~tuomasv/)

### Licensing Information

Copyright (c) 2018 Tampere University of Technology and its licensors
All rights reserved.

Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the TUT Urban Acoustic Scenes 2018 (“Work”) described in this document and composed of audio and metadata. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work, (Audio Research Group from Laboratory of Signal Processing at Tampere University of Technology), is acknowledged in any publication that reports research using this Work.

Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use include, but is not limited to:

- selling or reproducing the Work
- selling or distributing the results or content achieved by use of the Work
- providing services by using the Work.

IN NO EVENT SHALL TAMPERE UNIVERSITY OF TECHNOLOGY OR ITS LICENSORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OF TECHNOLOGY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

TAMPERE UNIVERSITY OF TECHNOLOGY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY OF TECHNOLOGY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

### Citation Information

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1228142.svg)](https://doi.org/10.5281/zenodo.1228142)

Provided by:

wetdog

Summary of original information

Dataset overview

Dataset name

TUT-urban-acoustic-scenes-2018-development

Dataset description

The TUT Urban Acoustic Scenes 2018 development dataset contains 10-second audio segments from 10 different acoustic scenes:

Airport
Indoor shopping mall
Metro station
Pedestrian street
Public square
Street with medium level of traffic
Travelling by a tram
Travelling by a bus
Travelling by an underground metro
Urban park

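Each acoustic scene has 864 segments (144 minutes of audio), for a total of 24 hours of audio. The dataset was collected by Tampere University of Technology, Finland, between February and March 2018, with funding from the European Research Council.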
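The duration figures above follow directly from the segment counts. The sketch below re-derives them and also compares the `dataset_size` from the card's metadata with the raw PCM payload. The PCM part is an assumption-based estimate: it assumes each segment is stored as 2-channel (binaural) 24-bit audio at 48 kHz, per the card's recording specs, and ignores file headers and container overhead.

```python
# Re-derive the dataset's size figures (a sketch, not an official calculation).
SEGMENT_SECONDS = 10
SEGMENTS_PER_SCENE = 864
NUM_SCENES = 10
SAMPLE_RATE_HZ = 48_000
CHANNELS = 2          # assumption: binaural recordings kept as two channels
BYTES_PER_SAMPLE = 3  # 24-bit resolution

# Durations stated in the description.
minutes_per_scene = SEGMENTS_PER_SCENE * SEGMENT_SECONDS / 60
total_segments = SEGMENTS_PER_SCENE * NUM_SCENES
total_hours = total_segments * SEGMENT_SECONDS / 3600

# Raw PCM payload, ignoring WAV headers and storage overhead.
samples_per_segment = SAMPLE_RATE_HZ * SEGMENT_SECONDS
pcm_bytes_per_segment = samples_per_segment * CHANNELS * BYTES_PER_SAMPLE
pcm_bytes_total = pcm_bytes_per_segment * total_segments

print(f"{minutes_per_scene:.0f} min per scene, {total_segments} segments, {total_hours:.0f} h total")
print(f"{samples_per_segment} samples per channel per segment")
print(f"{pcm_bytes_total:,} bytes of raw PCM")
```

Under these assumptions the PCM payload comes to 24,883,200,000 bytes, within about 1 MB of the reported `dataset_size` of 24,883,936,611 bytes, which suggests the stored files keep both channels at full 24-bit resolution.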

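Supported tasks and leaderboards

audio-classification: the dataset can be used to train audio classification models, here acoustic scene classification over the 10 scene labels.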
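For classification training, the string scene labels usually need an integer encoding. The following is a minimal sketch: the label strings come from the dataset card, but the alphabetical ordering and the helper name `encode_label` are hypothetical choices, not an official class index.

```python
# Hypothetical label encoding for the 10 scene classes.
# Label strings are from the dataset card; sorting them alphabetically is an assumption.
SCENE_LABELS = sorted([
    "airport", "shopping_mall", "metro_station", "street_pedestrian",
    "public_square", "street_traffic", "tram", "bus", "metro", "park",
])
LABEL_TO_ID = {label: i for i, label in enumerate(SCENE_LABELS)}
ID_TO_LABEL = dict(enumerate(SCENE_LABELS))


def encode_label(scene_label: str) -> int:
    """Return the integer class id for a scene label (KeyError if unknown)."""
    return LABEL_TO_ID[scene_label]


print(encode_label("metro_station"))  # → 3
```

Any fixed ordering works as long as the same mapping is used for training and inference; a classifier head would then output 10 logits indexed by these ids.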

Dataset structure

Data instances

{'scene_label': 'airport', 'identifier': 'barcelona-0', 'source_label': 'a', 'audio': {'path': '/data/airport-barcelona-0-0-a.wav', 'array': array([-1.91628933e-04, -1.18494034e-04, -1.87635422e-04, ..., 4.90546227e-05, -4.98890877e-05, -4.66108322e-05]), 'sampling_rate': 48000}}

Data fields

scene_label: acoustic scene label from the 10-class set.
identifier: city-location ID, e.g. barcelona-0.
source_label: device ID; always a in this dataset.

Filenames follow the pattern: [scene label]-[city]-[location id]-[segment id]-[device id].wav
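Because the metadata fields are encoded in each filename, they can be recovered from the audio `path`. Here is a minimal sketch using only the standard library; `SegmentName` and `parse_segment_filename` are hypothetical names, and the parser assumes scene labels use underscores (e.g. `street_pedestrian`) and city names contain no hyphens, so the stem splits into exactly five fields.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SegmentName:
    scene_label: str
    city: str
    location_id: str
    segment_id: str
    device_id: str

    @property
    def identifier(self) -> str:
        # Matches the `identifier` field, e.g. "barcelona-0".
        return f"{self.city}-{self.location_id}"


def parse_segment_filename(path: str) -> SegmentName:
    """Split '[scene]-[city]-[location]-[segment]-[device].wav' into its fields."""
    p = PurePosixPath(path)
    parts = p.stem.split("-")
    # Assumption: no field other than the separators contains a hyphen.
    if p.suffix != ".wav" or len(parts) != 5:
        raise ValueError(f"unexpected filename: {path!r}")
    return SegmentName(*parts)


name = parse_segment_filename("/data/airport-barcelona-0-0-a.wav")
print(name.identifier)  # → barcelona-0
```

Comparing `name.scene_label` and `name.identifier` against the record's own fields is a cheap consistency check when preprocessing.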

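Data splits

A suggested training/test partition is provided so that results reported on the dataset are comparable. The partition targets a 70/30 ratio while keeping all segments from the same recording location in the same subset.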
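The Hugging Face version lists a single `train` split of 8,640 examples (per the card's `dataset_info`), so the suggested partition is not encoded in it. If you need your own location-disjoint split, one option is a greedy grouped split over the `identifier` field. This is a sketch of a swapped-in technique, not the official partition: `location_grouped_split` is a hypothetical helper, and grouping within each scene assumes location ids are only unique per scene. For results comparable with other work, use the provided suggested partition instead.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def location_grouped_split(
    records: Sequence[Tuple[str, str]],
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split segment indices into train/test so that no (scene, identifier)
    location contributes segments to both subsets.

    records[i] is the (scene_label, identifier) pair of segment i.
    """
    # Group segment indices by scene, then by recording location.
    groups: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for idx, (scene, identifier) in enumerate(records):
        groups[scene][identifier].append(idx)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for scene in sorted(groups):
        locations = sorted(groups[scene])
        rng.shuffle(locations)
        target = test_fraction * sum(len(groups[scene][loc]) for loc in locations)
        n_test = 0
        for loc in locations:
            segment_ids = groups[scene][loc]
            # Greedy: move a whole location to test only if that brings the
            # per-scene test count closer to the target.
            if abs(n_test + len(segment_ids) - target) < abs(n_test - target):
                test.extend(segment_ids)
                n_test += len(segment_ids)
            else:
                train.extend(segment_ids)
    return sorted(train), sorted(test)
```

Assigning whole locations means the achieved ratio per scene only approximates the target, which mirrors the "closest available option" behaviour described for the official partition.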

Scene class	Train / Segments	Train / Locations	Test / Segments	Test / Locations
Airport	599	15	265	7
Bus	622	26	242	10
Metro	603	20	261	9
Metro station	605	28	259	12
Park	622	18	242	7
Public square	648	18	216	6
Shopping mall	585	16	279	6
Street, pedestrian	617	20	247	8
Street, traffic	618	18	246	7
Tram	603	24	261	11
Total	6122	203	2518	83
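The actual train/test ratios implied by the table can be checked with a few lines of arithmetic. The counts below are copied from the table; the script verifies that each class sums to 864 segments and prints each class's training share.

```python
# Per-class (train, test) segment counts copied from the table above.
SPLIT_COUNTS = {
    "Airport": (599, 265),
    "Bus": (622, 242),
    "Metro": (603, 261),
    "Metro station": (605, 259),
    "Park": (622, 242),
    "Public square": (648, 216),
    "Shopping mall": (585, 279),
    "Street, pedestrian": (617, 247),
    "Street, traffic": (618, 246),
    "Tram": (603, 261),
}

for scene, (n_train, n_test) in SPLIT_COUNTS.items():
    assert n_train + n_test == 864, scene  # every class has 864 segments
    print(f"{scene:<20} train share = {n_train / (n_train + n_test):.1%}")

total_train = sum(n_train for n_train, _ in SPLIT_COUNTS.values())
total_test = sum(n_test for _, n_test in SPLIT_COUNTS.values())
print(f"overall train share = {total_train / (total_train + total_test):.1%}")
```

The overall training share is about 70.9%, while individual classes range from about 67.7% (shopping mall) to 75.0% (public square), consistent with whole locations, rather than individual segments, being assigned to each subset.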