wetdog/TUT-urban-acoustic-scenes-2018-development

Hugging Face | Updated 2023-08-19 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/wetdog/TUT-urban-acoustic-scenes-2018-development
Resource description:
---
dataset_info:
  features:
  - name: scene_label
    dtype: string
  - name: identifier
    dtype: string
  - name: source_label
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: train
    num_bytes: 24883936611.28
    num_examples: 8640
  download_size: 24885037396
  dataset_size: 24883936611.28
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: afl-3.0
task_categories:
- audio-classification
size_categories:
- 1K<n<10K
---

# Dataset Card for "TUT-urban-acoustic-scenes-2018-development"

## Dataset Description

- **Homepage:** https://zenodo.org/record/1228142
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:** Toni Heittola (toni.heittola@tut.fi, http://www.cs.tut.fi/~heittolt/)

### Dataset Summary

The TUT Urban Acoustic Scenes 2018 development dataset consists of 10-second audio segments from 10 acoustic scenes:

- Airport - `airport`
- Indoor shopping mall - `shopping_mall`
- Metro station - `metro_station`
- Pedestrian street - `street_pedestrian`
- Public square - `public_square`
- Street with medium level of traffic - `street_traffic`
- Travelling by a tram - `tram`
- Travelling by a bus - `bus`
- Travelling by an underground metro - `metro`
- Urban park - `park`

Each acoustic scene has 864 segments (144 minutes of audio). The dataset contains 24 hours of audio in total.

The dataset was collected by Tampere University of Technology (Finland) between 02/2018 and 03/2018. The data collection received funding from the European Research Council under ERC Grant Agreement 637422 EVERYSOUND.

### Supported Tasks and Leaderboards

- `audio-classification`: The dataset can be used to train a model for acoustic scene classification, which consists in assigning one of the 10 scene labels to each audio segment.

## Dataset Structure

### Data Instances

```
{
  'scene_label': 'airport',
  'identifier': 'barcelona-0',
  'source_label': 'a',
  'audio': {'path': '/data/airport-barcelona-0-0-a.wav',
            'array': array([-1.91628933e-04, -1.18494034e-04, -1.87635422e-04, ...,
                             4.90546227e-05, -4.98890877e-05, -4.66108322e-05]),
            'sampling_rate': 48000}
}
```

### Data Fields

- `scene_label`: acoustic scene label from the 10-class set,
- `identifier`: city-location ID, e.g. 'barcelona-0',
- `source_label`: device ID; in this dataset it is always 'a'.

Filenames in the dataset follow this pattern:

[scene label]-[city]-[location id]-[segment id]-[device id].wav

### Data Splits

A suggested training/test partitioning of the development set is provided so that results reported with this dataset are comparable. The partitioning keeps all segments recorded at the same location in the same subset, either training or testing. It aims for a 70/30 ratio between the number of segments in the training and test subsets while respecting recording locations, selecting the closest available option.

| Scene class        | Train / Segments | Train / Locations | Test / Segments | Test / Locations |
| ------------------ | ---------------- | ----------------- | --------------- | ---------------- |
| Airport            | 599              | 15                | 265             | 7                |
| Bus                | 622              | 26                | 242             | 10               |
| Metro              | 603              | 20                | 261             | 9                |
| Metro station      | 605              | 28                | 259             | 12               |
| Park               | 622              | 18                | 242             | 7                |
| Public square      | 648              | 18                | 216             | 6                |
| Shopping mall      | 585              | 16                | 279             | 6                |
| Street, pedestrian | 617              | 20                | 247             | 8                |
| Street, traffic    | 618              | 18                | 246             | 7                |
| Tram               | 603              | 24                | 261             | 11               |
| **Total**          | **6122**         | **203**           | **2518**        | **83**           |

## Dataset Creation

### Source Data

#### Initial Data Collection and Normalization

The dataset was recorded in six large European cities: Barcelona, Helsinki, London, Paris, Stockholm, and Vienna. For all acoustic scenes, audio was captured in multiple locations: different streets, different parks, different shopping malls. In each location, multiple 2-3 minute long audio recordings were captured in a few slightly different positions (2-4) within the selected location. The collected audio material was cut into 10-second segments.

The recording equipment consists of a binaural [Soundman OKM II Klassik/studio A3](http://www.soundman.de/en/products/) electret in-ear microphone and a [Zoom F8](https://www.zoom.co.jp/products/handy-recorder/zoom-f8-multitrack-field-recorder) audio recorder using a 48 kHz sampling rate and 24-bit resolution. During recording, the microphones were worn in the ears by the person recording, and head movement was kept to a minimum.

### Annotations

#### Annotation process

Post-processing of the recorded audio addresses the privacy of recorded individuals and possible errors in the recording process. Some interference from mobile phones is audible, but it is considered part of the real-world recording process.

#### Who are the annotators?

* Ronal Bejarano Rodriguez
* Eemi Fagerlund
* Aino Koskimies
* Toni Heittola

### Personal and Sensitive Information

The material was screened for content, and segments containing close microphone conversation were eliminated.

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

- Toni Heittola (toni.heittola@tut.fi, http://www.cs.tut.fi/~heittolt/)
- Annamaria Mesaros (annamaria.mesaros@tut.fi, http://www.cs.tut.fi/~mesaros/)
- Tuomas Virtanen (tuomas.virtanen@tut.fi, http://www.cs.tut.fi/~tuomasv/)

### Licensing Information

Copyright (c) 2018 Tampere University of Technology and its licensors

All rights reserved.

Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the TUT Urban Acoustic Scenes 2018 (“Work”) described in this document and composed of audio and metadata. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work, (Audio Research Group from Laboratory of Signal Processing at Tampere University of Technology), is acknowledged in any publication that reports research using this Work.

Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use include, but is not limited to:

- selling or reproducing the Work
- selling or distributing the results or content achieved by use of the Work
- providing services by using the Work.

IN NO EVENT SHALL TAMPERE UNIVERSITY OF TECHNOLOGY OR ITS LICENSORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OF TECHNOLOGY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

TAMPERE UNIVERSITY OF TECHNOLOGY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY OF TECHNOLOGY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

### Citation Information

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1228142.svg)](https://doi.org/10.5281/zenodo.1228142)

### Contributions

[More Information Needed]
Provided by:
wetdog
Summary of Original Information

Dataset Overview

Dataset Name

TUT-urban-acoustic-scenes-2018-development

Dataset Description

The TUT Urban Acoustic Scenes 2018 development dataset contains 10-second audio segments from 10 different acoustic scenes:

  • Airport
  • Indoor shopping mall
  • Metro station
  • Pedestrian street
  • Public square
  • Street with medium level of traffic
  • Travelling by a tram
  • Travelling by a bus
  • Travelling by an underground metro
  • Urban park

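Each acoustic scene has 864 segments (144 minutes of audio), for a total of 24 hours of audio. The dataset was collected by Tampere University of Technology (Finland) between February and March 2018, with funding from the European Research Council (ERC Grant Agreement 637422 EVERYSOUND).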
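
The stated totals are mutually consistent, which the short check below confirms. It is only an arithmetic sketch using the figures quoted in the description; the constant names are illustrative.

```python
# Figures stated in the dataset description.
SEGMENT_SECONDS = 10
SEGMENTS_PER_SCENE = 864
NUM_SCENES = 10

# Audio per scene, total segment count, and total duration.
minutes_per_scene = SEGMENTS_PER_SCENE * SEGMENT_SECONDS / 60
total_segments = SEGMENTS_PER_SCENE * NUM_SCENES
total_hours = total_segments * SEGMENT_SECONDS / 3600

print(f"{minutes_per_scene:.0f} minutes per scene")
print(f"{total_segments} segments in total")
print(f"{total_hours:.0f} hours of audio")
```

The total of 8640 segments also matches the `num_examples` value listed in the dataset metadata.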

Supported Tasks and Leaderboards

  • audio-classification: the dataset can be used to train audio classification models.

Dataset Structure

Data Instances

    {
      'scene_label': 'airport',
      'identifier': 'barcelona-0',
      'source_label': 'a',
      'audio': {'path': '/data/airport-barcelona-0-0-a.wav',
                'array': array([-1.91628933e-04, -1.18494034e-04, -1.87635422e-04, ...,
                                 4.90546227e-05, -4.98890877e-05, -4.66108322e-05]),
                'sampling_rate': 48000}
    }
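
A record shaped like the instance above could be inspected roughly as follows. This is a minimal sketch, not the dataset authors' code: `summarize_example` and `iter_train_examples` are hypothetical helper names, the loader assumes the Hugging Face `datasets` library and network access, and the summary assumes the decoded audio is a dict holding a 1-D `array` as in the instance shown (the decoded form can differ between `datasets` versions).

```python
def summarize_example(example):
    """Summarize one record shaped like the data instance above."""
    audio = example["audio"]
    num_samples = len(audio["array"])  # assumes a 1-D sequence of samples
    return {
        "scene_label": example["scene_label"],
        "identifier": example["identifier"],
        "sampling_rate": audio["sampling_rate"],
        "duration_s": num_samples / audio["sampling_rate"],
    }


def iter_train_examples():
    """Stream records from the Hub (assumes `datasets` is installed)."""
    from datasets import load_dataset  # lazy import; only needed for real data

    repo_id = "wetdog/TUT-urban-acoustic-scenes-2018-development"
    return load_dataset(repo_id, split="train", streaming=True)


# Synthetic stand-in record: 10 s of silence at 48 kHz, so no download is needed.
fake_example = {
    "scene_label": "airport",
    "identifier": "barcelona-0",
    "source_label": "a",
    "audio": {
        "path": "/data/airport-barcelona-0-0-a.wav",
        "array": [0.0] * 480_000,
        "sampling_rate": 48000,
    },
}
print(summarize_example(fake_example))
```

Streaming is used in the loader sketch because the metadata reports a download size of roughly 24.9 GB.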

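Data Fields

  • scene_label: acoustic scene label, drawn from the 10-class set.
  • identifier: city-location ID, for example barcelona-0.
  • source_label: device ID; always a in this dataset.

Filenames follow the pattern: [scene label]-[city]-[location id]-[segment id]-[device id].wav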
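
The filename pattern can be split back into its fields. The sketch below assumes that no field contains a hyphen, which holds for the scene labels listed in the card (they use underscores, e.g. `street_pedestrian`) and for the example path shown; `SegmentName` and `parse_segment_filename` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SegmentName:
    scene_label: str
    city: str
    location_id: str
    segment_id: str
    device_id: str

    @property
    def identifier(self) -> str:
        # Matches the `identifier` field format, e.g. 'barcelona-0'.
        return f"{self.city}-{self.location_id}"


def parse_segment_filename(path: str) -> SegmentName:
    """Parse '[scene label]-[city]-[location id]-[segment id]-[device id].wav'."""
    stem = PurePosixPath(path).stem  # drops directories and the .wav suffix
    parts = stem.split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected filename layout: {path!r}")
    return SegmentName(*parts)


parsed = parse_segment_filename("/data/airport-barcelona-0-0-a.wav")
print(parsed)
print(parsed.identifier)
```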

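Data Splits

The dataset comes with a suggested training/test partitioning so that reported results are comparable. The partitioning aims for a 70/30 ratio while keeping all segments from the same recording location in the same subset.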
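
To illustrate what a location-aware split means, here is a simple greedy sketch that assigns whole locations to the training subset until the segment count is as close as possible to a target fraction. It is not the procedure used by the dataset authors, whose official partitioning should be preferred; `split_by_location` is a hypothetical helper, and the toy keys are made up. In practice the grouping key would likely need to combine scene label and `identifier`, since it is an assumption that location IDs are unique across scenes.

```python
from collections import Counter


def split_by_location(location_keys, train_fraction=0.7):
    """Greedily assign whole locations to train, aiming at train_fraction of segments."""
    counts = Counter(location_keys)
    target = train_fraction * len(location_keys)
    train_locations = set()
    n_train = 0
    # Visit larger locations first; ties are broken by name for determinism.
    for location, n in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        # Take the location only if it moves the train count closer to the target.
        if abs(n_train + n - target) < abs(n_train - target):
            train_locations.add(location)
            n_train += n
    test_locations = set(counts) - train_locations
    return train_locations, test_locations


# Toy example: one key per segment, 10 segments over 4 locations.
keys = ["barcelona-0"] * 4 + ["helsinki-1"] * 3 + ["london-2"] * 2 + ["paris-3"]
train_locs, test_locs = split_by_location(keys)
print(sorted(train_locs), sorted(test_locs))
```

Because locations are never divided, the achieved ratio is only approximately 70/30, which is consistent with the "closest available option" wording in the dataset card.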

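| Scene class        | Train / Segments | Train / Locations | Test / Segments | Test / Locations |
| ------------------ | ---------------- | ----------------- | --------------- | ---------------- |
| Airport            | 599              | 15                | 265             | 7                |
| Bus                | 622              | 26                | 242             | 10               |
| Metro              | 603              | 20                | 261             | 9                |
| Metro station      | 605              | 28                | 259             | 12               |
| Park               | 622              | 18                | 242             | 7                |
| Public square      | 648              | 18                | 216             | 6                |
| Shopping mall      | 585              | 16                | 279             | 6                |
| Street, pedestrian | 617              | 20                | 247             | 8                |
| Street, traffic    | 618              | 18                | 246             | 7                |
| Tram               | 603              | 24                | 261             | 11               |
| Total              | 6122             | 203               | 2518            | 83               |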
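
The figures in the table above can be checked against the stated 864 segments per scene and the 70/30 target. The snippet is a sanity check with the values transcribed from the table; `SPLIT_TABLE` is an illustrative name.

```python
# Per scene: (train segments, train locations, test segments, test locations)
SPLIT_TABLE = {
    "Airport": (599, 15, 265, 7),
    "Bus": (622, 26, 242, 10),
    "Metro": (603, 20, 261, 9),
    "Metro station": (605, 28, 259, 12),
    "Park": (622, 18, 242, 7),
    "Public square": (648, 18, 216, 6),
    "Shopping mall": (585, 16, 279, 6),
    "Street, pedestrian": (617, 20, 247, 8),
    "Street, traffic": (618, 18, 246, 7),
    "Tram": (603, 24, 261, 11),
}

# Column totals, to compare with the Total row of the table.
train_total = sum(row[0] for row in SPLIT_TABLE.values())
train_locations = sum(row[1] for row in SPLIT_TABLE.values())
test_total = sum(row[2] for row in SPLIT_TABLE.values())
test_locations = sum(row[3] for row in SPLIT_TABLE.values())

for scene, (train, _, test, _) in SPLIT_TABLE.items():
    assert train + test == 864, scene  # every scene has 864 segments
    print(f"{scene:<20} train share = {train / (train + test):.1%}")

print(f"overall train share = {train_total / (train_total + test_total):.1%}")
```

The overall training share comes out at about 70.9%, with per-scene shares ranging from 67.7% (Shopping mall) to 75.0% (Public square), which reflects the constraint that locations are not divided between subsets.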

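Dataset Creation

Initial Data Collection and Normalization

The dataset was recorded in six large European cities: Barcelona, Helsinki, London, Paris, Stockholm, and Vienna. Each acoustic scene was recorded at multiple locations; at each location, several 2-3 minute recordings were captured and then cut into 10-second segments.

The recording equipment consisted of a binaural Soundman OKM II Klassik/studio A3 electret in-ear microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution.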
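
The recording parameters allow a rough estimate of the uncompressed audio size, which can be compared with the `dataset_size` reported in the metadata. This is a back-of-the-envelope sketch: it assumes two channels (the card describes the microphone as binaural) and plain 24-bit PCM storage, neither of which the card states explicitly for the hosted files.

```python
SAMPLE_RATE_HZ = 48_000   # stated in the dataset card
BIT_DEPTH = 24            # stated in the dataset card
SEGMENT_SECONDS = 10      # stated in the dataset card
NUM_SEGMENTS = 8_640      # num_examples in the metadata
NUM_CHANNELS = 2          # assumption: binaural recording stored as stereo
REPORTED_DATASET_BYTES = 24_883_936_611.28  # dataset_size in the metadata

# Raw PCM payload per segment and for the whole dataset.
bytes_per_segment = SAMPLE_RATE_HZ * SEGMENT_SECONDS * NUM_CHANNELS * BIT_DEPTH // 8
estimated_total = bytes_per_segment * NUM_SEGMENTS
relative_gap = abs(REPORTED_DATASET_BYTES - estimated_total) / REPORTED_DATASET_BYTES

print(f"{bytes_per_segment:,} bytes per segment")
print(f"{estimated_total:,} bytes estimated in total")
print(f"relative gap to reported size: {relative_gap:.4%}")
```

Under these assumptions the estimate lands within a small fraction of a percent of the reported size, which is consistent with the segments being stored as uncompressed 48 kHz, 24-bit stereo audio plus a small per-file overhead.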

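Annotations

Annotation Process

After recording, the audio was post-processed to address the privacy of recorded individuals and possible recording errors. Some mobile-phone interference remains audible and is considered part of the real-world recording process.

Annotators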

  • Ronal Bejarano Rodriguez
  • Eemi Fagerlund
  • Aino Koskimies
  • Toni Heittola

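Personal and Sensitive Information

Segments containing close-microphone conversation were removed from the dataset.

Licensing Information

The dataset metadata lists the license as afl-3.0. The licensing text in the dataset card grants use only for experimental and non-commercial purposes and prohibits commercial use.

Citation Information

DOI: 10.5281/zenodo.1228142 (https://doi.org/10.5281/zenodo.1228142)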
