audio-set-16khz
Source: ModelScope community (魔搭社区) · Updated 2025-12-05 · Indexed 2025-03-22
Download link:
https://modelscope.cn/datasets/benjamin-paine/audio-set-16khz
Description:
# Re-Upload
This repository is a re-upload of [agkphysics/AudioSet](https://huggingface.co/datasets/agkphysics/AudioSet) in Parquet format, with all audio resampled to 16 kHz using `torchaudio.transforms.Resample`.
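The resampling step mentioned above can be sketched as follows. This is a minimal illustration, not the script used to build this repository: the helper names `expected_num_samples` and `resample_to_16k` are hypothetical, the transform is used with its default settings, and the 44.1 kHz source rate in the usage note is only an assumed example.

```python
import math


def expected_num_samples(num_samples: int, orig_freq: int, new_freq: int = 16_000) -> int:
    """Number of samples a clip should have after resampling (rounded up)."""
    # Integer arithmetic avoids floating-point error on long clips.
    return math.ceil(num_samples * new_freq / orig_freq)


def resample_to_16k(waveform, orig_freq: int, new_freq: int = 16_000):
    """Resample a float tensor of shape (channels, time) to `new_freq` Hz.

    Hypothetical helper: wraps torchaudio.transforms.Resample with defaults.
    """
    # Imported lazily so the length helper above works without PyTorch installed.
    import torchaudio

    resampler = torchaudio.transforms.Resample(orig_freq=orig_freq, new_freq=new_freq)
    return resampler(waveform)
```

For a 10-second clip at an assumed 44.1 kHz source rate (441,000 samples), `expected_num_samples(441_000, 44_100)` gives 160,000, which is a quick sanity check on the length of a decoded clip from the 16 kHz data.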
# Author's Description
> Audio Set: An ontology and human-labeled dataset for audio events
>
> Audio event recognition, the human-like ability to identify and relate sounds from audio, is a nascent problem in machine perception. Comparable problems such as object detection in images have reaped enormous benefits from comprehensive datasets - principally ImageNet. This paper describes the creation of Audio Set, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research. Using a carefully structured hierarchical ontology of 632 audio classes guided by the literature and manual curation, we collect data from human labelers to probe the presence of specific audio classes in 10 second segments of YouTube videos. Segments are proposed for labeling using searches based on metadata, context (e.g., links), and content analysis. The result is a dataset of unprecedented breadth and size that will, we hope, substantially stimulate the development of high-performance audio event recognizers.
>
> Jort F. Gemmeke; Daniel P. W. Ellis; Dylan Freedman; Aren Jansen; Wade Lawrence; R. Channing Moore; Manoj Plakal; Marvin Ritter et al., [10.1109/ICASSP.2017.7952261](https://ieeexplore.ieee.org/document/7952261)
# License
- The AudioSet labels are published under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) (attribution).
- The audio under the AudioSet ontology is published under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/) (attribution, share-alike).
- Individual contribution information may be viewed at [research.google.com](https://research.google.com/audioset/dataset/index.html).
To summarize the licenses:
- If you use the audio from this dataset, you must share your contributions under CC-BY-SA and credit the original authors of that audio.
- If you use the labels from this dataset, you must cite Google Research, as in the Citation section below.
*Note: The above is not legal advice, nor does it replace the need for you to read the entirety of the licenses above before making a decision on how to use this dataset. If you're unsure whether or not your usage is allowable as per the license terms, please consult a legal professional.*
# Citation
```
@inproceedings{jort_audioset_2017,
title = {Audio Set: An ontology and human-labeled dataset for audio events},
author = {Jort F. Gemmeke and Daniel P. W. Ellis and Dylan Freedman and Aren Jansen and Wade Lawrence and R. Channing Moore and Manoj Plakal and Marvin Ritter},
year = {2017},
booktitle = {Proc. IEEE ICASSP 2017},
address = {New Orleans, LA}
}
```
Provider:
maas
Created:
2025-03-18



