nccratliri/vad-marmoset

Name: nccratliri/vad-marmoset
Creator: nccratliri
Published: 2023-10-03 07:11:10
License: Apache-2.0 (per the dataset card; the listing itself gives no description)

Source: Hugging Face. Updated 2023-10-03. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/nccratliri/vad-marmoset

Resource description:

---
license: apache-2.0
---

# Positive Transfer Of The Whisper Speech Transformer To Human And Animal Voice Activity Detection

We proposed WhisperSeg, utilizing the Whisper Transformer pre-trained for Automatic Speech Recognition (ASR) for both human and animal Voice Activity Detection (VAD). For more details, please refer to our paper:

> [**Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection**](https://doi.org/10.1101/2023.09.30.560270)
>
> Nianlong Gu, Kanghwi Lee, Maris Basha, Sumit Kumar Ram, Guanghao You, Richard H. R. Hahnloser <br>
> University of Zurich and ETH Zurich

This is the Marmoset dataset customized for Animal Voice Activity Detection (vocal segmentation) in WhisperSeg.

## Download Dataset

```python
from huggingface_hub import snapshot_download

snapshot_download("nccratliri/vad-marmoset", local_dir="data/marmoset", repo_type="dataset")
```

For more usage details, please refer to the GitHub repository: https://github.com/nianlonggu/WhisperSeg

When using this dataset, please also cite:

```
@article {10.7554/eLife.68837,
  article_type = {journal},
  title = {Fast and accurate annotation of acoustic signals with deep neural networks},
  author = {Steinfath, Elsa and Palacios-Muñoz, Adrian and Rottschäfer, Julian R and Yuezak, Deniz and Clemens, Jan},
  editor = {Calabrese, Ronald L and Egnor, SE Roian and Troyer, Todd},
  volume = 10,
  year = 2021,
  month = {nov},
  pub_date = {2021-11-01},
  pages = {e68837},
  citation = {eLife 2021;10:e68837},
  doi = {10.7554/eLife.68837},
  url = {https://doi.org/10.7554/eLife.68837},
  abstract = {Acoustic signals serve communication within and across species throughout the animal kingdom. Studying the genetics, evolution, and neurobiology of acoustic communication requires annotating acoustic signals: segmenting and identifying individual acoustic elements like syllables or sound pulses. To be useful, annotations need to be accurate, robust to noise, and fast. We here introduce \textit{DeepAudioSegmenter} (\textit{DAS}), a method that annotates acoustic signals across species based on a deep-learning derived hierarchical presentation of sound. We demonstrate the accuracy, robustness, and speed of \textit{DAS} using acoustic signals with diverse characteristics from insects, birds, and mammals. \textit{DAS} comes with a graphical user interface for annotating song, training the network, and for generating and proofreading annotations. The method can be trained to annotate signals from new species with little manual annotation and can be combined with unsupervised methods to discover novel signal types. \textit{DAS} annotates song with high throughput and low latency for experimental interventions in realtime. Overall, \textit{DAS} is a universal, versatile, and accessible tool for annotating acoustic communication signals.},
  keywords = {acoustic communication, annotation, song, deep learning, bird, fly},
  journal = {eLife},
  issn = {2050-084X},
  publisher = {eLife Sciences Publications, Ltd},
}
```

```
@article {Gu2023.09.30.560270,
  author = {Nianlong Gu and Kanghwi Lee and Maris Basha and Sumit Kumar Ram and Guanghao You and Richard Hahnloser},
  title = {Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection},
  elocation-id = {2023.09.30.560270},
  year = {2023},
  doi = {10.1101/2023.09.30.560270},
  publisher = {Cold Spring Harbor Laboratory},
  abstract = {This paper introduces WhisperSeg, utilizing the Whisper Transformer pre-trained for Automatic Speech Recognition (ASR) for human and animal Voice Activity Detection (VAD). Contrary to traditional methods that detect human voice or animal vocalizations from a short audio frame and rely on careful threshold selection, WhisperSeg processes entire spectrograms of long audio and generates plain text representations of onset, offset, and type of voice activity. Processing a longer audio context with a larger network greatly improves detection accuracy from few labeled examples. We further demonstrate a positive transfer of detection performance to new animal species, making our approach viable in the data-scarce multi-species setting. Competing Interest Statement: The authors have declared no competing interest.},
  URL = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270},
  eprint = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270.full.pdf},
  journal = {bioRxiv}
}
```

## Contact

nianlong.gu@uzh.ch

Provider:

nccratliri

Summary of Original Information

Dataset Overview

Dataset Name

Marmoset dataset

Dataset Purpose

Animal voice activity detection (vocal segmentation), specifically for use with WhisperSeg.
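To make "vocal segmentation" concrete: the WhisperSeg abstract describes its output as the onset, offset, and type of each voice activity. The sketch below is one minimal way to hold such segments in Python and summarize them. The `Segment` class, its field names, and the helper functions are hypothetical illustrations and are not the annotation format shipped with this dataset or the WhisperSeg API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """One vocal segment (hypothetical structure, for illustration only)."""
    onset: float   # start time in seconds
    offset: float  # end time in seconds
    label: str     # type of voice activity, e.g. a call type

    def __post_init__(self) -> None:
        # Reject segments that end before they start.
        if self.offset < self.onset:
            raise ValueError("offset must not be earlier than onset")


def total_duration(segments: Iterable[Segment]) -> float:
    """Sum of segment lengths in seconds (assumes non-overlapping segments)."""
    return sum(s.offset - s.onset for s in segments)


def vocal_fraction(segments: Iterable[Segment], clip_duration: float) -> float:
    """Share of a clip covered by vocal activity."""
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    return total_duration(segments) / clip_duration
```

For example, two segments of 0.5 s and 0.25 s in a 3 s clip give a vocal fraction of 0.25.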

Dataset Download

    from huggingface_hub import snapshot_download

    # The repository ID must be passed as a quoted string.
    snapshot_download("nccratliri/vad-marmoset", local_dir="data/marmoset", repo_type="dataset")
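After the snapshot call above has finished, it can help to check what actually landed in `data/marmoset`. The following is a minimal sketch using only the standard library; the helper name `inventory_by_suffix` is made up for this example, and the listing makes no claim about which file types the dataset contains. Hidden folders are skipped on the assumption that they hold tool metadata rather than data.

```python
from collections import Counter
from pathlib import Path


def inventory_by_suffix(root: str) -> Counter:
    """Count files under `root` by lowercase file extension."""
    root_path = Path(root)
    counts: Counter = Counter()
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        # Skip anything inside a hidden folder (assumed to be metadata).
        relative_parts = path.relative_to(root_path).parts
        if any(part.startswith(".") for part in relative_parts):
            continue
        counts[path.suffix.lower()] += 1
    return counts
```

Calling `inventory_by_suffix("data/marmoset")` returns a `Counter` mapping each extension to its file count, which gives a quick sanity check that the download is complete.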

Citation Information

When using this dataset, please cite the two references given in full in the dataset card above:

Steinfath, E., Palacios-Muñoz, A., Rottschäfer, J. R., Yuezak, D., and Clemens, J. (2021). Fast and accurate annotation of acoustic signals with deep neural networks. eLife 2021;10:e68837. https://doi.org/10.7554/eLife.68837

Gu, N., Lee, K., Basha, M., Ram, S. K., You, G., and Hahnloser, R. (2023). Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection. bioRxiv. https://doi.org/10.1101/2023.09.30.560270

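Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

This is a small audio dataset for animal voice activity detection that supports use of the WhisperSeg model. It contains 14 samples, split into training and test subsets, and is suited to research on cross-species vocalization detection.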
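The train/test split can be checked locally once the dataset is downloaded. The sketch below counts audio files per top-level folder. It rests on two assumptions that this page does not confirm: that each subset is stored in its own top-level folder (for example `train` and `test`), and that the audio files use the `.wav` extension. The function name `count_audio_per_subset` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Tuple


def count_audio_per_subset(
    root: str, audio_suffixes: Tuple[str, ...] = (".wav",)
) -> Dict[str, int]:
    """Count audio files in each top-level subfolder of `root`.

    Assumes one folder per subset; adjust `audio_suffixes` if the
    dataset uses a different audio format.
    """
    counts: Dict[str, int] = {}
    for subset_dir in sorted(Path(root).iterdir()):
        # Only visible directories are treated as subsets.
        if not subset_dir.is_dir() or subset_dir.name.startswith("."):
            continue
        counts[subset_dir.name] = sum(
            1
            for f in subset_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in audio_suffixes
        )
    return counts
```

If both assumptions hold and each sample is one audio file, the per-subset counts returned for `data/marmoset` should add up to the 14 samples mentioned above.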

The content above was collected and summarized by 遇见数据集.
