nccratliri/vad-animals

Name: nccratliri/vad-animals
Creator: nccratliri
Published: 2024-02-21 12:25:51
License: no description provided

Source: Hugging Face. Updated 2024-02-21; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/nccratliri/vad-animals

Description:

---
license: apache-2.0
---

# Positive Transfer Of The Whisper Speech Transformer To Human And Animal Voice Activity Detection

We proposed WhisperSeg, utilizing the Whisper Transformer pre-trained for Automatic Speech Recognition (ASR) for both human and animal Voice Activity Detection (VAD). For more details, please refer to our paper:

> [**Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection**](https://doi.org/10.1101/2023.09.30.560270)
>
> Nianlong Gu, Kanghwi Lee, Maris Basha, Sumit Kumar Ram, Guanghao You, Richard H. R. Hahnloser <br>
> University of Zurich and ETH Zurich

This animal dataset was customized for Animal Voice Activity Detection (vocal segmentation) when training the WhisperSeg segmenter.

## Download Dataset

```python
from huggingface_hub import snapshot_download

snapshot_download(
    "nccratliri/vad-animals",
    local_dir="data/vad-animals",
    repo_type="dataset",
)
```

For more usage details, please refer to the GitHub repository: https://github.com/nianlonggu/WhisperSeg

## Citation

When using this dataset for your work, please cite:

```
@article {Gu2023.09.30.560270,
  author = {Nianlong Gu and Kanghwi Lee and Maris Basha and Sumit Kumar Ram and Guanghao You and Richard Hahnloser},
  title = {Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection},
  elocation-id = {2023.09.30.560270},
  year = {2023},
  doi = {10.1101/2023.09.30.560270},
  publisher = {Cold Spring Harbor Laboratory},
  abstract = {This paper introduces WhisperSeg, utilizing the Whisper Transformer pre-trained for Automatic Speech Recognition (ASR) for human and animal Voice Activity Detection (VAD). Contrary to traditional methods that detect human voice or animal vocalizations from a short audio frame and rely on careful threshold selection, WhisperSeg processes entire spectrograms of long audio and generates plain text representations of onset, offset, and type of voice activity. Processing a longer audio context with a larger network greatly improves detection accuracy from few labeled examples. We further demonstrate a positive transfer of detection performance to new animal species, making our approach viable in the data-scarce multi-species setting. Competing Interest Statement: The authors have declared no competing interest.},
  URL = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270},
  eprint = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270.full.pdf},
  journal = {bioRxiv}
}
```

## Contact

nianlong.gu@uzh.ch

Provided by:

nccratliri

Summary of original information

Positive Transfer Of The Whisper Speech Transformer To Human And Animal Voice Activity Detection

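Dataset overview

This dataset is used to train the WhisperSeg segmenter for animal voice activity detection (vocal segmentation). WhisperSeg takes the Whisper Transformer, pre-trained for automatic speech recognition (ASR), and applies it to voice activity detection (VAD) for both humans and animals.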

Dataset download

from huggingface_hub import snapshot_download

snapshot_download("nccratliri/vad-animals", local_dir="data/vad-animals", repo_type="dataset")

Citation

When using this dataset in research, please cite the WhisperSeg paper (Gu et al., 2023, bioRxiv, doi:10.1101/2023.09.30.560270). The full BibTeX entry is given in the Citation section of the dataset card above.

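Compiled summary

Dataset introduction

Construction

The dataset, nccratliri/vad-animals, was customized for animal voice activity detection (vocal segmentation) and is used to train the WhisperSeg segmenter. The segmenter builds on the Whisper Transformer pre-trained for automatic speech recognition (ASR): it processes entire spectrograms of long audio and generates plain text representations of the onset, offset, and type of voice activity, which lets it detect both human and animal voice activity.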
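To make the idea of a plain text representation of onset, offset, and type concrete, here is a minimal sketch. The `"onset offset label"` format, the `;` separator, and the names `Segment`, `segments_to_text`, and `text_to_segments` are all assumptions made for illustration; they are not the token format WhisperSeg actually emits, which is documented in the paper and the GitHub repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    onset: float   # seconds from the start of the audio clip
    offset: float  # seconds from the start of the audio clip
    label: str     # type of voice activity, e.g. a call type


def segments_to_text(segments):
    """Serialize segments as 'onset offset label' triples joined by ';'.

    This format is hypothetical and only illustrates the concept.
    """
    return ";".join(f"{s.onset:.2f} {s.offset:.2f} {s.label}" for s in segments)


def text_to_segments(text):
    """Parse the hypothetical format back into segments, skipping malformed entries."""
    segments = []
    for chunk in text.split(";"):
        parts = chunk.split()
        if len(parts) != 3:
            continue  # not an 'onset offset label' triple
        try:
            onset, offset = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # timestamps are not numeric
        if offset > onset:
            segments.append(Segment(onset, offset, parts[2]))
    return segments


example = [Segment(0.12, 0.48, "call"), Segment(1.05, 1.90, "song")]
encoded = segments_to_text(example)
decoded = text_to_segments(encoded)
```

A text-generating model that emits strings of this kind can express a variable number of segments without a per-frame threshold, and the parser is the only place where malformed output has to be handled.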

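Features

The nccratliri/vad-animals dataset has clear cross-species potential. WhisperSeg, the segmenter it is used to train, handles human speech as well as the vocalizations of multiple animal species, which makes it particularly suited to data-scarce multi-species settings. Unlike traditional methods that work on short audio frames and depend on careful threshold selection, WhisperSeg processes long audio context with a larger network, which greatly improves detection accuracy even when few labeled examples are available.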
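For contrast, the following sketch shows the kind of short-frame, threshold-based detector that the traditional methods above refer to. It is a generic energy-threshold VAD written for illustration, not one of the baselines evaluated in the paper; the synthetic signal, the frame length, and both threshold values are assumptions.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def threshold_vad(samples, sample_rate, frame_len, threshold):
    """Return (onset, offset) pairs in seconds where frame energy exceeds the threshold."""
    segments, start = [], None
    energies = frame_energies(samples, frame_len)
    for idx, energy in enumerate(energies):
        active = energy > threshold
        if active and start is None:
            start = idx  # first active frame opens a segment
        elif not active and start is not None:
            segments.append((start * frame_len / sample_rate, idx * frame_len / sample_rate))
            start = None
    if start is not None:  # close a segment that runs to the end of the signal
        segments.append((start * frame_len / sample_rate, len(energies) * frame_len / sample_rate))
    return segments


# Synthetic signal: 0.5 s silence, 0.5 s loud tone, 0.5 s silence, 0.5 s quiet tone.
sr = 1000
tone = [math.sin(2 * math.pi * 50 * n / sr) for n in range(500)]
signal = [0.0] * 500 + tone + [0.0] * 500 + [0.1 * x for x in tone]

loose = threshold_vad(signal, sr, frame_len=100, threshold=0.001)
strict = threshold_vad(signal, sr, frame_len=100, threshold=0.1)
```

With the lower threshold both tones are detected, while the higher threshold misses the quiet one. That sensitivity to a hand-picked value is the limitation the threshold-free, long-context approach is meant to avoid.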

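Usage

Users can download the dataset with the snapshot_download function from the Hugging Face Hub and store it in a local directory. Detailed usage instructions are available in the project's GitHub repository (https://github.com/nianlonggu/WhisperSeg). Work that uses the dataset should comply with the Apache-2.0 license and cite the accompanying paper.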
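The download step described above can be sketched as follows, together with a small helper for inspecting what arrived on disk. The `snapshot_download` call mirrors the one in the dataset card; the helper names `download_dataset` and `count_files_by_extension` are hypothetical, and nothing here assumes a particular file layout inside the dataset.

```python
from collections import Counter
from pathlib import Path


def download_dataset(local_dir="data/vad-animals"):
    """Download the dataset snapshot and return its local path (needs network access)."""
    # Imported lazily so the helper below works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "nccratliri/vad-animals",
        local_dir=local_dir,
        repo_type="dataset",
    )


def count_files_by_extension(root):
    """Count the files under root, grouped by lowercase file extension."""
    return Counter(
        path.suffix.lower() or "(none)"
        for path in Path(root).rglob("*")
        if path.is_file()
    )


# Example usage (commented out because it downloads data):
# path = download_dataset()
# for extension, count in sorted(count_files_by_extension(path).items()):
#     print(f"{extension}: {count}")
```

Counting files by extension is a quick sanity check that the snapshot is complete before pointing the WhisperSeg training scripts at the directory.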

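Background and challenges

Background overview

In speech processing, voice activity detection for human speech has advanced considerably, while detection of animal vocalizations has received far less attention. The nccratliri/vad-animals dataset was created to advance animal voice activity detection. It was introduced in 2023 by Nianlong Gu and colleagues at the University of Zurich and ETH Zurich and is used to train the WhisperSeg segmenter, which builds on the Whisper speech Transformer and detects both human and animal voice activity. The work and its dataset offer a viable approach to the data-scarce multi-species setting.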

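Current challenges

Two main challenges arise in building the dataset. First, animal species are numerous and their vocalizations differ widely, so the limitations of traditional methods, namely short-frame processing and threshold selection, must be overcome. Second, detection performance has to transfer to new animal species when data are scarce. WhisperSeg processes entire spectrograms of long audio and handles the longer audio context with a larger network, which greatly improves detection accuracy and yields a positive transfer of detection performance.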

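Common scenarios

Typical use case

Within speech processing, animal voice activity detection (VAD) is receiving growing attention, and nccratliri/vad-animals is a dataset customized for this area. Its typical use is training the WhisperSeg segmenter to detect and segment animal vocalizations accurately, alongside human voice activity.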

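Research problems addressed

The dataset addresses the limited accuracy of traditional VAD methods on animal vocalizations. By building on the Whisper Transformer, the model processes entire spectrograms of long audio and greatly improves detection accuracy from only a few labeled examples. Its positive transfer also lets the model perform well on new animal species, which matters most in data-scarce multi-species settings.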

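Derived work

Follow-up work based on the dataset reportedly includes further optimization of the WhisperSeg model and exploratory applications to VAD tasks across different animal populations. Such work advances animal sound recognition and offers new perspectives and methods for multimodal biological information processing.

The content above was compiled and summarized by 遇见数据集.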
