---
license: apache-2.0
---
# Positive Transfer Of The Whisper Speech Transformer To Human And Animal Voice Activity Detection
We propose WhisperSeg, which uses the Whisper Transformer, pre-trained for Automatic Speech Recognition (ASR), for both human and animal Voice Activity Detection (VAD). For more details, please refer to our paper:
>
> [**Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection**](https://doi.org/10.1101/2023.09.30.560270)
>
> Nianlong Gu, Kanghwi Lee, Maris Basha, Sumit Kumar Ram, Guanghao You, Richard H. R. Hahnloser <br>
> University of Zurich and ETH Zurich

This animal dataset was customized for animal Voice Activity Detection (vocal segmentation) and was used to train the WhisperSeg segmenter.
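
The paper describes voice activity as an onset, an offset, and a type for each event, which WhisperSeg emits as plain text. As a minimal sketch, assuming a hypothetical one-line-per-segment text format (`onset offset label`, times in seconds) that is not necessarily WhisperSeg's actual output format or this dataset's annotation format, the snippet below shows such segments, a round trip through text, and a conversion to the frame-wise 0/1 labels used by classic frame-based VAD. The names `Segment`, `segments_to_text`, `text_to_segments` and `segments_to_frame_mask` are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    onset: float   # start time in seconds
    offset: float  # end time in seconds
    label: str     # type of vocal activity (hypothetical field name)


def segments_to_text(segments: List[Segment]) -> str:
    """Serialize segments as 'onset offset label' lines (hypothetical format)."""
    return "\n".join(f"{s.onset:.3f} {s.offset:.3f} {s.label}" for s in segments)


def text_to_segments(text: str) -> List[Segment]:
    """Parse the hypothetical 'onset offset label' lines back into segments."""
    segments = []
    for line in text.strip().splitlines():
        onset, offset, label = line.split(maxsplit=2)
        segments.append(Segment(float(onset), float(offset), label))
    return segments


def segments_to_frame_mask(segments: List[Segment], duration: float,
                           frame_len: float = 0.01) -> List[int]:
    """Mark a frame as active (1) if its center lies inside any [onset, offset)."""
    n_frames = int(round(duration / frame_len))
    mask = [0] * n_frames
    for i in range(n_frames):
        center = (i + 0.5) * frame_len
        if any(s.onset <= center < s.offset for s in segments):
            mask[i] = 1
    return mask
```

For example, one call from 0.1 s to 0.3 s in a 0.5 s clip with 0.1 s frames yields the mask `[0, 1, 1, 0, 0]`. Using the frame center avoids off-by-one errors from floating-point division at segment boundaries.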
## Download Dataset
```python
from huggingface_hub import snapshot_download

# Download the whole dataset repository into a local folder
snapshot_download(repo_id="nccratliri/vad-animals", repo_type="dataset", local_dir="data/vad-animals")
```
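
This card does not describe the internal layout of the downloaded folder, so a quick sanity check after downloading is to count files by extension. The sketch below assumes only that the snapshot was written to a local directory such as `data/vad-animals`; `summarize_dataset` is a hypothetical helper, and it skips hidden paths (for example a `.cache` folder that some `huggingface_hub` versions may create inside `local_dir`).

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(root: str) -> Counter:
    """Count non-hidden files under `root`, grouped by lowercase extension."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"{root} not found; download the dataset first")
    counts = Counter()
    for path in root_path.rglob("*"):
        # Skip directories and anything inside or named like a hidden path
        if not path.is_file():
            continue
        if any(part.startswith(".") for part in path.relative_to(root_path).parts):
            continue
        counts[path.suffix.lower() or "<none>"] += 1
    return counts
```

Usage: call `summarize_dataset("data/vad-animals")` after the download and inspect `most_common()` to see which audio and annotation file types are present.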
For more usage details, please refer to the GitHub repository: https://github.com/nianlonggu/WhisperSeg
## Citation
When using this dataset for your work, please cite:
```bibtex
@article {Gu2023.09.30.560270,
author = {Nianlong Gu and Kanghwi Lee and Maris Basha and Sumit Kumar Ram and Guanghao You and Richard Hahnloser},
title = {Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection},
elocation-id = {2023.09.30.560270},
year = {2023},
doi = {10.1101/2023.09.30.560270},
publisher = {Cold Spring Harbor Laboratory},
abstract = {This paper introduces WhisperSeg, utilizing the Whisper Transformer pre-trained for Automatic Speech Recognition (ASR) for human and animal Voice Activity Detection (VAD). Contrary to traditional methods that detect human voice or animal vocalizations from a short audio frame and rely on careful threshold selection, WhisperSeg processes entire spectrograms of long audio and generates plain text representations of onset, offset, and type of voice activity. Processing a longer audio context with a larger network greatly improves detection accuracy from few labeled examples. We further demonstrate a positive transfer of detection performance to new animal species, making our approach viable in the data-scarce multi-species setting. Competing Interest Statement: The authors have declared no competing interest.},
URL = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270},
eprint = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270.full.pdf},
journal = {bioRxiv}
}
```
## Contact
nianlong.gu@uzh.ch