Unsupervised discovery of family specific vocal usage in the Mongolian gerbil
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.m905qfv68
下载链接
链接失效反馈官方服务:
资源简介:
In nature, animal vocalizations can provide crucial information about identity, including kinship and hierarchy. However, lab-based vocal behavior is typically studied during brief interactions between animals with no prior social relationship, and under environmental conditions with limited ethological relevance. Here, we address this gap by establishing long-term acoustic recordings from Mongolian gerbil families, a core social group that uses an array of sonic and ultrasonic vocalizations. Three separate gerbil families were transferred to an enlarged environment and continuous 20-day audio recordings were obtained. Using a variational autoencoder (VAE) to quantify 583,237 vocalizations, we show that gerbils exhibit a more elaborate vocal repertoire than has been previously reported and that vocal repertoire usage differs significantly by family. By performing gaussian mixture model clustering on the VAE latent space, we show that families preferentially use characteristic sets of vocal clusters and that these usage preferences remain stable over weeks. Furthermore, gerbils displayed family-specific transitions between vocal clusters. Since gerbils live naturally as extended families in complex underground burrows that are adjacent to other families, these results suggest the presence of a vocal dialect which could be exploited by animals to represent kinship. These findings position the Mongolian gerbil as a compelling animal model to study the neural basis of vocal communication and demonstrates the potential for using unsupervised machine learning with uninterrupted acoustic recordings to gain insights into naturalistic animal behavior.
Methods
Four ultrasonic microphones (Avisoft CM16/CMPA48AAF-5V) were synchronously recorded using a National Instruments multifunction data acquisition device (PCI-6143) via BNC connection with a National Instruments terminal block (BNC-2110). The recording was controlled with custom python scripts using the NI-DAQmx library (https://github.com/ni/nidaqmx-python) which wrote samples to disk at a 125 kHz sampling rate. In total, 13.084 TB of raw audio data were acquired across the three families. For further analyses, the four-channel microphone signals were averaged to create a single-channel high-fidelity audio signal.
Audio was segmented by amplitude thresholding using the AVA python package (https://github.com/pearsonlab/autoencoded-vocal-analysis). First, sound amplitude traces are calculated by computing spectrograms from raw audio, then summing each column of the spectrogram. The “get_onset_offsets” function, which performs the segmenting, requires the selection of a number of parameters which affect segmenting performance. The following values were tuned via an interactive procedure which validated that the segmenting could detect low amplitude vocalizations and capture individual vocal units apparent by eye:
seg_params = {
'min_freq': 500 # minimum frequency
'max_freq': 62500, # maximum frequency
'nperseg': 512, # FFT
'noverlap': 256, # FFT
'spec_min_val': -8, # minimum STFT log-modulus
'spec_max_val': -7.25, # maximum STFT log-modulus
'fs': 125000, # audio sample rate
'th_1': 2, # segmenting threshold 1
'th_2': 5, # segmenting threshold 2
'th_3': 2, # segmenting threshold 3
'min_dur':0.03, # minimum syllable duration (s)
'max_dur':0.3, # maximum syllable duration (s)
'smoothing_timescale': 0.007, # amplitude
'softmax': False, # apply softmax to the frequency bins to calculate amplitude
'temperature':0.5, # softmax temperature parameter
'algorithm': get_onsets_offsets
}
Sound events are detected when the amplitude exceeds 'th_3', And sound offset occurs when there is a subsequent local minimum (i.e., amplitude less than 'th_2', or 'th_1'). The maximum and minimum syllable durations were selected based on published duration ranges of gerbil vocalizations (Ter-Mikaelian et al. 2012, Kobayasi & Riquimaroux, 2012).
We computed the spectral flatness of each detected sound event using the python package librosa (https://github.com/librosa). Consistent with prior literature (Castellucci et al., 2016), we used a threshold on spectral flatness to separate putative vocal and non-vocal sounds. This threshold value was determined empirically, by calculating the false positive vocalization rate of groups of randomly sampled vocalizations. For each spectral flatness value, 100 randomly sampled vocalization spectrograms less than the working threshold value were assembled into 10x10 grids and visually inspected for false positives (e.g. non-vocal sounds). This procedure was repeated 10 times for spectral flatness thresholds of 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, and 0.4. We quantified the false positive vocalization rate for each threshold value and selected 0.3, which had a 5.5 +/- 1.96% false positive rate.
创建时间:
2024-11-20



