Bengalese Finch song repository
DataCite Commons · Updated 2025-06-01 · Indexed 2024-07-28
Download link:
https://figshare.com/articles/dataset/Bengalese_Finch_song_repository/4805749/6
Description:
This is a collection of songs from four Bengalese finches recorded in the Sober lab at Emory University. All songs have been hand-labeled by two of the authors.
### Set-up
We have added a shell script, `untar_bfsongrepo.sh`, that untars the compressed archives on Unix systems. Running this script will produce a directory structure like:

    BFSongRepo/
        <bird_ID>/
            day_1/
            day_2/
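On systems without a Unix shell, the same extraction step can be sketched in Python with the standard-library `tarfile` module. This is not the provided `untar_bfsongrepo.sh`; it is a minimal sketch assuming that the downloaded archives sit in one folder, that their names contain `.tar` (for example `.tar.gz`), and that each one unpacks to a `<bird_ID>/` folder. The helpers `extract_archives` and `index_days` are hypothetical names introduced here for illustration.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir, dest="BFSongRepo"):
    """Extract every tar archive found in archive_dir into dest and return dest.

    Hypothetical stand-in for untar_bfsongrepo.sh. Assumes archive names
    contain ".tar" (e.g. ".tar.gz"); mode "r:*" lets tarfile detect the
    compression on its own.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for archive in sorted(Path(archive_dir).glob("*.tar*")):
        with tarfile.open(archive, "r:*") as tar:
            if hasattr(tarfile, "data_filter"):
                # Python versions with extraction filters: reject absolute
                # paths and links that would escape dest.
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    return dest


def index_days(root="BFSongRepo"):
    """Map each <bird_ID> folder under root to its sorted day-folder names."""
    root = Path(root)
    return {
        bird.name: sorted(day.name for day in bird.iterdir() if day.is_dir())
        for bird in sorted(root.iterdir())
        if bird.is_dir()
    }
```

For example, `index_days(extract_archives("downloads"))` would map each bird ID to its day folders. If the archives turn out to contain a top-level `BFSongRepo/` folder themselves, pass the parent directory as `dest` instead so the tree is not nested twice.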
### Usage
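To make it easy to work with the dataset, we have created a Python package, `evfuncs`, available at https://github.com/soberlab/evfuncs. How to work with the files is described in the README of that library; here we briefly describe the types of files.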
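The actual sound files have the extension `.cbin` and were created by EvTAF, an application that runs behavioral experiments and collects data. Each `.cbin` file has an associated `.cbin.not.mat` file, created by the song-annotation GUI evsonganaly, that contains the onsets, offsets, and labels of song syllables, among other fields. Each `.cbin` file also has associated `.tmp` and `.rec` files, also created by EvTAF. These files are not strictly required to work with the dataset but are included for completeness.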
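The file relationships described above can be checked with a short sketch that, for each `.cbin` file in one day folder, looks up its `.cbin.not.mat` annotation and any optional `.rec`/`.tmp` companions. It assumes every companion shares the audio file's name stem (`<stem>.cbin`, `<stem>.cbin.not.mat`, `<stem>.rec`, `<stem>.tmp`); `pair_song_files` is a hypothetical helper, and actually reading the audio and annotations is left to evfuncs, whose README documents its loaders.

```python
from pathlib import Path

# Assumed naming rule: all companions share the audio file's stem.
AUDIO_SUFFIX = ".cbin"
ANNOTATION_SUFFIX = ".cbin.not.mat"  # written by evsonganaly
OPTIONAL_SUFFIXES = (".rec", ".tmp")  # written by EvTAF; not strictly required


def pair_song_files(day_dir):
    """Return one record per .cbin file in day_dir.

    Each record holds the audio path, the annotation path (None if missing)
    and a list of whichever optional .rec/.tmp companions exist.
    """
    records = []
    # "*.cbin" matches only names ending in .cbin, so .cbin.not.mat is skipped.
    for cbin in sorted(Path(day_dir).glob("*" + AUDIO_SUFFIX)):
        stem = cbin.name[: -len(AUDIO_SUFFIX)]
        notmat = cbin.with_name(stem + ANNOTATION_SUFFIX)
        extras = [
            cbin.with_name(stem + suffix)
            for suffix in OPTIONAL_SUFFIXES
            if cbin.with_name(stem + suffix).exists()
        ]
        records.append({
            "cbin": cbin,
            "notmat": notmat if notmat.exists() else None,
            "extras": extras,
        })
    return records
```

For supervised training, one would typically keep only annotated recordings, e.g. `[r for r in pair_song_files(day_dir) if r["notmat"]]`.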
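We share this collection as a means of testing different machine learning algorithms for classifying the elements of birdsong, known as syllables.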
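For syllable classification, each annotated onset/offset pair usually becomes one labeled training example. The sketch below cuts those clips out of a song's samples. It assumes onset and offset times are in milliseconds and that the samples and sampling rate have already been loaded (for example with evfuncs); check the time units against the evfuncs README before relying on it. `syllable_segments` is a hypothetical helper, not part of evfuncs.

```python
def syllable_segments(audio, samp_freq, onsets_ms, offsets_ms, labels):
    """Cut one labeled clip per syllable out of a song's audio samples.

    Assumes onsets/offsets are in milliseconds and that `audio` supports
    slicing (a list or a NumPy array both work). `labels` may be a list or
    a string with one character per syllable. Returns (label, clip) pairs.
    """
    if not (len(onsets_ms) == len(offsets_ms) == len(labels)):
        raise ValueError("onsets, offsets and labels must have the same length")
    segments = []
    for on, off, label in zip(onsets_ms, offsets_ms, labels):
        # Convert milliseconds to sample indices.
        start = int(round(on * samp_freq / 1000))
        stop = int(round(off * samp_freq / 1000))
        if not 0 <= start < stop <= len(audio):
            raise ValueError(f"segment {label!r} [{on}, {off}] ms is outside the audio")
        segments.append((label, audio[start:stop]))
    return segments
```

Raising on an out-of-range segment, rather than silently clipping it, surfaces mismatches between an annotation file and its audio early.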
### Citation
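Please cite the DOI if you use this dataset. If you are developing machine learning algorithms, we ask that you cite our publications and software (listed below) and consider benchmarking against the algorithms we have developed. Our impression is that advancing the state of the art in this area will require a community of researchers working together.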
### Works that use this dataset
1. *Comparison of machine learning methods applied to birdsong element classification*, https://conference.scipy.org/proceedings/scipy2016/david_nicholson.html
2. *Latent space visualization, characterization, and generation of diverse vocal communication signals*, https://www.biorxiv.org/content/10.1101/870311v1.full.pdf
3. *TweetyNet: A neural network that enables high-throughput, automated annotation of birdsong*, https://www.biorxiv.org/content/biorxiv/early/2020/10/13/2020.08.28.272088.full.pdf. This paper makes use of the following libraries (GitHub repository | Zenodo record):
   - https://github.com/yardencsGitHub/tweetynet | https://zenodo.org/record/4662200
   - https://github.com/NickleDave/vak | https://zenodo.org/record/4718767
   - https://github.com/NickleDave/crowsetta | https://zenodo.org/record/4584198
   - https://github.com/NickleDave/hybrid-vocal-classifier | https://zenodo.org/record/4678768
4. *Fast and accurate annotation of acoustic signals with deep neural networks*, https://www.biorxiv.org/content/biorxiv/early/2021/03/29/2021.03.26.436927.full.pdf
Please feel free to contact David Nicholson (nicholdav at gmail dot com) with questions and feedback.
Provider:
figshare
Created:
2021-05-08



