
songbirdini/v33da_pp

Source: Hugging Face | Updated: 2026-04-17 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/songbirdini/v33da_pp
Resource description:
---
license: cc-by-4.0
pretty_name: V33DA++
tags:
- audio
- multimodal
- birds
- bioacoustics
- localization
- active-speaker-detection
- source-separation
task_categories:
- audio-classification
- audio-to-text
- image-segmentation
- object-detection
- video-classification
- other
language:
- en
---

# V33DA++

## Summary

V33DA++ is a **companion release** to the V33DA benchmark. It is built from the same BirdPark recordings and the same 10 birds, but includes data **outside** the strict single-vocalizer benchmark clips.

V33DA++ is intended for tasks that require **overlapping callers** or **longer temporal context**, such as source separation, vocal activity detection, audio-visual synchronization, or active-speaker detection.

V33DA++ contains two buckets:

- **overlap**: reviewer-verified events that were excluded from V33DA because another bird vocalized within the same window. Each sample includes the list of overlapping callers verified from on-body accelerometer channels.
- **padded**: +/-2s context windows around each V33DA event, preserving the original event onset/offset inside a longer window.

This release is **not** the benchmark used in the V33DA paper’s main results tables.

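As a rough illustration of how a padded window might be used, the sketch below cuts the original event back out of a longer context window. It assumes that `event_onset_sec` and `event_offset_sec` (listed under Data Fields) are measured in seconds from the start of the padded clip, and that the audio is already available as a plain sequence of samples at a known sample rate. This card confirms neither assumption. The helper name `crop_event` and the 16 kHz rate are hypothetical.

```python
def crop_event(samples, sample_rate, event_onset_sec, event_offset_sec):
    """Return the samples lying between event onset and offset.

    Assumption: onset and offset are given in seconds relative to the
    start of `samples` (the padded window), not to the source recording.
    """
    if event_offset_sec < event_onset_sec:
        raise ValueError("event offset precedes event onset")
    # Convert seconds to sample indices and clamp to the window bounds.
    start = max(0, round(event_onset_sec * sample_rate))
    stop = min(len(samples), round(event_offset_sec * sample_rate))
    return samples[start:stop]


# Toy data: a 4.5 s window at a hypothetical 16 kHz sample rate, standing in
# for a 0.5 s event with 2 s of context on each side.
sample_rate = 16_000
window = list(range(int(4.5 * sample_rate)))
event = crop_event(window, sample_rate, event_onset_sec=2.0, event_offset_sec=2.5)
print(len(event))
```

If the onset and offset turn out to be relative to the source recording instead, the same slicing applies after subtracting the window's start time.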
## Data Structure

Top-level folders:

- `overlap/`: V33DA++ overlap bucket
- `overlap_padded/`: overlap bucket with +/-2s context
- `padded/`: V33DA++ padded-context bucket

Each bucket contains:

- `v33da_pp_*.parquet`: data shards
- `audio/`: multi-channel cage microphone WAVs (if exported)
- `accelerometer/`: multichannel on-body vibration WAVs (if exported)
- `clips/`: aligned MP4 clips (if exported)
- `metadata.json`: build metadata and filters

## Configs

We publish three dataset configs:

- `overlap`: overlap-filtered calls with the verified list of overlapping callers
- `overlap_padded`: overlap bucket with +/-2s context
- `padded`: +/-2s context windows around each V33DA event

## Data Fields (Parquet)

All V33DA fields are preserved, plus:

- `overlap_callers`: list of overlapping caller colors (overlap bucket)
- `overlap_count`: number of overlapping callers (overlap bucket)
- `event_onset_sec`, `event_offset_sec`
- `event_onset_frame`, `event_offset_frame`
- `context_pad_sec`

## License

CC-BY-4.0. External tools used during preprocessing (e.g., Whisper/WhisperSeg, SAM2, ByteTrack, YOLOX-Pose) retain their original licenses.

## Citation

If you use V33DA or V33DA++, please cite the V33DA paper:

```
@inproceedings{basha2026v33da,
  title={Who Called? V33DA: A Multimodal Benchmark for Spatial Vocal Attribution in Social Zebra Finches},
  author={Basha, Maris and others},
  booktitle={NeurIPS},
  year={2026}
}
```
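For tasks such as vocal activity detection, the `overlap_callers` field could be turned into a per-bird target vector. The following is a minimal sketch under stated assumptions: the color labels in `BIRD_IDS` are hypothetical placeholders (the real release covers 10 birds whose labels are not listed in this card), `callers_to_multi_hot` is an illustrative helper and not part of the dataset tooling, and the row is hand-written toy data.

```python
def callers_to_multi_hot(overlap_callers, bird_ids):
    """Encode a list of overlapping callers as a 0/1 vector over `bird_ids`."""
    active = set(overlap_callers)
    unknown = active - set(bird_ids)
    if unknown:
        raise ValueError(f"unknown caller ids: {sorted(unknown)}")
    return [1 if bird in active else 0 for bird in bird_ids]


# Hypothetical color labels, for illustration only.
BIRD_IDS = ["red", "blue", "green", "yellow"]

# Toy row shaped like the overlap bucket's extra fields.
row = {"overlap_callers": ["blue", "yellow"], "overlap_count": 2}

target = callers_to_multi_hot(row["overlap_callers"], BIRD_IDS)
print(target)
```

Whether `overlap_callers` also includes the focal caller is not stated here, so it may be worth checking against `overlap_count` and the base V33DA fields before training on such targets.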

Provider:
songbirdini