songbirdini/v33da_pp
Hugging Face · Updated 2026-04-17 · Indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/songbirdini/v33da_pp
---
license: cc-by-4.0
pretty_name: V33DA++
tags:
- audio
- multimodal
- birds
- bioacoustics
- localization
- active-speaker-detection
- source-separation
task_categories:
- audio-classification
- audio-to-text
- image-segmentation
- object-detection
- video-classification
- other
language:
- en
---
# V33DA++
## Summary
V33DA++ is a **companion release** to the V33DA benchmark. It is built from the same BirdPark recordings and the same 10 birds, but includes data **outside** the strict single-vocalizer benchmark clips. V33DA++ is intended for tasks that require **overlapping callers** or **longer temporal context**, such as source separation, vocal activity detection, audio-visual synchronization, or active-speaker detection.
V33DA++ contains two buckets:
- **overlap**: reviewer-verified events that were excluded from V33DA because another bird vocalized within the same window. Each sample includes the list of overlapping callers verified from on-body accelerometer channels.
- **padded**: +/-2s context windows around each V33DA event, preserving the original event onset/offset inside a longer window.
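
As a rough illustration of the padded bucket, the sketch below computes a +/-2s context window around an event and where the original event falls inside that window. The helper name `padded_window` and the clamping at the start of the recording are assumptions made for illustration; this is not the dataset's build code.

```python
def padded_window(onset_sec, offset_sec, pad_sec=2.0):
    """Return (window_start, window_end, onset_in_window, offset_in_window).

    All values are in seconds. The first two are relative to the recording,
    the last two are relative to the start of the padded window.
    """
    if offset_sec < onset_sec:
        raise ValueError("offset must not precede onset")
    # Assumption: the window is clamped at the start of the recording (t = 0).
    window_start = max(0.0, onset_sec - pad_sec)
    window_end = offset_sec + pad_sec
    return (
        window_start,
        window_end,
        onset_sec - window_start,
        offset_sec - window_start,
    )


# Made-up event times, for illustration only.
start, end, onset_in, offset_in = padded_window(10.0, 10.5)
print(start, end, onset_in, offset_in)
```

With the default padding, an event that is not near the start of a recording begins 2 s into its padded window.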
This release is **not** the benchmark used in the V33DA paper’s main results tables.
## Data Structure
Top-level folders:
- `overlap/`: V33DA++ overlap bucket
- `overlap_padded/`: overlap bucket with +/-2s context
- `padded/`: V33DA++ padded-context bucket
Each bucket contains:
- `v33da_pp_*.parquet`: data shards
- `audio/`: multi-channel cage microphone WAVs (if exported)
- `accelerometer/`: multichannel on-body vibration WAVs (if exported)
- `clips/`: aligned MP4 clips (if exported)
- `metadata.json`: build metadata and filters
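
For a locally downloaded copy, the layout above can be inspected with a few lines of standard-library Python. This is a minimal sketch: the helper name `describe_bucket` is hypothetical, and it makes no assumption about which keys `metadata.json` contains.

```python
import json
from pathlib import Path


def describe_bucket(bucket_dir):
    """Summarize one bucket folder (e.g. overlap/, overlap_padded/, padded/)."""
    bucket = Path(bucket_dir)
    # Parquet shards follow the v33da_pp_*.parquet naming pattern.
    shards = sorted(p.name for p in bucket.glob("v33da_pp_*.parquet"))
    # metadata.json holds build metadata and filters; its keys are not assumed here.
    meta_path = bucket / "metadata.json"
    metadata = json.loads(meta_path.read_text()) if meta_path.is_file() else None
    # The media folders are only present if they were exported.
    exported = {
        name: (bucket / name).is_dir()
        for name in ("audio", "accelerometer", "clips")
    }
    return {"shards": shards, "metadata": metadata, "exported": exported}
```

Calling `describe_bucket("overlap")` from the dataset root returns the shard names, the parsed metadata (or `None`), and a flag per optional media folder.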
## Configs
We publish three dataset configs:
- `overlap`: overlap-filtered calls with the verified list of overlapping callers
- `overlap_padded`: overlap bucket with +/-2s context
- `padded`: +/-2s context windows around each V33DA event
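
A config can likely be loaded with the Hugging Face `datasets` library, as sketched below. The wrapper `load_bucket` is a hypothetical helper, and the sketch assumes the config names above are registered on the Hub under the repository ID `songbirdini/v33da_pp`.

```python
CONFIGS = ("overlap", "overlap_padded", "padded")


def load_bucket(config, repo_id="songbirdini/v33da_pp", **kwargs):
    """Load one V33DA++ config; extra kwargs are passed to load_dataset."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the name check above works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the configs are resolvable by name on the Hub.
    return load_dataset(repo_id, config, **kwargs)
```

For example, `load_bucket("overlap", streaming=True)` would stream the overlap bucket instead of downloading every shard first.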
## Data Fields (Parquet)
All V33DA fields are preserved, plus:
- `overlap_callers`: list of overlapping caller colors (overlap bucket)
- `overlap_count`: number of overlapping callers (overlap bucket)
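- `event_onset_sec`, `event_offset_sec`: onset and offset of the original event, in seconds
- `event_onset_frame`, `event_offset_frame`: onset and offset of the original event, in frames
- `context_pad_sec`: duration of the added context padding, in seconds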
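
The sketch below shows one way these fields might be used on a single row. It assumes that the event times are measured from the start of the clip and that the audio sample rate is known. The helper names, the sample rate of 48000 Hz, and the row values are made up for illustration.

```python
def event_sample_range(row, sample_rate):
    """Convert the event onset/offset (seconds) to audio sample indices.

    Assumption: times are relative to the start of the (padded) clip.
    """
    start = int(round(row["event_onset_sec"] * sample_rate))
    end = int(round(row["event_offset_sec"] * sample_rate))
    return start, end


def overlap_is_consistent(row):
    """Check that overlap_count matches the length of overlap_callers."""
    return row["overlap_count"] == len(row["overlap_callers"])


# Made-up row for illustration; real rows also carry all V33DA fields.
row = {
    "event_onset_sec": 2.0,
    "event_offset_sec": 2.25,
    "context_pad_sec": 2.0,
    "overlap_callers": ["red", "blue"],
    "overlap_count": 2,
}

start, end = event_sample_range(row, sample_rate=48000)  # hypothetical rate
print(start, end, overlap_is_consistent(row))
```

Slicing a waveform array as `audio[..., start:end]` would then recover the original event from its longer context window.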
## License
CC-BY-4.0. External tools used during preprocessing (e.g., Whisper/WhisperSeg, SAM2, ByteTrack, YOLOX-Pose) retain their original licenses.
## Citation
If you use V33DA or V33DA++, please cite the V33DA paper:
```
@inproceedings{basha2026v33da,
title={Who Called? V33DA: A Multimodal Benchmark for Spatial Vocal Attribution in Social Zebra Finches},
author={Basha, Maris and others},
booktitle={NeurIPS},
year={2026}
}
```
Provided by: songbirdini



