songbirdini/v33da_pp

Name: songbirdini/v33da_pp
Creator: songbirdini
Published: 2026-04-17 23:34:26
License: CC-BY-4.0

Source: Hugging Face. Updated: 2026-04-17. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/songbirdini/v33da_pp


Description:

---
license: cc-by-4.0
pretty_name: V33DA++
tags:
- audio
- multimodal
- birds
- bioacoustics
- localization
- active-speaker-detection
- source-separation
task_categories:
- audio-classification
- audio-to-text
- image-segmentation
- object-detection
- video-classification
- other
language:
- en
---

# V33DA++

## Summary

V33DA++ is a **companion release** to the V33DA benchmark. It is built from the same BirdPark recordings and the same 10 birds, but includes data **outside** the strict single-vocalizer benchmark clips.

V33DA++ is intended for tasks that require **overlapping callers** or **longer temporal context**, such as source separation, vocal activity detection, audio-visual synchronization, or active-speaker detection.

V33DA++ contains two buckets:

- **overlap**: reviewer-verified events that were excluded from V33DA because another bird vocalized within the same window. Each sample includes the list of overlapping callers verified from on-body accelerometer channels.
- **padded**: +/-2s context windows around each V33DA event, preserving the original event onset/offset inside a longer window.

This release is **not** the benchmark used in the V33DA paper’s main results tables.

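
The padded bucket described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the dataset's actual build code: the function name `padded_window`, the clamping to the recording bounds, and the example timestamps are all assumptions made for illustration. It shows how a +/-2s window keeps the original event onset/offset recoverable inside the longer window.

```python
from typing import NamedTuple, Optional


class PaddedWindow(NamedTuple):
    start_sec: float         # window start, in source-recording time
    end_sec: float           # window end, in source-recording time
    event_onset_sec: float   # event onset, relative to the window start
    event_offset_sec: float  # event offset, relative to the window start


def padded_window(onset_sec: float, offset_sec: float, pad_sec: float = 2.0,
                  recording_sec: Optional[float] = None) -> PaddedWindow:
    """Extend an event by pad_sec on both sides (hypothetical helper)."""
    if offset_sec < onset_sec:
        raise ValueError("offset_sec must not precede onset_sec")
    # Assumption: windows are clamped to the recording bounds.
    start = max(0.0, onset_sec - pad_sec)
    end = offset_sec + pad_sec
    if recording_sec is not None:
        end = min(end, recording_sec)
    # The event position is re-expressed relative to the new window start.
    return PaddedWindow(start, end, onset_sec - start, offset_sec - start)


# Made-up event lasting from 10.0 s to 10.4 s in the source recording.
window = padded_window(10.0, 10.4)
print(window)
```

For an event far from the recording edges, the event onset sits exactly `pad_sec` seconds into the window; near the start of a recording it sits earlier, which is why the onset is stored explicitly rather than assumed.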
## Data Structure

Top-level folders:

- `overlap/`: V33DA++ overlap bucket
- `overlap_padded/`: overlap bucket with +/-2s context
- `padded/`: V33DA++ padded-context bucket

Each bucket contains:

- `v33da_pp_*.parquet`: data shards
- `audio/`: multi-channel cage microphone WAVs (if exported)
- `accelerometer/`: multi-channel on-body vibration WAVs (if exported)
- `clips/`: aligned MP4 clips (if exported)
- `metadata.json`: build metadata and filters

## Configs

We publish three dataset configs:

- `overlap`: overlap-filtered calls with the verified list of overlapping callers
- `overlap_padded`: overlap bucket with +/-2s context
- `padded`: +/-2s context windows around each V33DA event

## Data Fields (Parquet)

All V33DA fields are preserved, plus:

- `overlap_callers`: list of overlapping caller colors (overlap bucket)
- `overlap_count`: number of overlapping callers (overlap bucket)
- `event_onset_sec`, `event_offset_sec`
- `event_onset_frame`, `event_offset_frame`
- `context_pad_sec`

## License

CC-BY-4.0. External tools used during preprocessing (e.g., Whisper/WhisperSeg, SAM2, ByteTrack, YOLOX-Pose) retain their original licenses.

## Citation

If you use V33DA or V33DA++, please cite the V33DA paper:

```
@inproceedings{basha2026v33da,
  title={Who Called? V33DA: A Multimodal Benchmark for Spatial Vocal Attribution in Social Zebra Finches},
  author={Basha, Maris and others},
  booktitle={NeurIPS},
  year={2026}
}
```
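
As a rough illustration of how the extra Parquet fields listed under Data Fields might be consumed, the sketch below works on rows represented as plain dictionaries, as a Parquet reader would typically yield them. Several things here are assumptions rather than documented behavior: that `event_onset_sec` and `event_offset_sec` are measured from the start of the clip, the 16 kHz sample rate, the caller colors in the demo rows, and the helper names `overlap_rows` and `event_sample_range`.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def overlap_rows(rows: Iterable[Row], min_callers: int = 1) -> List[Row]:
    """Keep rows with at least min_callers overlapping callers (hypothetical helper)."""
    selected = []
    for row in rows:
        callers = row.get("overlap_callers") or []
        count = row.get("overlap_count")
        if count is None:
            # Fall back to the list length when the count field is absent.
            count = len(callers)
        if count >= min_callers:
            selected.append(row)
    return selected


def event_sample_range(row: Row, sample_rate: int) -> Tuple[int, int]:
    """Convert the event onset/offset from seconds to a [start, stop) sample range.

    Assumption: both timestamps are relative to the start of the clip.
    """
    start = int(round(row["event_onset_sec"] * sample_rate))
    stop = int(round(row["event_offset_sec"] * sample_rate))
    return start, stop


# Made-up rows; the field names follow the card, the values are illustrative only.
demo_rows = [
    {"overlap_callers": ["red", "blue"], "overlap_count": 2,
     "event_onset_sec": 2.0, "event_offset_sec": 2.5, "context_pad_sec": 2.0},
    {"overlap_callers": ["green"], "overlap_count": 1,
     "event_onset_sec": 2.0, "event_offset_sec": 2.25, "context_pad_sec": 2.0},
]

busy = overlap_rows(demo_rows, min_callers=2)
print(len(busy), event_sample_range(busy[0], sample_rate=16000))
```

If the timestamps turn out to be expressed in source-recording time instead, the clip start would need to be subtracted before converting to samples; `metadata.json` in each bucket is the place to check.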


Provider:

songbirdini
