gorilla-watch/Gorilla-Zoo-Berlin

Name: gorilla-watch/Gorilla-Zoo-Berlin
Creator: gorilla-watch
Published: 2026-05-06 19:17:27
License: not specified in the listing metadata (the dataset card states CC BY 4.0)

Source: Hugging Face · Updated 2026-05-06 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/gorilla-watch/Gorilla-Zoo-Berlin

Resource overview:

# Gorilla-Berlin-Zoo Dataset

Part of the [GorillaWatch](https://gorilla-watch.github.io) project: a cross-domain evaluation benchmark for gorilla re-identification in a controlled zoo environment.

## Dataset Description

The dataset contains **154 videos** of 5 individual Western Lowland Gorillas (*Gorilla gorilla gorilla*) recorded across 3 cameras over 3 months at Berlin Zoo. It provides 188,692 annotated face bounding boxes across 275 tracklets, making it suitable for re-identification, face/body detection, and cross-domain generalization research.

Unlike in-the-wild datasets such as Gorilla-SPAC-Wild, this dataset features consistent camera placement, controlled lighting, and zoo-specific domain properties (glass, artificial structures, distinct backgrounds).

### Individuals

| Name | Notes |
|------|-------|
| Bibi | |
| Tilla | |
| Djambala | |
| Sango | |
| M'Penzi | |

### Cameras

`zoo1` · `zoo2` · `zoo3`

## Configurations

| Config | Description | Samples |
|--------|-------------|---------|
| `face_with_body` | Face crop paired with full body crop | 188,679 |
| `body` | Body crop only | 401,785 |
| `full_image_bbox_body` | Full video frame + body bounding box | 401,785 |
| `full_image_bbox_face_with_body` | Full video frame + face and body bounding boxes | 184,626 |

### Metadata Schema

All configs share these fields:

| Field | Type | Description |
|-------|------|-------------|
| `image` | Image | Main image (face crop, body crop, or full frame depending on config) |
| `class` | string | Gorilla identity |
| `date` | string | Recording date (YYYY-MM-DD) |
| `time` | string | Recording time (HH:MM:SS) |
| `video` | string | Source video filename |
| `frame_number` | int | Frame index within the video |
| `camera` | string | Camera identifier |

Config-specific fields:

| Config | Extra fields |
|--------|-------------|
| `face_with_body` | `body_image`: paired body crop |
| `full_image_bbox_body` | `bbox`: body bounding box [x, y, w, h] |
| `full_image_bbox_face_with_body` | `bbox`: face bounding box [x, y, w, h]; `body_bbox`: body bounding box [x, y, w, h] |

### Raw Videos

The original 154 MP4 recordings are bundled as `videos.tar.gz` alongside the parquet shards. Extract with:

```bash
tar -xzf videos.tar.gz
```

## Usage

```python
from datasets import load_dataset

# Face crop paired with body crop (default)
ds = load_dataset("gorilla-watch/Gorilla-Zoo-Berlin", "face_with_body")

# Full frame with face + body bounding boxes
ds = load_dataset("gorilla-watch/Gorilla-Zoo-Berlin", "full_image_bbox_face_with_body")
```

## Performance Benchmarks

Results from our paper on the test split:

| Method | Strategy | Top-1 Accuracy |
|--------|----------|----------------|
| Ensemble | Confidence Averaging | **84.75%** |
| Ensemble | Embedding Averaging | 80.61% |
| InternVideo2 | - | 65.09% |
| TimeStormer | ViT | 64.59% |
| AIM | ViT | 53.56% |

*Ensemble methods significantly outperform end-to-end video architectures on this benchmark.*

## Citation

```bibtex
@inproceedings{GorillaWatch2026,
  title = {GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population Monitoring},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  author = {Maximilian Schall and Felix Leonard Knöfel and Noah Elias König and Jan Jonas Kubeler and Maximilian von Klinski and Joan Wilhelm Linnemann and Xiaoshi Liu and Iven Jelle Schlegelmilch and Ole Woyciniuk and Alexandra Schild and Dante Wasmuht and Magdalena Bermejo Espinet and German Illera Basas and Gerard de Melo},
  year = {2026},
  archivePrefix = {arXiv},
  eprint = {2512.07776}
}
```

## License

[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

## Acknowledgments

We are grateful to Zoo Berlin for their expert assistance and facility access, enabling the development of tools to support gorilla conservation.
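The full-frame configs in the card give boxes as `[x, y, w, h]`, while most cropping APIs expect corner coordinates. The sketch below shows one way to convert between the two. It assumes that `(x, y)` is the top-left corner in pixels, which the card does not state explicitly, and the helper name `xywh_to_ltrb` is hypothetical.

```python
from typing import Sequence, Tuple


def xywh_to_ltrb(
    bbox: Sequence[float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert [x, y, w, h] to (left, top, right, bottom), clamped to the image.

    Assumption: (x, y) is the top-left corner in pixel coordinates.
    """
    x, y, w, h = bbox
    # Clamp each edge so the result never leaves the frame.
    left = max(0, min(int(round(x)), image_width))
    top = max(0, min(int(round(y)), image_height))
    right = max(left, min(int(round(x + w)), image_width))
    bottom = max(top, min(int(round(y + h)), image_height))
    return left, top, right, bottom


# Toy values for illustration, not taken from the dataset.
print(xywh_to_ltrb([100, 50, 200, 150], image_width=1920, image_height=1080))
# (100, 50, 300, 200)
```

If the `image` field decodes to a Pillow image, as the `datasets` Image feature normally does, the returned tuple can be passed directly to `image.crop(...)`, which takes a `(left, upper, right, lower)` box.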

Provided by:

gorilla-watch

Collected summary

Dataset introduction

Construction

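The Gorilla-Zoo-Berlin dataset was recorded in the controlled environment of Zoo Berlin for primate monitoring research. Using three fixed cameras, the team filmed five western lowland gorillas over three months and collected 154 videos. From these sequences, 188,692 face bounding boxes were extracted and annotated, forming 275 tracklets. Each tracklet is associated with an individual identity, a timestamp, and camera information, which makes the dataset a densely annotated benchmark for cross-domain re-identification.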

Usage

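The dataset mainly supports the evaluation and development of cross-domain gorilla re-identification systems. Choose the configuration that matches the task: `body` for body-only recognition, `face_with_body` when face and body information are combined, and the `full_image_bbox_*` configurations for work that needs localization or joint analysis in the original frame. Loading the corresponding parquet files yields samples with the image, individual identity, date and time, source video, frame number, camera and, in the full-frame configurations, bounding boxes. These samples can be used to test model generalization, explore multimodal re-identification, benchmark dense tracking, and analyze social behavior in captivity.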
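As a rough illustration of choosing a configuration, the sketch below records the config names and fields from the dataset card and validates a name before loading. The helper names `expected_fields` and `load_config` are hypothetical, and the empty extra-field list for `body` is an assumption based on the card listing no extras for it.

```python
from typing import Dict, List

# Fields shared by all configs, as listed in the dataset card.
SHARED_FIELDS: List[str] = [
    "image", "class", "date", "time", "video", "frame_number", "camera",
]

# Config-specific extra fields from the card ("body" assumed to have none).
CONFIG_EXTRA_FIELDS: Dict[str, List[str]] = {
    "face_with_body": ["body_image"],
    "body": [],
    "full_image_bbox_body": ["bbox"],
    "full_image_bbox_face_with_body": ["bbox", "body_bbox"],
}


def expected_fields(config: str) -> List[str]:
    """Return the fields a sample should carry for the given config."""
    if config not in CONFIG_EXTRA_FIELDS:
        valid = ", ".join(sorted(CONFIG_EXTRA_FIELDS))
        raise ValueError(f"Unknown config {config!r}; expected one of: {valid}")
    return SHARED_FIELDS + CONFIG_EXTRA_FIELDS[config]


def load_config(config: str):
    """Load one config from the Hub (needs `datasets` and network access)."""
    expected_fields(config)  # fail early on a misspelled config name
    # Imported lazily so the helpers above also work without `datasets`.
    from datasets import load_dataset

    return load_dataset("gorilla-watch/Gorilla-Zoo-Berlin", config)
```

Validating the name first avoids a slow failed download when a config is misspelled, for example `body_only` instead of `body`.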

Background and challenges

Background overview

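The Gorilla-Zoo-Berlin dataset was built in 2026 by Maximilian Schall and colleagues as a cross-domain evaluation benchmark for individual re-identification of western lowland gorillas. It was recorded in the controlled environment of Zoo Berlin and covers five gorillas over three months, with 188,692 annotated face bounding boxes captured from three camera viewpoints. Its central research question is how well models generalize from in-the-wild settings to captive ones. By providing paired face and body crops, it also supports research on multimodal re-identification and behavior analysis at the intersection of wildlife conservation and computer vision.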

Current challenges

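The dataset targets individual gorilla re-identification. The main challenge for models is the pronounced domain shift between wild and captive settings: artificial structures, glass reflections, and lighting changes can all disturb feature consistency. On the construction side, the difficulty lay in annotating moving subjects across large video sequences and keeping bounding boxes accurate and temporally continuous through dense social interactions, while still capturing natural behavior in a controlled environment so that the data remains ecologically valid.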

Common use cases

Classic use case

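At the intersection of computer vision and conservation science, the Gorilla-Zoo-Berlin dataset serves as a benchmark for primate re-identification. Its classic use is cross-domain evaluation: comparing model performance in the controlled zoo environment with performance on in-the-wild footage to test how well a re-identification method generalizes. The dataset contains annotated video from multiple viewpoints over a long recording period and supports detection and tracking of face and body regions of individual gorillas under fixed cameras, which gives a standardized testbed for recognition stability under lighting changes, viewpoint changes, and environmental constraints.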
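Cross-domain evaluation of this kind is commonly reported as top-1 accuracy. The following is a minimal sketch of one common way to compute it, nearest-neighbor matching of query embeddings against a labeled gallery by cosine similarity. It is an assumption for illustration, not the evaluation protocol of the GorillaWatch paper, and the embeddings are toy values.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top1_accuracy(
    query_emb: List[Sequence[float]],
    query_ids: List[str],
    gallery_emb: List[Sequence[float]],
    gallery_ids: List[str],
) -> float:
    """Fraction of queries whose most similar gallery item has the same identity."""
    if not query_emb or not gallery_emb:
        raise ValueError("query and gallery must be non-empty")
    correct = 0
    for emb, true_id in zip(query_emb, query_ids):
        # Index of the gallery embedding closest to this query.
        best = max(
            range(len(gallery_emb)),
            key=lambda i: cosine_similarity(emb, gallery_emb[i]),
        )
        if gallery_ids[best] == true_id:
            correct += 1
    return correct / len(query_emb)


# Toy 2-D embeddings; identities reuse names from the dataset card.
gallery = [[1.0, 0.0], [0.0, 1.0]]
gallery_names = ["Bibi", "Sango"]
queries = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
query_names = ["Bibi", "Sango", "Sango"]
accuracy = top1_accuracy(queries, query_names, gallery, gallery_names)
```

In this toy case two of the three queries match their identity, so `accuracy` is 2/3. In a cross-domain setup the gallery and query embeddings would come from a model trained on a different domain than the zoo footage.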

Academic problems addressed

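The dataset addresses several core problems in animal re-identification. Its densely annotated, multimodal data eases model training and validation when in-the-wild data is scarce, and it is aimed in particular at domain adaptation across environments. Its design lets researchers quantify how much model performance degrades when moving from natural habitats to captive settings, and explore multimodal fusion methods that use associated face and body features to improve accuracy. The annotations also provide a computable basis for ecological work such as behavior analysis, individual tracking, and population monitoring.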
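One simple form of face and body fusion is to average the class probabilities of two models and take the most likely identity. The sketch below illustrates that idea under stated assumptions: the function names and probability values are hypothetical, and this is not necessarily how the "Confidence Averaging" ensemble in the benchmark table is implemented.

```python
from typing import List, Sequence


def average_confidences(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class-probability vectors from several models element-wise."""
    if not per_model_probs:
        raise ValueError("need at least one probability vector")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all probability vectors must have the same length")
    n_models = len(per_model_probs)
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def predict_identity(
    per_model_probs: Sequence[Sequence[float]], class_names: Sequence[str]
) -> str:
    """Return the class name with the highest averaged probability."""
    avg = average_confidences(per_model_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return class_names[best]


# Identities from the dataset card; the probabilities are made-up toy values.
names = ["Bibi", "Tilla", "Djambala", "Sango", "M'Penzi"]
face_probs = [0.40, 0.35, 0.10, 0.10, 0.05]  # hypothetical face-model output
body_probs = [0.20, 0.60, 0.10, 0.05, 0.05]  # hypothetical body-model output
fused = predict_identity([face_probs, body_probs], names)
```

Here the face model alone would pick Bibi, but the averaged scores favor Tilla, which shows how a second modality can change the decision.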

Practical applications

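In practice, the Gorilla-Zoo-Berlin dataset serves wildlife conservation and automated zoo management. Re-identification systems developed on it can monitor individual gorillas continuously and non-invasively, helping keepers and conservation staff track activity patterns and assess social behavior and health. Such systems can reduce the cost and error of manual observation and supply data for population dynamics analysis, early warning of abnormal behavior, and conservation planning. The methodology can also be extended to monitoring projects for other endangered species.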

Recent research on the dataset