BrundageLab/synthetic_wildlife_health

Name: BrundageLab/synthetic_wildlife_health
Creator: BrundageLab
Published: 2026-03-23 13:10:13
License: CC BY 4.0

Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/BrundageLab/synthetic_wildlife_health


Description:

---
license: cc-by-4.0
task_categories:
- image-classification
- visual-question-answering
language:
- en
tags:
- wildlife
- camera-trap
- animal-health
- synthetic
- mange
- alopecia
- body-condition
- conservation
size_categories:
- 100<n<1K
---

# Synthetic Wildlife Health: Camera Trap Imagery for Alopecia and Body Condition Screening

## Dataset Summary

This dataset contains 553 synthetic camera trap images depicting alopecia (hair loss consistent with mange) and body condition deterioration in North American wildlife, along with paired visual question-answering annotations for health assessment tasks.

All images are AI-generated edits of real camera trap photographs sourced from [iWildCam 2022](https://github.com/visipedia/iwildcam_comp). The generative pipeline applies controlled phenotype edits to the animal region while preserving the original background, lighting conditions, and camera-trap artifacts. Every image in this dataset passed scene-drift quality control, verifying that the background was not altered by the generative model.

This dataset is released alongside the paper:

> **Generating Synthetic Wildlife Health Data from Camera Trap Imagery: A Pipeline for Alopecia and Body Condition Training Data**
> David Brundage PhD, University of Wisconsin–Madison, School of Veterinary Medicine

---

## Intended Use

**This dataset is intended as a screening data source, not a diagnostic tool.** Images are labeled as "hair loss consistent with mange" rather than clinically diagnosed mange. Visual assessment from camera trap imagery cannot distinguish mange from lice, seasonal shedding, mechanical hair loss, or idiopathic follicular inactivity. The intended model output from a classifier trained on this data is a **suspect flag for expert review**.
Appropriate uses include:

- Training wildlife health screening classifiers
- Sim-to-real transfer experiments for camera trap health monitoring
- Benchmarking synthetic data generation pipelines for ecological applications

---

## Dataset Structure

### Data Fields

| Field | Type | Description |
|---|---|---|
| `image` | Image | Camera trap photograph with synthetic phenotype edit applied |
| `question` | string | Health assessment question about the animal |
| `answer` | string | Structured health assessment answer |
| `species` | string | Latin species name |
| `variant` | string | Phenotype variant: `sham`, `mange_only`, `emaciated_only`, `severe_both` |
| `mange_label` | string | Alopecia severity: `M0` (healthy), `M2` (moderate), `M3` (severe) |
| `bcs_label` | string | Body condition score: `B0` (healthy), `B2` (moderate wasting), `B3` (severe emaciation) |
| `qc_method` | string | QC path: `day_pass`, `night_pass` |
| `qc_mae_raw` | float | Raw mean absolute error (background pixels) |
| `qc_mae_norm` | float | Color-normalized MAE (per-channel DC offset corrected) |
| `qc_ssim` | float | Structural similarity index (SSIM) of background region |

### Phenotype Variants

| Variant | Mange Label | BCS Label | Description |
|---|---|---|---|
| `sham` | M0 | B0 | Healthy negative control |
| `mange_only` | M2 | B0 | Moderate patchy hair loss with scaling and crusting |
| `emaciated_only` | M0 | B2 | Visible ribs, noticeable muscle wasting |
| `severe_both` | M3 | B3 | Severe extensive hair loss + pronounced skeletal landmarks |

### Species

| Species | Common Name |
|---|---|
| *Urocyon cinereoargenteus* | Gray fox |
| *Canis lupus* | Gray wolf |
| *Odocoileus virginianus* | White-tailed deer |
| *Procyon lotor* | Raccoon |

### Data Splits

| Split | Size |
|---|---|
| Train | ~497 images (90%) |
| Test | ~56 images (10%) |

---

## Generation Pipeline

### Base Image Curation

Base images were drawn from iWildCam 2022 using MegaDetector v4 for animal bounding box detection. Sampling prioritized center-frame animal placement (weight 0.6 center, 0.3 mid-frame, 0.1 edge/corner) and maintained stratified balance across species, season, and day/night conditions.

### Phenotype Editing

Synthetic variants were generated using Gemini 3.1 Flash Image. Each edit prompt supplied the model with MegaDetector bounding box coordinates, species identity, severity descriptors, and explicit instructions to edit only the animal region while preserving the background pixel-identical. Lesion placement followed species-specific biology: for canids, sarcoptic mange progression (periocular → ear margins → elbows → flanks); for hoofstock, distribution excludes dorsal midline regions inaccessible to self-grooming.

### Quality Control

Every image in this dataset passed scene-drift QC using a decoupled mask-then-score approach:

1. **Sham pre-filter:** The M0/B0 sham variant is generated first; base images where the sham fails QC are excluded entirely (~16% of bases filtered this way).
2. **Mask construction:** Pixel-difference thresholding + connected component labeling identifies changed regions, constrained to overlap the MegaDetector bounding box. Images where the change mask exceeds 70% of frame area are rejected as global re-renders.
3. **Scene scoring:** Computed on Gaussian-blurred background pixels (outside the animal mask):
   - *Daytime:* OR-gate passes if normalized MAE ≤ 7.0 **or** SSIM ≥ 0.85
   - *Nighttime/IR:* raw MAE ≤ 5.0

**Overall QC pass rate: 83%** (553 of 666 generated variants). Images failing QC are not included in this release.
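The scene-scoring gate described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline's actual code: the function names are hypothetical, the thresholds are assumed to apply to pixel values on a 0-255 scale, and "per-channel DC offset corrected" is interpreted here as subtracting each channel's mean background difference before taking the absolute error. The Gaussian blur and the SSIM computation are omitted; SSIM is passed in as a precomputed value.

```python
import numpy as np

# Thresholds quoted in the Quality Control section.
MAX_MASK_FRACTION = 0.70   # larger change masks are treated as global re-renders
DAY_MAE_NORM_MAX = 7.0     # daytime: normalized MAE limit
DAY_SSIM_MIN = 0.85        # daytime: SSIM limit (OR-gated with the MAE limit)
NIGHT_MAE_RAW_MAX = 5.0    # nighttime/IR: raw MAE limit


def background_mae(base, edited, animal_mask):
    """Return (raw MAE, offset-corrected MAE) over pixels outside the animal mask.

    base, edited: (H, W, 3) arrays; animal_mask: (H, W) boolean array.
    The offset correction is an assumed reading of the dataset card.
    """
    diff = edited.astype(np.float64) - base.astype(np.float64)
    bg = diff[~animal_mask]                      # shape (n_background_pixels, 3)
    mae_raw = float(np.abs(bg).mean())
    # Remove each channel's mean shift so a uniform color cast is not penalized.
    mae_norm = float(np.abs(bg - bg.mean(axis=0)).mean())
    return mae_raw, mae_norm


def passes_scene_qc(is_night, mae_raw, mae_norm, ssim, mask_fraction):
    """Apply the mask-area rejection, then the day or night scoring rule."""
    if mask_fraction > MAX_MASK_FRACTION:
        return False
    if is_night:
        return mae_raw <= NIGHT_MAE_RAW_MAX
    return mae_norm <= DAY_MAE_NORM_MAX or ssim >= DAY_SSIM_MIN


# Toy example: a uniform +10 shift on one channel outside a 2x2 animal region.
base = np.full((4, 4, 3), 100, dtype=np.uint8)
edited = base.copy()
edited[..., 0] += 10
animal_mask = np.zeros((4, 4), dtype=bool)
animal_mask[1:3, 1:3] = True
mae_raw, mae_norm = background_mae(base, edited, animal_mask)
```

In the toy example the uniform color shift produces a nonzero raw MAE but a zero offset-corrected MAE, which is one plausible reason the daytime path scores on the normalized value while also accepting a high SSIM.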
---

## Pipeline Statistics

| Metric | Value |
|---|---|
| Species represented | 4 |
| Base images processed | 201 |
| Total variants generated | 666 |
| QC-passing variants (this dataset) | 553 (83%) |
| Sham pre-filter rejection rate | 16% |
| Daytime pass rate | 85% |
| Nighttime pass rate | 81% |

---

## Sim-to-Real Transfer Results

A classifier trained **exclusively** on this synthetic dataset and evaluated on **real** camera trap images of suspected health conditions achieved:

| Model | AUROC | Balanced Accuracy |
|---|---|---|
| MLP (256 hidden units) + DINOv2 ViT-B/14 | **0.854** | 0.811 |
| Linear probe + DINOv2 ViT-B/14 | 0.734 | not reported |

The real test set (N=70) consisted of 45 healthy iWildCam images held out from the base image pool and 25 community-reported suspected health condition images, controlling for species identity between classes. Suspect labels are community-reported, not clinically confirmed.

---

## Limitations

- **Diagnostic ambiguity:** Camera trap imagery cannot distinguish mange from lice, seasonal shedding, or idiopathic hair loss.
- **Species-condition mismatch:** Confirmed mange in white-tailed deer is rare in some regions; the model may flag shedding or mechanical hair loss, requiring per-region calibration.
- **Posture confounds:** Edits are applied to healthy-posture base images and cannot capture illness-related postural changes.
- **Transfer evaluation scale:** The real evaluation set is small (N=70); the 0.85 AUROC reflects feasibility, not production-ready performance.

---

## Citation

If you use this dataset, please cite:

```bibtex
@inproceedings{brundage2026synthetic,
  title = {Generating Synthetic Wildlife Health Data from Camera Trap Imagery: A Pipeline for Alopecia and Body Condition Training Data},
  author = {Brundage, David},
}
```

---

## License

This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Base images are derived from iWildCam 2022; please also consult the [iWildCam dataset license](https://github.com/visipedia/iwildcam_comp) when using this data.

---

## Contact

David Brundage PhD
University of Wisconsin–Madison, School of Veterinary Medicine
brundage2@wisc.edu
