sulagnasaharasha/bci-temporal
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sulagnasaharasha/bci-temporal
Resource description:
---
license: cc-by-4.0
task_categories:
- image-classification
- image-to-image
language:
- en
tags:
- biodiversity
- remote-sensing
- tropical-forest
- tree-species
- aerial-imagery
- drone
- multi-temporal
- crown-view
- closeup
- BCI
- Panama
pretty_name: BCI Temporal Crown Dataset
size_categories:
- 10K<n<100K
configs:
- config_name: temporal
  data_files:
  - split: train
    path: temporal/train-*.parquet
  - split: val
    path: temporal/val-*.parquet
  - split: test
    path: temporal/test-*.parquet
- config_name: closeup
  data_files:
  - split: train
    path: closeup/train-*.parquet
  - split: val
    path: closeup/val-*.parquet
  - split: test
    path: closeup/test-*.parquet
---
# BCI Temporal Crown Dataset
A multi-temporal, multi-modal dataset of **tropical tree crowns** from Barro Colorado Island (BCI), Panama. Each tree crown is observed on **16 acquisition dates** spanning June 2024 – September 2025 and is paired with one close-up photograph taken by drone at lower altitude.
---
## Dataset Summary
| | |
|---|---|
| **Site** | Barro Colorado Island (BCI), Smithsonian Tropical Research Institute, Panama |
| **Tree crowns** | 1,897 labeled polygons across 84 species |
| **Raster dates** | 16 (monthly, June 2024 – September 2025) |
| **Total temporal rows** | ~30,350 (1,897 crowns × 16 dates) |
| **Crown area** | 7 – 1,212 m² (median ~160 m²) |
| **Image resolution** | 512 × 512 px, RGBA (alpha = crown mask) |
---
## Configurations
This dataset has two configurations that can be **joined on `polygon_id`** at load time.
### `temporal` — Crown-view tiles (one row per crown × date)
Each row is a masked aerial crown tile extracted from a monthly RGB orthomosaic raster.
| Column | Type | Description |
|---|---|---|
| `polygon_id` | int | Unique crown identifier (join key) |
| `date` | string | Acquisition date `YYYYMMDD` |
| `split` | string | `train` / `val` / `test` |
| `species_label` | string | Species name used as the classification label |
| `gbif_accepted_scientific_name` | string | GBIF-accepted full scientific name |
| `final_plant_name` | string | Field-verified plant name |
| `area` | float | Crown polygon area in m² |
| `crownview` | Image | 512×512 RGBA masked aerial tile |
**Size:** ~30,350 rows (train ~21,380 · val ~4,272 · test ~4,704; 16 dates per crown)
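The approximate row counts follow from the per-split crown counts given elsewhere in this card (1,336 / 267 / 294) multiplied by the 16 acquisition dates. A quick arithmetic check, assuming every crown is present on every date (the `~` in the figures above suggests this may not hold exactly):

```python
# Crown counts per split, copied from this card.
CROWNS_PER_SPLIT = {"train": 1336, "val": 267, "test": 294}
N_DATES = 16

# Expected temporal rows if every crown appears on every date (assumption).
expected_rows = {split: n * N_DATES for split, n in CROWNS_PER_SPLIT.items()}
total_rows = sum(expected_rows.values())

print(expected_rows)  # {'train': 21376, 'val': 4272, 'test': 4704}
print(total_rows)     # 30352
```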
### `closeup`: Drone close-up photos (one row per crown)
Each row is a zoom photograph taken from a drone at lower altitude, centered on the crown.
| Column | Type | Description |
|---|---|---|
| `polygon_id` | int | Unique crown identifier (join key) |
| `split` | string | `train` / `val` / `test` |
| `species_label` | string | Species name used as the classification label |
| `gbif_accepted_scientific_name` | string | GBIF-accepted full scientific name |
| `final_plant_name` | string | Field-verified plant name |
| `area` | float | Crown polygon area in m² |
| `closeup` | Image | 512×512 RGBA center-cropped/padded close-up photo |
**Size:** 1,897 rows (train 1,336 · val 267 · test 294)
---
## Data Splits
Splits are **stratified by species** using a 70 / 15 / 15 allocation. Species with ≤ 6 crowns use fixed small-sample allocations to ensure representation across splits where possible.
| Split | Crowns | Species |
|---|---|---|
| train | 1,336 | 84 |
| val | 267 | 65 |
| test | 294 | 81 |
| **Total** | **1,897** | **84** |
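Because `val` and `test` contain fewer species than `train`, it can be useful to check species coverage per split before training. The following is a minimal sketch that works on plain lists of labels; the helper names `species_per_split` and `missing_from` are illustrative and not part of the dataset or of any library.

```python
from typing import Dict, List, Sequence


def species_per_split(labels_by_split: Dict[str, Sequence[str]]) -> Dict[str, int]:
    """Count distinct species labels in each split."""
    return {split: len(set(labels)) for split, labels in labels_by_split.items()}


def missing_from(labels_by_split: Dict[str, Sequence[str]],
                 split: str, reference: str = "train") -> List[str]:
    """Species present in `reference` but absent from `split`."""
    return sorted(set(labels_by_split[reference]) - set(labels_by_split[split]))


# Toy labels standing in for the real `species_label` columns (illustrative only).
toy = {
    "train": ["Hura crepitans", "Jacaranda copaia", "Luehea seemannii"],
    "val": ["Hura crepitans"],
    "test": ["Hura crepitans", "Jacaranda copaia"],
}
print(species_per_split(toy))    # {'train': 3, 'val': 1, 'test': 2}
print(missing_from(toy, "val"))  # ['Jacaranda copaia', 'Luehea seemannii']
```

With the real data, `labels_by_split` could presumably be built from a loaded `closeup` `DatasetDict` as `{name: list(ds[name]["species_label"]) for name in ds}`.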
---
## Species Distribution (Top 10)
| Species | Total crowns |
|---|---|
| *Anacardium excelsum* | 257 |
| *Dipteryx oleifera* | 190 |
| *Luehea seemannii* | 109 |
| *Prioria copaifera* | 95 |
| *Jacaranda copaia* | 90 |
| *Hieronyma alchorneoides* | 83 |
| *Virola surinamensis* | 63 |
| *Hura crepitans* | 57 |
| *Tachigali panamensis* | 45 |
| *Quararibea stenophylla* | 44 |
The dataset is **long-tailed**: 84 species total, many with < 10 crowns.
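One common way to compensate for a long-tailed label distribution is inverse-frequency class weighting. The sketch below applies the usual "balanced" formula, weight = N / (K × count), to the ten counts in the table above. It is only an illustration: a real training setup would use counts for all 84 species from the `train` split, and the card does not state which weighting, if any, the authors used.

```python
from typing import Dict

# Crown counts for the ten most common species, copied from the table above.
TOP10_COUNTS = {
    "Anacardium excelsum": 257,
    "Dipteryx oleifera": 190,
    "Luehea seemannii": 109,
    "Prioria copaifera": 95,
    "Jacaranda copaia": 90,
    "Hieronyma alchorneoides": 83,
    "Virola surinamensis": 63,
    "Hura crepitans": 57,
    "Tachigali panamensis": 45,
    "Quararibea stenophylla": 44,
}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by N / (K * count); rarer classes get larger weights."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(TOP10_COUNTS)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

With this formula the sample-weighted mean of the weights is 1, so the overall loss scale stays comparable to unweighted training.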
---
## Temporal Coverage
16 monthly acquisition dates spanning June 2024 – September 2025. The `date` column in the `temporal` config uses `YYYYMMDD` format.
| # | Date (`YYYYMMDD`) | Calendar date | Season |
|---|---|---|---|
| 1 | `20240611` | 2024-06-11 | Wet |
| 2 | `20240716` | 2024-07-16 | Wet |
| 3 | `20240813` | 2024-08-13 | Wet |
| 4 | `20240918` | 2024-09-18 | Wet |
| 5 | `20241014` | 2024-10-14 | Wet |
| 6 | `20241112` | 2024-11-12 | Wet |
| 7 | `20241216` | 2024-12-16 | Wet |
| 8 | `20250124` | 2025-01-24 | Dry |
| 9 | `20250217` | 2025-02-17 | Dry |
| 10 | `20250317` | 2025-03-17 | Dry |
| 11 | `20250414` | 2025-04-14 | Dry |
| 12 | `20250512` | 2025-05-12 | Wet |
| 13 | `20250616` | 2025-06-16 | Wet |
| 14 | `20250715` | 2025-07-15 | Wet |
| 15 | `20250818` | 2025-08-18 | Wet |
| 16 | `20250915` | 2025-09-15 | Wet |
Dates span the **dry season** (January–April) and **wet season** (May–December) of the Panamanian tropics, capturing phenological variation across a full annual cycle.
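The season column can be derived directly from the `date` string. A minimal sketch, assuming the January–April / May–December boundary stated above (the function names are illustrative):

```python
from datetime import date, datetime

DRY_MONTHS = {1, 2, 3, 4}  # January-April, per the season definition in this card


def parse_acquisition_date(yyyymmdd: str) -> date:
    """Parse a `YYYYMMDD` string from the `date` column."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


def season(yyyymmdd: str) -> str:
    """Return 'Dry' for January-April acquisitions, otherwise 'Wet'."""
    return "Dry" if parse_acquisition_date(yyyymmdd).month in DRY_MONTHS else "Wet"


print(season("20250124"))  # Dry
print(season("20241216"))  # Wet
```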
---
## Image Details
### Crown-view tiles (`crownview`)
- **Source**: RGB COG rasters acquired over BCI (~4 cm/px GSD)
- **Processing**: Each labeled crown polygon is tilerized using [geodataset](https://github.com/hugobaudchon/geodataset). Pixels outside the crown polygon are **zeroed out** (alpha = 0 in RGBA). Images are center-cropped or zero-padded to 512 × 512.
- **Format**: PNG-encoded RGBA, stored as Hugging Face `Image` feature
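Since the alpha channel encodes the crown mask, the mask can be recovered from any tile without the original polygon. The sketch below reads the mask and estimates the visible crown area, assuming tiles are stored at the native ~4 cm/px resolution (an assumption; the card does not say whether tiles are resampled). The helper names are illustrative.

```python
import numpy as np
from PIL import Image

GSD_M = 0.04  # approximate ground sampling distance (~4 cm/px) from this card


def crown_mask(tile: Image.Image) -> np.ndarray:
    """Boolean mask of crown pixels, read from the alpha channel."""
    alpha = np.asarray(tile.convert("RGBA"))[..., 3]
    return alpha > 0


def visible_area_m2(tile: Image.Image, gsd_m: float = GSD_M) -> float:
    """Area covered by unmasked pixels; a lower bound if the crown was cropped."""
    return float(crown_mask(tile).sum()) * gsd_m ** 2


# Synthetic stand-in for a `crownview` tile: a 100 x 100 px opaque square.
rgba = np.zeros((512, 512, 4), dtype=np.uint8)
rgba[200:300, 200:300] = 255
tile = Image.fromarray(rgba)
print(visible_area_m2(tile))  # about 16 m^2 (10,000 px x 0.0016 m^2 per px)
```

Under the same assumption, a 512 × 512 tile covers at most about 20.5 m × 20.5 m (roughly 419 m²), so the largest crowns (up to 1,212 m²) would be truncated by the center crop and the `area` column remains the reliable source for crown size.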
### Close-up photos (`closeup`)
- **Source**: Drone zoom photos collected via the CanopyRS platform (`zoom_url` field)
- **Processing**: Downloaded from CanopyRS, center-cropped / zero-padded to 512 × 512 RGBA
- **Format**: PNG-encoded RGBA, stored as Hugging Face `Image` feature
- **Temporal note**: One close-up per crown (date-invariant) — join to `temporal` on `polygon_id`
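The center-crop / zero-pad step can be sketched with Pillow as follows. This is a generic illustration of the operation described above, not the authors' actual preprocessing code, and `center_crop_or_pad` is a hypothetical helper name.

```python
from PIL import Image


def center_crop_or_pad(img: Image.Image, size: int = 512) -> Image.Image:
    """Return a size x size RGBA image: center-crop if larger, zero-pad if smaller."""
    img = img.convert("RGBA")
    canvas = Image.new("RGBA", (size, size), (0, 0, 0, 0))  # fully transparent
    # Offset that centers the source on the canvas; negative when the source
    # is larger, in which case paste() clips it to the canvas bounds.
    left = (size - img.width) // 2
    top = (size - img.height) // 2
    canvas.paste(img, (left, top))
    return canvas


small = Image.new("RGB", (100, 50), (255, 0, 0))
large = Image.new("RGB", (1000, 800), (0, 255, 0))
print(center_crop_or_pad(small).size)  # (512, 512)
print(center_crop_or_pad(large).size)  # (512, 512)
```

Padding pixels get alpha = 0, which matches the convention that transparent pixels carry no image content.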
---
## Usage
### Load a single config
```python
from datasets import load_dataset
# Multi-temporal crown views
temporal = load_dataset("sulagnasaharasha/bci-temporal", "temporal")
print(temporal["train"][0])
# {'polygon_id': 12345, 'date': '20250915', 'species_label': 'Anacardium excelsum',
# 'crownview': <PIL.Image ...>, ...}
# Close-up photos
closeup = load_dataset("sulagnasaharasha/bci-temporal", "closeup")
print(closeup["train"][0])
# {'polygon_id': 12345, 'species_label': 'Anacardium excelsum',
# 'closeup': <PIL.Image ...>, ...}
```
### Join temporal + closeup for multi-modal training
```python
from datasets import load_dataset
import pandas as pd
temporal = load_dataset("sulagnasaharasha/bci-temporal", "temporal")
closeup = load_dataset("sulagnasaharasha/bci-temporal", "closeup")
# Convert to pandas and join
t = temporal["train"].to_pandas()
c = closeup["train"].to_pandas()[["polygon_id", "closeup"]]
paired = t.merge(c, on="polygon_id")
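# Each row now has both crownview (date-specific) and closeup (date-invariant).
# Note: to_pandas() keeps image columns as {"bytes", "path"} dicts rather than
# decoded PIL images, so decode them (e.g. PIL.Image.open on the bytes) before use.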
```
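Because the `temporal` config stores one row per crown and date, sequence models need the rows regrouped into per-crown time series. A minimal sketch on plain dict rows; `group_by_crown` is an illustrative helper, and the polygon IDs below are made up.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def group_by_crown(rows: Iterable[Mapping[str, Any]]) -> Dict[int, List[Mapping[str, Any]]]:
    """Group temporal rows by `polygon_id`, each group sorted by acquisition date."""
    series: Dict[int, List[Mapping[str, Any]]] = defaultdict(list)
    for row in rows:
        series[row["polygon_id"]].append(row)
    for pid in series:
        # YYYYMMDD strings sort chronologically under plain string comparison.
        series[pid].sort(key=lambda r: r["date"])
    return dict(series)


# Toy rows standing in for temporal records (illustrative only).
rows = [
    {"polygon_id": 1, "date": "20250124"},
    {"polygon_id": 2, "date": "20240611"},
    {"polygon_id": 1, "date": "20240611"},
]
grouped = group_by_crown(rows)
print([r["date"] for r in grouped[1]])  # ['20240611', '20250124']
```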
### PyTorch Dataset example
```python
import io

import torch
from PIL import Image
from torch.utils.data import Dataset
from datasets import load_dataset
from torchvision import transforms

REPO = "sulagnasaharasha/bci-temporal"


def _to_pil(cell) -> Image.Image:
    # Dataset.to_pandas() returns Image columns as {"bytes", "path"} dicts,
    # not decoded PIL images, so decode on access.
    if isinstance(cell, Image.Image):
        return cell
    return Image.open(io.BytesIO(cell["bytes"]))


class BCITemporalDataset(Dataset):
    def __init__(self, split: str = "train", transform=None, label_map=None):
        temporal = load_dataset(REPO, "temporal", split=split)
        closeup = load_dataset(REPO, "closeup", split=split)
        t_df = temporal.to_pandas()
        c_df = closeup.to_pandas()[["polygon_id", "closeup"]]
        # Inner join: every temporal row gets its crown's single close-up.
        self.df = t_df.merge(c_df, on="polygon_id").reset_index(drop=True)
        self.species = sorted(self.df["species_label"].unique())
        # val and test contain fewer species than train; pass the train
        # split's label_map to them so class indices stay consistent.
        self.label_map = label_map or {s: i for i, s in enumerate(self.species)}
        self.transform = transform or transforms.ToTensor()

    def __len__(self) -> int:
        return len(self.df)

    def __getitem__(self, idx: int) -> dict:
        row = self.df.iloc[idx]
        crown = self.transform(_to_pil(row["crownview"]).convert("RGB"))  # [3, H, W]
        closeup = self.transform(_to_pil(row["closeup"]).convert("RGB"))  # [3, H, W]
        label = self.label_map[row["species_label"]]
        return {"crownview": crown, "closeup": closeup,
                "label": torch.tensor(label), "date": row["date"],
                "polygon_id": int(row["polygon_id"])}
```
---
## Source Data
- **Site**: Barro Colorado Island (BCI)
- **Crown polygons**: Produced by [CanopyRS](https://github.com/hugobaudchon/CanopyRS) using automated segmentation + expert annotation
- **Aerial rasters**: Monthly RGB orthomosaics acquired over BCI (COG format), hosted by the CanopyRS platform
- **Taxonomy**: Species names resolved against [GBIF Backbone Taxonomy](https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c) and [WCVP](https://wcvp.science.kew.org/)
---
## License
[Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
---
## Citation
If you use this dataset, please cite:
```bibtex
@misc{sulagna_saha_2026,
author = { Sulagna Saha and Arthur Ouaknine and Etienne Laliberté and Carol Altimas and Evan M. Gora and Adriane Esquivel Muelbert and Ian R. McGregor and Cesar Gutierrez and Vanessa E. Rubio and David Rolnick },
title = { bci-temporal (Revision d222b07) },
year = 2026,
url = { https://huggingface.co/datasets/sulagnasaharasha/bci-temporal },
doi = { 10.57967/hf/8132 },
publisher = { Hugging Face }
}
```
Provided by:
sulagnasaharasha



