
sulagnasaharasha/bci-temporal

Source: Hugging Face (updated 2026-03-25, added 2026-03-29)
Download link:
https://hf-mirror.com/datasets/sulagnasaharasha/bci-temporal
---
license: cc-by-4.0
task_categories:
- image-classification
- image-to-image
language:
- en
tags:
- biodiversity
- remote-sensing
- tropical-forest
- tree-species
- aerial-imagery
- drone
- multi-temporal
- crown-view
- closeup
- BCI
- Panama
pretty_name: BCI Temporal Crown Dataset
size_categories:
- 10K<n<100K
configs:
- config_name: temporal
  data_files:
  - split: train
    path: temporal/train-*.parquet
  - split: val
    path: temporal/val-*.parquet
  - split: test
    path: temporal/test-*.parquet
- config_name: closeup
  data_files:
  - split: train
    path: closeup/train-*.parquet
  - split: val
    path: closeup/val-*.parquet
  - split: test
    path: closeup/test-*.parquet
---

# BCI Temporal Crown Dataset

A multi-temporal, multi-modal dataset of **tropical tree crowns** from Barro Colorado Island (BCI), Panama. Each tree crown is observed on **16 acquisition dates** spanning June 2024 to September 2025. Each crown is also paired with one close-up photograph taken by drone at lower altitude.

---

## Dataset Summary

| Property | Value |
|---|---|
| **Site** | Barro Colorado Island (BCI), Smithsonian Tropical Research Institute, Panama |
| **Tree crowns** | 1,897 labeled polygons across 84 species |
| **Raster dates** | 16 (monthly, June 2024 – September 2025) |
| **Total temporal rows** | ~30,350 (1,897 crowns × 16 dates) |
| **Crown area** | 7 – 1,212 m² (median ~160 m²) |
| **Image resolution** | 512 × 512 px, RGBA (in crown-view tiles, alpha = crown mask) |

---

## Configurations

This dataset has two configurations. They can be **joined on `polygon_id`** at load time.

### `temporal`: Crown-view tiles (one row per crown × date)

Each row is a masked aerial crown tile extracted from a monthly RGB orthomosaic raster.
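The `temporal` config stores one row per crown × date. It is often convenient to regroup those rows into a per-crown time series. The helper below is a minimal sketch and is not part of the dataset's tooling; the name `group_by_crown` is hypothetical. It takes the `polygon_id` and `date` columns as parallel sequences. It also relies on `YYYYMMDD` strings sorting in chronological order.

```python
from collections import defaultdict
from typing import Iterable


def group_by_crown(polygon_ids: Iterable[int], dates: Iterable[str]) -> dict[int, list[int]]:
    """Hypothetical helper: map each polygon_id to its row indices, ordered by date."""
    pairs: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for row_idx, (pid, date) in enumerate(zip(polygon_ids, dates)):
        pairs[pid].append((date, row_idx))
    # YYYYMMDD strings compare lexicographically in chronological order,
    # so sorting the (date, row_idx) tuples puts each crown's rows in time order.
    return {pid: [row_idx for _, row_idx in sorted(rows)] for pid, rows in pairs.items()}
```

Suppose the `temporal` train split is loaded as `train`. Then `group_by_crown(train["polygon_id"], train["date"])` reads only those two columns, so no images are decoded. Each crown should map to up to 16 row indices.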
| Column | Type | Description |
|---|---|---|
| `polygon_id` | int | Unique crown identifier (join key) |
| `date` | string | Acquisition date, `YYYYMMDD` |
| `split` | string | `train` / `val` / `test` |
| `species_label` | string | Species name used as the classification label |
| `gbif_accepted_scientific_name` | string | GBIF-accepted full scientific name |
| `final_plant_name` | string | Field-verified plant name |
| `area` | float | Crown polygon area (m²) |
| `crownview` | Image | 512 × 512 RGBA masked aerial tile |

**Size:** ~30,350 rows (train ~21,380 · val ~4,272 · test ~4,704; 16 dates per crown)

### `closeup`: Close-up drone photos (one row per crown)

Each row is a zoom photograph centered on the crown, taken from a drone at lower altitude.

| Column | Type | Description |
|---|---|---|
| `polygon_id` | int | Unique crown identifier (join key) |
| `split` | string | `train` / `val` / `test` |
| `species_label` | string | Species name used as the classification label |
| `gbif_accepted_scientific_name` | string | GBIF-accepted full scientific name |
| `final_plant_name` | string | Field-verified plant name |
| `area` | float | Crown polygon area (m²) |
| `closeup` | Image | 512 × 512 RGBA close-up photo, center-cropped or zero-padded |

**Size:** 1,897 rows (train 1,336 · val 267 · test 294)

---

## Data Splits

Splits are **stratified by species** using a 70 / 15 / 15 (train / val / test) allocation. Species with ≤ 6 crowns use fixed small-sample allocations so that they are represented across splits where possible.
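A per-species 70 / 15 / 15 stratification can be sketched as follows. This is an illustrative reimplementation, not the authors' splitting code. The card does not specify the fixed small-sample allocations for species with ≤ 6 crowns, so this sketch uses plain integer rounding instead, which leaves such species entirely in train. The function name `stratified_split` and its `val_pct`, `test_pct`, and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(
    crowns: list[tuple[int, str]],
    val_pct: int = 15,
    test_pct: int = 15,
    seed: int = 0,
) -> dict[int, str]:
    """Hypothetical sketch: assign each (polygon_id, species_label) pair to a split.

    Per species, val and test receive floor(n * pct / 100) crowns each and the
    rest go to train. With val_pct + test_pct < 100, every species keeps at
    least one training crown.
    """
    by_species: dict[str, list[int]] = defaultdict(list)
    for polygon_id, species in crowns:
        by_species[species].append(polygon_id)

    rng = random.Random(seed)
    assignment: dict[int, str] = {}
    for species in sorted(by_species):  # fixed iteration order keeps results reproducible
        ids = sorted(by_species[species])
        rng.shuffle(ids)
        n = len(ids)
        n_val = n * val_pct // 100  # integer arithmetic avoids float rounding surprises
        n_test = n * test_pct // 100
        for i, polygon_id in enumerate(ids):
            if i < n_val:
                assignment[polygon_id] = "val"
            elif i < n_val + n_test:
                assignment[polygon_id] = "test"
            else:
                assignment[polygon_id] = "train"
    return assignment
```

Running this on the released crowns would not reproduce the official `split` column. For results comparable to other work, use that column directly.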
| Split | Crowns | Species |
|---|---|---|
| train | 1,336 | 84 |
| val | 267 | 65 |
| test | 294 | 81 |
| **Total** | **1,897** | **84** |

---

## Species Distribution (Top 10)

| Species | Total crowns |
|---|---|
| *Anacardium excelsum* | 257 |
| *Dipteryx oleifera* | 190 |
| *Luehea seemannii* | 109 |
| *Prioria copaifera* | 95 |
| *Jacaranda copaia* | 90 |
| *Hieronyma alchorneoides* | 83 |
| *Virola surinamensis* | 63 |
| *Hura crepitans* | 57 |
| *Tachigali panamensis* | 45 |
| *Quararibea stenophylla* | 44 |

The dataset is **long-tailed**: 84 species in total, many of which have fewer than 10 crowns.

---

## Temporal Coverage

There are 16 monthly acquisition dates spanning June 2024 – September 2025. The `date` column in the `temporal` config uses `YYYYMMDD` format.

| # | Date (`YYYYMMDD`) | Calendar date | Season |
|---|---|---|---|
| 1 | `20240611` | 2024-06-11 | Wet |
| 2 | `20240716` | 2024-07-16 | Wet |
| 3 | `20240813` | 2024-08-13 | Wet |
| 4 | `20240918` | 2024-09-18 | Wet |
| 5 | `20241014` | 2024-10-14 | Wet |
| 6 | `20241112` | 2024-11-12 | Wet |
| 7 | `20241216` | 2024-12-16 | Wet |
| 8 | `20250124` | 2025-01-24 | Dry |
| 9 | `20250217` | 2025-02-17 | Dry |
| 10 | `20250317` | 2025-03-17 | Dry |
| 11 | `20250414` | 2025-04-14 | Dry |
| 12 | `20250512` | 2025-05-12 | Wet |
| 13 | `20250616` | 2025-06-16 | Wet |
| 14 | `20250715` | 2025-07-15 | Wet |
| 15 | `20250818` | 2025-08-18 | Wet |
| 16 | `20250915` | 2025-09-15 | Wet |

The dates cover both the **dry season** (January–April) and the **wet season** (May–December) of the Panamanian tropics, capturing phenological variation across a full annual cycle.

---

## Image Details

### Crown-view tiles (`crownview`)

- **Source**: RGB COG rasters acquired over BCI (~4 cm/px GSD)
- **Processing**: Each labeled crown polygon is tilerized using [geodataset](https://github.com/hugobaudchon/geodataset). Pixels outside the crown polygon are **zeroed out** (alpha = 0 in RGBA). Images are center-cropped or zero-padded to 512 × 512.
- **Format**: PNG-encoded RGBA, stored as a Hugging Face `Image` feature

### Close-up photos (`closeup`)

- **Source**: Drone zoom photos collected via the CanopyRS platform (`zoom_url` field)
- **Processing**: Downloaded from CanopyRS, then center-cropped or zero-padded to 512 × 512 RGBA
- **Format**: PNG-encoded RGBA, stored as a Hugging Face `Image` feature
- **Temporal note**: One close-up per crown (date-invariant); join to `temporal` on `polygon_id`

---

## Usage

### Load a single config

```python
from datasets import load_dataset

# Multi-temporal crown views
temporal = load_dataset("sulagnasaharasha/bci-temporal", "temporal")
print(temporal["train"][0])
# {'polygon_id': 12345, 'date': '20250915', 'species_label': 'Anacardium excelsum',
#  'crownview': <PIL.Image ...>, ...}

# Close-up photos
closeup = load_dataset("sulagnasaharasha/bci-temporal", "closeup")
print(closeup["train"][0])
# {'polygon_id': 12345, 'species_label': 'Anacardium excelsum',
#  'closeup': <PIL.Image ...>, ...}
```

### Join temporal + closeup for multi-modal training

```python
from datasets import load_dataset

temporal = load_dataset("sulagnasaharasha/bci-temporal", "temporal")
closeup = load_dataset("sulagnasaharasha/bci-temporal", "closeup")

# Convert to pandas and join on the crown identifier
t = temporal["train"].to_pandas()
c = closeup["train"].to_pandas()[["polygon_id", "closeup"]]
paired = t.merge(c, on="polygon_id")
# Each row now has both crownview (date-specific) and closeup (date-invariant).
# After .to_pandas(), image columns hold {"bytes": ..., "path": ...} dicts, not PIL images.
```

### PyTorch Dataset example

```python
import io

import torch
from datasets import load_dataset
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

REPO = "sulagnasaharasha/bci-temporal"


def _to_rgb(cell) -> Image.Image:
    # .to_pandas() leaves Image columns as {"bytes", "path"} dicts; decode the PNG bytes.
    # Converting to RGB drops the alpha channel (pixels outside the crown are already zeroed).
    if isinstance(cell, Image.Image):
        return cell.convert("RGB")
    return Image.open(io.BytesIO(cell["bytes"])).convert("RGB")


class BCITemporalDataset(Dataset):
    def __init__(self, split: str = "train", transform=None):
        temporal = load_dataset(REPO, "temporal", split=split)
        closeup = load_dataset(REPO, "closeup", split=split)

        t_df = temporal.to_pandas()
        c_df = closeup.to_pandas()[["polygon_id", "closeup"]]
        self.df = t_df.merge(c_df, on="polygon_id").reset_index(drop=True)

        # Build the label map from the train split so class indices match across
        # splits (val and test contain only 65 and 81 of the 84 species).
        train_species = load_dataset(REPO, "closeup", split="train").unique("species_label")
        self.species = sorted(train_species)
        self.label_map = {s: i for i, s in enumerate(self.species)}
        self.transform = transform or transforms.ToTensor()

    def __len__(self) -> int:
        return len(self.df)

    def __getitem__(self, idx: int) -> dict:
        row = self.df.iloc[idx]
        crown = self.transform(_to_rgb(row["crownview"]))  # [3, H, W]
        closeup = self.transform(_to_rgb(row["closeup"]))  # [3, H, W]
        label = self.label_map[row["species_label"]]
        return {
            "crownview": crown,
            "closeup": closeup,
            "label": torch.tensor(label),
            "date": row["date"],
            "polygon_id": int(row["polygon_id"]),
        }
```

---

## Source Data

- **Site**: Barro Colorado Island (BCI)
- **Crown polygons**: Produced by [CanopyRS](https://github.com/hugobaudchon/CanopyRS) using automated segmentation plus expert annotation
- **Aerial rasters**: Monthly RGB orthomosaics acquired over BCI (COG format), hosted by the CanopyRS platform
- **Taxonomy**: Species names resolved against the [GBIF Backbone Taxonomy](https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c) and [WCVP](https://wcvp.science.kew.org/)

---

## License

[Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)

---

## Citation

If you use this dataset, please cite:

```bibtex
@misc{sulagna_saha_2026,
  author    = { Sulagna Saha and Arthur Ouaknine and Etienne Laliberté and Carol Altimas and Evan M. Gora and Adriane Esquivel Muelbert and Ian R. McGregor and Cesar Gutierrez and Vanessa E. Rubio and David Rolnick },
  title     = { bci-temporal (Revision d222b07) },
  year      = 2026,
  url       = { https://huggingface.co/datasets/sulagnasaharasha/bci-temporal },
  doi       = { 10.57967/hf/8132 },
  publisher = { Hugging Face }
}
```
Provider: sulagnasaharasha