NMFS-OSI/noaa-pacific-benthic-cover-t1-all
Source: Hugging Face | Updated: 2026-03-11 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/NMFS-OSI/noaa-pacific-benthic-cover-t1-all
Resource description:
---
pretty_name: NOAA Pacific Benthic Cover Image Classification Dataset T1 (All Annotations, Unsplit)
task_categories:
- image-classification
size_categories:
- 100K<n<1M
language:
- en
tags:
- coral
- coral-reef
- marine-ecology
- underwater-imagery
- coralnet
- ncrmp
- noaa
- pacific
dataset_info:
  features:
  - name: image_url
    dtype: string
  - name: gcs_uri
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': CCA
          '1': CORAL
          '2': I
          '3': MA
          '4': MF
          '5': SC
          '6': SED
          '7': TURF
          '8': UC
          '9': TW
  - name: label_name
    dtype: string
  - name: label_desc
    dtype: string
  - name: file_name
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_examples: 790352
---
# Dataset Card for NOAA Pacific Benthic Cover Image Classification Dataset T1 (All Annotations, Unsplit)
## Dataset Details
### Dataset Description
This dataset contains benthic reef image crops for Tier-1 coral reef classification, built from CoralNet-derived annotations associated with NOAA Pacific Islands Fisheries Science Center (PIFSC) Ecosystem Sciences Division (ESD) monitoring programs.
Each image is a 224×224 crop carrying exactly one label. This **all-annotations** version is **unsplit** and includes every T1 record found under the source path.
This is intentionally different from the curated split dataset:
- **Curated split dataset (training-ready)**: [https://huggingface.co/datasets/NMFS-OSI/noaa-pacific-benthic-cover-t1](https://huggingface.co/datasets/NMFS-OSI/noaa-pacific-benthic-cover-t1)
- **This dataset (all records, unsplit):** recommended for exploration, custom splitting, and QA workflows.
- **License:** Public Domain ([NOAA Open Data](https://data.noaa.gov/))
- **Shared by:** NOAA Open Data
- **Curated by:** Ecosystem Sciences Division (ESD) | Pacific Islands Fisheries Science Center (PIFSC)
- **Image Size:** 224×224 pixels (patch-based)
### Dataset Sources
- **Cloud Repository:** Public Google Cloud Storage source bucket path: `gs://nmfs_odp_pifsc/PIFSC/ESD/ARP/pifsc-ai-data-repository/coralnet_mirror/pacific_t1`
## Dataset Structure
This dataset is provided as a single unsplit partition:
- **train (all):** 790,352
Total: **790,352** image crops.
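A minimal sketch of opening this single partition as a stream, assuming the repository loads through the Hugging Face `datasets` library (the card lists Parquet/HF metadata, but loader behaviour is not confirmed here). `open_stream` and `peek` are illustrative helper names, not part of any API.

```python
from itertools import islice

REPO_ID = "NMFS-OSI/noaa-pacific-benthic-cover-t1-all"
EXPECTED_ROWS = 790_352  # total stated in this card


def open_stream(repo_id: str = REPO_ID):
    """Return a streaming view of the single `train` split.

    The import is local so this module still loads when `datasets`
    is not installed (pip install datasets).
    """
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)


def peek(rows, n: int = 3):
    """Collect the first n records from any iterable of row dicts."""
    return list(islice(rows, n))
```

Calling `peek(open_stream())` would fetch the first three records over the network. Streaming avoids materialising all 790,352 manifest rows before inspection.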
### Features (Parquet/HF metadata)
- `image_url`: HTTPS object URL in public GCS
- `gcs_uri`: canonical `gs://` URI
- `label`: integer class index
- `label_name`: class short code
- `label_desc`: class description
- `file_name`: image file basename
- `split`: fixed to `all`
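The relationship between `gcs_uri` and `image_url` can be sketched as below. This assumes the common public-object URL form `https://storage.googleapis.com/<bucket>/<object>`; the dataset's own `image_url` values may be formatted differently, so prefer the stored column when it is available. The object name in the usage note is hypothetical.

```python
from urllib.parse import quote

GCS_HTTPS_HOST = "https://storage.googleapis.com"


def gcs_uri_to_https(gcs_uri: str) -> str:
    """Convert a gs://bucket/object URI into an HTTPS object URL.

    Assumption: the object is publicly readable, as this card states
    for the source bucket.
    """
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gcs_uri!r}")
    bucket, _, obj = gcs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"missing bucket or object name: {gcs_uri!r}")
    # Percent-encode the object name but keep path separators readable.
    return f"{GCS_HTTPS_HOST}/{bucket}/{quote(obj, safe='/')}"
```

For instance, `gcs_uri_to_https("gs://nmfs_odp_pifsc/PIFSC/ESD/example.jpg")` yields a URL under the `nmfs_odp_pifsc` bucket.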
### Classes
- `0` = CCA (Coralline Alga)
- `1` = CORAL (Coral)
- `2` = I (Sessile Invertebrate)
- `3` = MA (Macroalga)
- `4` = MF (Mobile Fauna)
- `5` = SC (Soft Coral)
- `6` = SED (Sediment)
- `7` = TURF (Turf Alga)
- `8` = UC (Unclassified / Unknown)
- `9` = TW (Tape / Wand)
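The class list above can be carried in code as a small lookup table. The sketch below also filters out the two classes this card calls artifact classes (UC and TW); the helper names are illustrative, and whether to drop those classes is a modelling choice rather than something the card prescribes.

```python
# Index order follows the class list in this card.
CLASS_NAMES = ["CCA", "CORAL", "I", "MA", "MF", "SC", "SED", "TURF", "UC", "TW"]

CLASS_DESC = {
    "CCA": "Coralline Alga",
    "CORAL": "Coral",
    "I": "Sessile Invertebrate",
    "MA": "Macroalga",
    "MF": "Mobile Fauna",
    "SC": "Soft Coral",
    "SED": "Sediment",
    "TURF": "Turf Alga",
    "UC": "Unclassified / Unknown",
    "TW": "Tape / Wand",
}

ARTIFACT_CLASSES = {"UC", "TW"}


def int2str(index: int) -> str:
    """Map an integer `label` to its short code."""
    return CLASS_NAMES[index]


def str2int(name: str) -> int:
    """Map a short code back to its integer `label`."""
    return CLASS_NAMES.index(name)


def drop_artifacts(rows):
    """Yield only rows whose integer `label` is not UC or TW."""
    for row in rows:
        if int2str(row["label"]) not in ARTIFACT_CLASSES:
            yield row
```

`drop_artifacts` accepts any iterable of row dicts, so it can wrap either an in-memory list or a streamed dataset.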
### Class Distribution (All Annotations)
| Class | Images |
|---|---:|
| CCA | 95,241 |
| CORAL | 153,139 |
| I | 5,181 |
| MA | 120,174 |
| MF | 1,133 |
| SC | 5,085 |
| SED | 45,871 |
| TURF | 337,415 |
| TW | 1,077 |
| UC | 26,036 |
| **Total** | **790,352** |
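To make the imbalance in the table concrete, the sketch below computes per-class shares, the largest-to-smallest class ratio, and inverse-frequency weights. The weighting rule `total / (n_classes * count)` is one common "balanced" heuristic chosen here for illustration; the card does not specify a weighting scheme.

```python
# Counts copied from the class distribution table in this card.
CLASS_COUNTS = {
    "CCA": 95_241,
    "CORAL": 153_139,
    "I": 5_181,
    "MA": 120_174,
    "MF": 1_133,
    "SC": 5_085,
    "SED": 45_871,
    "TURF": 337_415,
    "TW": 1_077,
    "UC": 26_036,
}


def class_shares(counts):
    """Fraction of all crops that fall in each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Size of the largest class divided by the smallest."""
    return max(counts.values()) / min(counts.values())


def balanced_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count)."""
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


if __name__ == "__main__":
    for name, share in sorted(class_shares(CLASS_COUNTS).items(), key=lambda kv: -kv[1]):
        print(f"{name:6s} {share:6.2%}")
```

Rare classes such as TW and MF receive the largest weights under this rule, and TURF the smallest.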
## Uses
### Direct Use
Suitable uses:
- Building custom train/validation/test splits
- Dataset QA and annotation auditing
- Training/evaluating broad benthic cover classifiers (Tier-1)
- Transfer learning for coral-reef benthic imagery
- Benchmarking class-imbalance handling methods
- Building marine habitat monitoring model baselines
### Out-of-Scope Use
- Species-level ecological inference (dataset is Tier-1 broad classes)
- Regulatory or policy decisions without domain expert review
- Habitat trend inference without survey design/statistical context
- Identification tasks or any other application unrelated to benthic habitats
## Dataset Creation
### Curation Rationale
Goal: provide a reusable, Pacific-wide, **complete** Tier-1 annotation inventory aligned with NCRMP/PIFSC benthic image workflows, so that downstream users can apply their own filtering and splitting policies.
### Source Data (See Below)
Source records and context are drawn from NOAA InPort metadata for:
- Stratified Random Survey (StRS) benthic images and annotated benthic cover products
- Climate-station/fixed-site benthic images and annotated benthic cover products
#### Data Collection and Processing
High-level workflow:
1. Photoquadrat surveys collected by divers under NCRMP methods.
2. Benthic images annotated using CPCe (historically) and CoralNet (more recent periods).
3. Tier mappings applied to convert detailed labels to Tier-1 classes.
4. Crops generated and assembled into unsplit storage paths.
5. Metadata manifests generated for HF streaming and analysis.
#### Who are the source data producers?
Primary producers are NOAA PIFSC ESD teams and collaborators conducting NCRMP field missions and annotation operations in U.S. Pacific Islands regions.
### Annotations
#### Annotation process
Annotation points were produced from benthic image analysis pipelines described in NOAA metadata. Human analysts assign benthic category labels at points, then labels are mapped into Tier levels. This dataset uses Tier-1 group labels.
#### Who are the annotators?
Trained NOAA/PIFSC-associated analysts and partner contributors using program SOPs and calibration/quality-control procedures described in NCRMP metadata.
## Bias, Risks, and Limitations
- **Geographic Bias:** Data is primarily from the Pacific Region; performance may differ in other regions.
- **Environmental Bias:** Imagery was collected under specific lighting and seasonal conditions.
- Temporal variability and differences in camera conditions across years/missions
- Label uncertainty from difficult imagery and functional-group ambiguity
- Not intended for species-level precision
- Significant class imbalance (including artifact classes UC/TW)
- This dataset is unsplit; users must prevent leakage when creating train/val/test sets
### Recommendations
- Use class-balanced sampling or loss weighting
- Create site-aware splits where possible
- Keep train/val/test separation strict to reduce leakage risk
- Validate externally on independent sites/years before deployment
- Consult expert guidance for ecological interpretation
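One way to keep splits strict is to assign whole groups (for example, sites) to a partition rather than individual crops. The sketch below hashes a group key deterministically. It assumes the caller can supply a `group_of` function; this card does not document a site column or the structure of `file_name`, so the key used in the usage note is purely hypothetical.

```python
import hashlib


def assign_split(group: str, val_frac: float = 0.1, test_frac: float = 0.1,
                 seed: str = "v1") -> str:
    """Map a group key to train/validation/test, stable across runs."""
    digest = hashlib.sha256(f"{seed}:{group}".encode("utf-8")).digest()
    # First 8 bytes as a roughly uniform number in [0, 1].
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "validation"
    return "train"


def split_rows(rows, group_of, val_frac: float = 0.1, test_frac: float = 0.1,
               seed: str = "v1"):
    """Bucket rows so that every row of a group lands in the same split."""
    buckets = {"train": [], "validation": [], "test": []}
    for row in rows:
        name = assign_split(group_of(row), val_frac, test_frac, seed)
        buckets[name].append(row)
    return buckets
```

A call such as `split_rows(rows, lambda r: r["file_name"].split("_")[0])` would group by a file-name prefix, if such a prefix happened to encode the site. Because assignment depends only on the key and seed, adding new records later does not move existing groups between splits. The fractions apply to groups, not rows, so realised split sizes can deviate when group sizes are uneven.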
### Personal and Sensitive Information
- No personal, sensitive, or identifiable information is present.
- Images are of **marine environments** only.
---
## Glossary
- **NCRMP:** National Coral Reef Monitoring Program
- **PIFSC ESD:** Pacific Islands Fisheries Science Center, Ecosystem Sciences Division
- **StRS:** Stratified Random Survey
- **Tier-1:** Broad benthic functional classes
## Metadata / Citation
**Citation:**
Pacific Islands Fisheries Science Center (2025). Ecosystem Sciences Division (ESD);
**Related Metadata:**
- [Benthic cover from StRS annotations](https://www.fisheries.noaa.gov/inport/item/77771)
- [Benthic cover from climate-station annotations](https://www.fisheries.noaa.gov/inport/item/78600)
- [Benthic images from StRS Sites](https://www.fisheries.noaa.gov/inport/item/71814)
- [Benthic images from Fixed climate stations](https://www.fisheries.noaa.gov/inport/item/78600)
## Dataset Card Contact
For questions or inquiries, contact:
**Michael Akridge** – Michael.Akridge@noaa.gov
---
### Related Dataset
- Curated split T1 dataset (8-class, train/validation/test):
https://huggingface.co/datasets/NMFS-OSI/noaa-pacific-benthic-cover-t1
---
#### Disclaimer
This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA project content is provided on an ‘as is’ basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.
Provider:
NMFS-OSI



