HakaiInstitute/mussel-gooseneck-seg-rgb-640
Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/HakaiInstitute/mussel-gooseneck-seg-rgb-640
Description:
---
license: cc-by-4.0
task_categories:
- image-segmentation
language:
- en
tags:
- image
- geospatial
- biology
- aerial imagery
- remote sensing
pretty_name: MusselGooseneckSeg 640
size_categories:
- 1K<n<10K
---
# MusselGooseneckSeg: Semantic Segmentation for Rocky Intertidal Mussel and Gooseneck Barnacle Habitat
## Dataset description
MusselGooseneckSeg is a dataset for semantic segmentation of mussel and gooseneck barnacle habitat in high-resolution drone imagery. It provides pixel-wise annotations for mussels and gooseneck barnacles in rocky intertidal zones.
- **Source:** Imagery collected by the Hakai Institute
## Task description
The dataset is designed for semantic segmentation of mussel and gooseneck barnacle habitat in aerial imagery. The task involves assigning each pixel in the image to one of three classes: "mussel", "gooseneck barnacle", or "background".
## Usage
### Download and iterate
Install the Hugging Face `datasets` library ([instructions](https://huggingface.co/docs/datasets/en/installation)), then load each split and iterate over its samples:
```python
from datasets import load_dataset

# Load each split from the Hugging Face Hub (downloaded and cached on first use)
train_dataset = load_dataset("HakaiInstitute/mussel-gooseneck-seg-rgb-640", split="train")
val_dataset = load_dataset("HakaiInstitute/mussel-gooseneck-seg-rgb-640", split="validation")

for sample in train_dataset:
    x = sample["image.tif"]
    y = sample["label.tif"]
    # x and y are `PIL.Image` instances, ready to feed into a training loop, PyTorch dataloader, etc.
    # ...
```
### Streaming from HuggingFace
This data is released as a WebDataset, so it can be used without downloading it in advance.
For instructions, see the Hugging Face [WebDataset](https://huggingface.co/docs/hub/en/datasets-webdataset) documentation.
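As a minimal sketch, streaming can be enabled by passing `streaming=True` to `load_dataset`, which returns an iterable dataset that fetches data lazily. The helper names `take_samples` and `open_stream` are hypothetical and not part of this dataset or the `datasets` library, and the sample keys are assumed to match those in the download example above.

```python
from itertools import islice


def take_samples(dataset, n):
    """Return the first n samples from any iterable dataset (hypothetical helper)."""
    return list(islice(dataset, n))


def open_stream(split="train"):
    """Open a split in streaming mode (hypothetical helper; requires network access)."""
    # Imported lazily so take_samples can be used without the datasets library.
    from datasets import load_dataset

    return load_dataset(
        "HakaiInstitute/mussel-gooseneck-seg-rgb-640",
        split=split,
        streaming=True,  # samples are fetched lazily instead of downloading the full split
    )
```

Under these assumptions, `take_samples(open_stream("train"), 4)` would fetch only the first four training samples, each expected to expose the `"image.tif"` and `"label.tif"` keys.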
## Data characteristics
- **Image Format:** TIFF
- **Tile Size:** 640x640 pixels
- **Train Tile Overlap:** 50% (adjacent chips overlap by 320 pixels in both dimensions)
- **Validation Tile Overlap:** None
- **Number of Tiles:** 6,967 image and label pairs
## Annotation details
- **Method:** Manual heads-up digitizing with manual verification
- **Format:** Pixel-wise labels stored as separate mask images
- **Labelling Convention:** Each pixel assigned a single class label
## Class distribution
<!-- TODO: Fill in class distribution percentages after computing pixel label statistics -->
| Class ID | Class Name | Description | Percentage |
| :------- | :------------------- | :---------------------- | :--------: |
| 0 | Background | Unclassified areas | TODO |
| 1 | Mussels | Mussel bed | TODO |
| 2 | Gooseneck Barnacles | Gooseneck barnacle bed | TODO |
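
The percentages in the table are not yet available. One possible way to compute them from the label masks is sketched below. It assumes each label is a single-band image whose pixel values are the class IDs 0, 1, and 2 listed in the table; the function name `class_percentages` is hypothetical and this is not necessarily how the authors will compute the statistics.

```python
import numpy as np


def class_percentages(masks, num_classes=3):
    """Return {class_id: percent of pixels} over an iterable of label masks.

    Assumes pixel values are integer class IDs in [0, num_classes).
    Values outside that range are ignored.
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        # np.asarray accepts PIL images as well as NumPy arrays
        values = np.asarray(mask).astype(np.int64).ravel()
        counts += np.bincount(values, minlength=num_classes)[:num_classes]
    total = counts.sum()
    if total == 0:
        return {class_id: 0.0 for class_id in range(num_classes)}
    return {class_id: float(100.0 * counts[class_id] / total) for class_id in range(num_classes)}
```

For example, passing `(sample["label.tif"] for sample in train_dataset)` would give per-class percentages for the training split. Because training tiles overlap by 50%, pixels in overlapping regions would be counted more than once.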
## Split information
| Split | Data Percentage | Tiles Count |
| :--------- | --------------: | ----------: |
| Train | 97% | 6,743 |
| Validation | 3% | 224 |
## Preprocessing
1. Tiles extracted from source imagery at 640x640 px
2. Training tiles extracted with 50% overlap between adjacent chips
3. Validation tiles extracted with no overlap between chips
4. Pixel-wise annotations applied for mussels and gooseneck barnacles
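
The tiling in steps 1 to 3 can be illustrated with a short sketch that computes tile origins for a given overlap. This is not the authors' extraction code: the function name `tile_origins` is hypothetical, and dropping partial tiles at the image edges is an assumption, since the card does not say how edges were handled.

```python
def tile_origins(width, height, tile=640, overlap=0.0):
    """Return (x, y) top-left corners of full tiles in a width x height image.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # 640 px tiles with 50% overlap give a 320 px stride
    stride = int(tile * (1.0 - overlap))
    if stride < 1:
        raise ValueError("overlap is too large for this tile size")
    xs = range(0, width - tile + 1, stride)
    ys = range(0, height - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]
```

With `overlap=0.5` the stride is 320 px, matching the training tiles; with `overlap=0.0` the stride equals the tile size, matching the validation tiles.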
## Licensing information
This dataset is released under the Creative Commons Attribution 4.0 License (CC BY 4.0).
## Ethical considerations
- No identifiable individuals are present in imagery
- Minimized impact on wildlife and sensitive habitats
- Engaged with local First Nations in planning aerial surveys
## Citation information
If you use this dataset in your research, please cite:
```
@misc{denouden2026musselgoosenecseg640,
  author = {Denouden, Taylor and McInnes, William and Guyn, Alex},
  title = {MusselGooseneckSeg 640: Semantic Segmentation for Rocky Intertidal Mussel and Gooseneck Barnacle Habitat},
  month = April,
  year = 2026,
  doi = { 10.57967/hf/8504 },
  publisher = {Hakai Institute {\tt data@hakai.org}},
  howpublished = {\url{https://huggingface.co/datasets/HakaiInstitute/mussel-gooseneck-seg-rgb-640}}
}
```
## Known limitations
- Imagery only covers areas with known mussel and gooseneck barnacle habitat
- No examples near urban or built-up environments
- Labelling errors may be present in areas with shadows, where it is difficult to distinguish organisms
- Overlapping training tiles increase the effective training set size but may introduce spatial autocorrelation between nearby chips
Provider:
HakaiInstitute



