Voxel51/RIS-LAD
收藏Hugging Face2025-12-09 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/Voxel51/RIS-LAD
下载链接
链接失效反馈官方服务:
资源简介:
---
annotations_creators: []
language: en
size_categories:
- 1K<n<10K
task_categories:
- object-detection
task_ids: []
pretty_name: ris-lad
tags:
- fiftyone
- image
- object-detection
dataset_summary: '
This is a [FiftyOne](https://github.com/voxel51/fiftyone) dataset with 2103 samples.
## Installation
If you haven''t already, install FiftyOne:
```bash
pip install -U fiftyone
```
## Usage
```python
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
# Load the dataset
# Note: other available arguments include ''max_samples'', etc
dataset = load_from_hub("harpreetsahota/RIS-LAD")
# Launch the App
session = fo.launch_app(dataset)
```
'
---
# Dataset Card for RIS-LAD

This is a [FiftyOne](https://github.com/voxel51/fiftyone) dataset with 2103 samples.
## Installation
If you haven't already, install FiftyOne:
```bash
pip install -U fiftyone
```
## Usage
```python
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = load_from_hub("Voxel51/RIS-LAD")
# Launch the App
session = fo.launch_app(dataset)
```
## Dataset Details
### Dataset Description
**RIS-LAD** (Referring Low-Altitude Drone Image Segmentation) is the first fine-grained Referring Image Segmentation benchmark specifically designed for low-altitude drone (LAD) scenarios.
The dataset contains **13,871** meticulously annotated image-text-mask triplets collected from real-world drone footage captured at altitudes of approximately 30-100 meters with oblique viewing angles (30°-60°).
Unlike existing remote sensing RIS datasets that focus on high-altitude satellite or fixed-angle imagery, RIS-LAD addresses unique challenges of low-altitude drone perception including:
- Strong perspective changes and foreshortening from oblique views
- Tiny and densely packed objects
- Variable illumination conditions including nighttime scenes
- Category drift (tiny targets causing confusion with larger, semantically similar objects)
- Object drift (difficulty distinguishing among crowded same-class instances)
The dataset was constructed using a semi-automatic pipeline combining SAM-2 for high-quality instance masks and multimodal LLM-generated referring expressions, followed by human refinement and verification.
- **Curated by:** Kai Ye, Yingshi Luan, Zhudi Chen, Guangyue Meng, Pingyang Dai, Liujuan Cao (Xiamen University)
- **Language(s) (NLP):** English
- **License:** CC BY-NC-SA 4.0 (Creative Commons Attribution-NonCommercial-ShareAlike 4.0)
### Dataset Sources
- **Repository:** https://github.com/AHideoKuzeA/RIS-LAD-A-Benchmark-and-Model-for-Referring-Low-Altitude-Drone-Image-Segmentation
- **Paper:** [RIS-LAD: A Benchmark and Model for Referring Low-Altitude Drone Image Segmentation](https://arxiv.org/abs/2507.20920)
- **Dataset Download:** [Google Drive](https://drive.google.com/file/d/1PmtaQH_F0AUoGWgpmDSpPu27E2XSdGd4/view?usp=sharing)
## Uses
### Direct Use
This dataset is intended for:
- **Referring Image Segmentation (RIS)**: Training and evaluating models that segment objects based on natural language descriptions
- **Vision-Language Research**: Multi-modal learning combining computer vision and natural language processing
- **Low-Altitude Drone Perception**: Developing perception systems for drone applications operating at 30-100m altitude
- **Visual Grounding**: Research on grounding natural language expressions to visual regions
- **Benchmark Evaluation**: Comparing RIS methods specifically under challenging low-altitude drone conditions with tiny, dense objects and variable illumination
### Out-of-Scope Use
- **Commercial Applications**: The dataset is licensed under CC BY-NC-SA 4.0, restricting commercial use
- **High-Altitude Remote Sensing**: The dataset is specifically designed for low-altitude (30-100m) oblique views and may not generalize well to satellite or high-altitude imagery
- **Ground-Level Scene Understanding**: The oblique drone perspective differs substantially from conventional ground-view datasets
- **Privacy-Sensitive Applications**: Users should be aware that drone imagery may contain identifiable individuals or private property
## Dataset Structure
### FiftyOne Format
When converted to FiftyOne using the provided conversion script, each sample contains:
- **`filepath`**: Path to the image file
- **`tags`**: Dataset split as a tag (`train`, `val`, or `test`)
- **`prompts`**: List of all referring expression strings for that image
- **`ground_truth`**: FiftyOne Detections object containing:
- `label`: Object category name
- `bounding_box`: Normalized bounding box coordinates [x, y, width, height] in range [0, 1]
- `mask`: Binary segmentation mask (cropped to bounding box region)
- `ref_id`: Unique reference ID
- `ann_id`: Annotation ID linking to the original data
- `referring_expression`: The natural language description for this specific object
### Object Categories
The dataset includes 8 object categories commonly found in low-altitude drone imagery:
| Category | Count | Description |
|----------|-------|-------------|
| car | 4,365 | Most common category |
| people | 2,910 | Pedestrians and individuals |
| motor | 2,803 | Motorcycles and motorized two-wheelers |
| truck | 1,648 | Trucks and large vehicles |
| bus | 732 | Buses |
| bicycle | 640 | Bicycles |
| tricycle | 528 | Tricycles |
| boat | 245 | Boats and watercraft |
## Dataset Creation
### Curation Rationale
Existing referring image segmentation (RIS) datasets focus primarily on conventional ground-view scenes or high-altitude remote sensing imagery. These settings differ substantially from low-altitude drone (LAD) views where:
- Perspectives are oblique (30°-60° angles) rather than top-down or horizontal
- Objects are tiny and densely packed
- Illumination varies widely, including nighttime scenes
- Altitude is much lower (30-100m) compared to satellite imagery (>1000m)
RIS-LAD was created to bridge this gap and enable research on referring image segmentation specifically for low-altitude drone applications, which are increasingly deployed in real-world perception systems due to their flexibility and cost-effectiveness.
### Source Data
#### Data Collection and Processing
**Image Collection:**
- Source: Real-world drone footage captured at altitudes of 30-100 meters
- Viewing angles: Oblique perspectives at 30°-60° angles
- Resolution: 1080×1080 pixels
- Conditions: Various illumination including daytime and nighttime scenes
- Total images: 2,104 unique images
**Annotation Pipeline (Semi-Automatic):**
1. **Instance Segmentation**: High-quality instance masks generated using SAM-2 (Segment Anything Model 2) with prompting
2. **Referring Expression Generation**: Initial expressions generated by multimodal LLMs given:
- Cropped instance images
- Location cues
- Category information
3. **Human Refinement**: Manual verification and refinement of both masks and expressions
4. **Quality Control**: Careful verification of all 13,871 image-text-mask triplets
#### Who are the source data producers?
The source data was collected from real-world drone operations. The specific locations and operators are not disclosed in the publicly available information. The dataset was curated and annotated by researchers at Xiamen University.
### Annotations
#### Annotation process
The dataset uses a semi-automatic annotation pipeline:
1. **Segmentation Masks**: Generated using SAM-2 with human-in-the-loop prompting and verification
2. **Referring Expressions**:
- Initially generated by multimodal LLMs
- Provided with cropped object images and spatial location information
- Manually refined by human annotators
- Verified for accuracy and naturalness
The annotations include:
- Binary segmentation masks (RLE format)
- Bounding boxes
- Natural language referring expressions
- Object category labels
- Tokenized text
#### Who are the annotators?
The annotation team consisted of researchers from Xiamen University who performed the human refinement and verification steps of the semi-automatic pipeline. Specific demographic information about annotators is not provided.
#### Personal and Sensitive Information
The dataset contains drone imagery captured from low altitudes (30-100m) which may include:
- **Identifiable individuals**: People visible in public spaces
- **Vehicles**: Cars, motorcycles, trucks, buses with potentially visible license plates
- **Location information**: Urban and outdoor scenes
**Privacy Considerations:**
- Images are from real-world drone footage
- No explicit anonymization process is described
- Users should be aware of potential privacy implications
- The non-commercial license (CC BY-NC-SA 4.0) provides some restrictions on use
### Dataset-Specific Challenges
The paper identifies two key failure modes that are prevalent in this dataset:
1. **Category Drift**: Tiny targets can cause models to incorrectly segment larger, semantically similar objects
2. **Object Drift**: Dense crowds of same-class instances make it difficult to distinguish which specific instance is being referred to
### Potential Biases
- **Domain Bias**: Focused on urban/outdoor surveillance scenarios typical of drone operations
- **Category Distribution**: Heavily skewed toward vehicles (cars: 31%, motor: 20%, truck: 12%) vs. other categories
- **Illumination Bias**: While nighttime scenes are included, the distribution between day/night is not specified
- **Expression Style**: Referring expressions generated by LLMs may have stylistic patterns that differ from purely human-generated descriptions
## Citation
**BibTeX:**
```bibtex
@misc{ye2025risladbenchmarkmodelreferring,
title = {RIS-LAD: A Benchmark and Model for Referring Low-Altitude Drone Image Segmentation},
author = {Kai Ye and YingShi Luan and Zhudi Chen and Guangyue Meng and Pingyang Dai and Liujuan Cao},
year = {2025},
eprint = {2507.20920},
archivePrefix= {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2507.20920}
}
```
**APA:**
Ye, K., Luan, Y., Chen, Z., Meng, G., Dai, P., & Cao, L. (2025). RIS-LAD: A Benchmark and Model for Referring Low-Altitude Drone Image Segmentation. *arXiv preprint arXiv:2507.20920*.
提供机构:
Voxel51



