keshavagarwal2004dev/GeoZero_Train_Datasets
Source: Hugging Face. Updated 2026-03-05; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/keshavagarwal2004dev/GeoZero_Train_Datasets
Resource description:
---
license: cc-by-nc-sa-4.0
task_categories:
- visual-question-answering
- image-to-text
- reinforcement-learning
- feature-extraction
language:
- en
tags:
- remote-sensing
- earth-observation
- satellite-imagery
- geospatial
- geospatial-reasoning
- multimodal
- visual-question-answering
- vision-language-model
- foundation-model
size_categories:
- 100K<n<1M
---
# SFT and RL Training Datasets of GeoZero
## Dataset Composition
GeoZero consists of three variants:
| File | Description |
|------|------------|
| **GeoZero-Raw.json** | Raw aggregated data across heterogeneous datasets |
| **GeoZero-Instruct.json** | Unified instruction-tuned dataset for supervised fine-tuning |
| **GeoZero-Hard.json** | Challenging subset for RL training |
All image files are stored under the `images/` directory.
## Directory Structure
```
GeoZero_Train_Datasets/
├── images/
│   ├── AID-0000.tar
│   ├── AID-0001.tar
│   ├── RSVQA-HR-0000.tar
│   ├── ...
│
├── GeoZero-Raw.json
├── GeoZero-Instruct.json
├── GeoZero-Hard.json
└── Readme.md
```
If tar shards are used, each tar file preserves relative paths:
```
RSVQA-HR/8766.png
```
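To make the relative paths in the JSON files resolvable, the shards can be unpacked into a single image root. The following is a minimal sketch, assuming the shards sit in one directory and that each archive stores members such as `RSVQA-HR/8766.png`; the function name `extract_shards` and the output directory are hypothetical and not part of the dataset.

```python
import tarfile
from pathlib import Path


def extract_shards(shard_dir, out_dir):
    """Unpack every *.tar shard in shard_dir into out_dir, keeping relative paths.

    Returns the number of regular files extracted.
    """
    out_root = Path(out_dir).resolve()
    out_root.mkdir(parents=True, exist_ok=True)
    count = 0
    for shard in sorted(Path(shard_dir).glob("*.tar")):
        with tarfile.open(shard, "r") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                target = (out_root / member.name).resolve()
                # Skip members that would land outside out_dir.
                if out_root not in target.parents:
                    continue
                target.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                count += 1
    return count
```

Calling `extract_shards("images", "images_extracted")` would then place `RSVQA-HR/8766.png` at `images_extracted/RSVQA-HR/8766.png`, matching the entries in the `images` field of each sample.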
## JSON Format Examples
Each JSON file contains a list of samples. `GeoZero-Raw.json` and `GeoZero-Instruct.json` use the following structure:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "<image>\n[vqa] Is there a residential building on the right of the university?"
    },
    {
      "role": "assistant",
      "content": "no"
    }
  ],
  "images": [
    "RSVQA-HR/8766.png"
  ]
}
```
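A sample in this format can be flattened into a plain question/answer record. The sketch below assumes single-turn samples (one user and one assistant message) and that the user prompt starts with an `<image>` placeholder followed by a bracketed task tag such as `[vqa]`; tag names other than `vqa` and the helper name `flatten_sample` are assumptions for illustration.

```python
import re

# Matches a leading task tag such as "[vqa]".
TAG_RE = re.compile(r"^\[(\w[\w-]*)\]\s*")


def flatten_sample(sample):
    """Split one Raw/Instruct sample into task tag, question, answer and image paths."""
    # For single-turn samples each role appears once, so a dict lookup is enough.
    by_role = {m["role"]: m["content"] for m in sample["messages"]}
    # Drop the "<image>" placeholder, then peel off the task tag if present.
    text = by_role["user"].replace("<image>", "").strip()
    match = TAG_RE.match(text)
    tag = match.group(1) if match else None
    question = text[match.end():] if match else text
    return {
        "task": tag,
        "question": question,
        "answer": by_role.get("assistant"),
        "images": list(sample.get("images", [])),
    }
```

Multi-turn samples, if any exist in the files, would need a loop over `messages` instead of the role lookup used here.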
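`GeoZero-Hard.json` has no assistant turn; each sample instead carries a system prompt, a reference `solution`, and a `task_type` list: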
```json
{
  "messages": [
    {
      "role": "system",
      "content": "system prompt"
    },
    {
      "role": "user",
      "content": "<image>\n[vqa] What is the area covered by residential buildings? Give a response of yes or no."
    }
  ],
  "images": ["RSVQA-HR/118.png"],
  "solution": "<answer> 1934m2 </answer>\n",
  "task_type": ["vqa"]
}
```
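Because the `solution` field wraps the reference in `<answer>` tags, a consumer of `GeoZero-Hard.json` typically needs to strip them before comparing against a model output. The following is a minimal sketch; the exact-match check is a hypothetical illustration and is not the reward function used by the GeoZero authors, and it assumes the model output uses the same `<answer>` tags.

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(text):
    """Return the stripped text inside the first <answer>...</answer> pair, or None."""
    match = ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


def exact_match(prediction, sample):
    """Hypothetical check: 1.0 if the predicted answer equals the reference, else 0.0."""
    ref = extract_answer(sample["solution"])
    pred = extract_answer(prediction)
    if ref is None or pred is None:
        return 0.0
    # Case-insensitive comparison of the tag contents.
    return 1.0 if pred.lower() == ref.lower() else 0.0
```

Task types with numeric or free-form answers would likely need a more tolerant comparison than exact string equality.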
## Loading the Dataset
### Load JSON Directly
```python
import json

with open("GeoZero-Instruct.json", "r", encoding="utf-8") as f:
    data = json.load(f)  # a list of sample dicts
```
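After loading, it can be useful to confirm that every referenced image is present on disk. The sketch below assumes the images have already been extracted under a single root directory with their relative paths intact; `find_missing_images` and the `image_root` argument are hypothetical names.

```python
import json
from pathlib import Path


def find_missing_images(json_path, image_root):
    """Return the sorted relative image paths that are referenced but not on disk."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    root = Path(image_root)
    # Collect each distinct path once; many samples may share an image.
    referenced = {rel for sample in data for rel in sample.get("images", [])}
    return sorted(rel for rel in referenced if not (root / rel).is_file())
```

An empty list from `find_missing_images("GeoZero-Instruct.json", "images_extracted")` would indicate that all image paths in the file resolve.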
### Load with Hugging Face Datasets
```python
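from datasets import load_dataset

# Repo ID as given in the original card; this copy is listed under
# keshavagarwal2004dev/GeoZero_Train_Datasets.
dataset = load_dataset(
    "hjvsl/GeoZero_Train_Datasets",
    data_files="GeoZero-Instruct.json",
)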
```
## Citation
If you use GeoZero in your research, please cite:
```bibtex
@article{wang2025geozero,
title = {GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes},
author = {Wang, Di and Liu, Shunyu and Jiang, Wentao and Wang, Fengxiang and Liu, Yi and Qin, Xiaolei and Luo, Zhiming and Zhou, Chaoyang and Guo, Haonan and Zhang, Jing and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
journal = {arXiv preprint arXiv:2511.22645},
year = {2025}
}
```
## Contact
Di Wang, Wuhan University, d_wang@whu.edu.cn
Provider:
keshavagarwal2004dev



