hackelle/BigEarthNetV2-LMDB
Source: Hugging Face · Updated: 2026-04-01 · Indexed: 2026-04-12
Download link: https://hf-mirror.com/datasets/hackelle/BigEarthNetV2-LMDB
---
license: cdla-permissive-1.0
task_categories:
- image-classification
tags:
- remote sensing
- classification
- multi-label
- sentinel-1
- sentinel-2
- multispectral
- multimodal
- SAR
- BigEarthNet
- reBEN
- LMDB
pretty_name: reBEN (pre-converted to LMDB)
configs:
- config_name: default
  data_files:
  - split: all_data
    path: metadata.parquet
  default: true
size_categories:
- 100K<n<1M
---
[TU Berlin](https://www.tu.berlin/) | [RSiM](https://rsim.berlin/) | [DIMA](https://www.dima.tu-berlin.de/menue/database_systems_and_information_management_group/) | [BigEarth](http://www.bigearth.eu/) | [BIFOLD](https://bifold.berlin/)
:---:|:---:|:---:|:---:|:---:
<a href="https://www.tu.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/tu-berlin-logo-long-red.svg" style="font-size: 1rem; height: 2em; width: auto" alt="TU Berlin Logo"/> | <a href="https://rsim.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/RSiM_Logo_1.png" style="font-size: 1rem; height: 2em; width: auto" alt="RSiM Logo"> | <a href="https://www.dima.tu-berlin.de/menue/database_systems_and_information_management_group/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/DIMA.png" style="font-size: 1rem; height: 2em; width: auto" alt="DIMA Logo"> | <a href="http://www.bigearth.eu/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/BigEarth.png" style="font-size: 1rem; height: 2em; width: auto" alt="BigEarth Logo"> | <a href="https://bifold.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/BIFOLD_Logo_farbig.png" style="font-size: 1rem; height: 2em; width: auto; margin-right: 1em" alt="BIFOLD Logo">
---
# reBEN (pre-converted to LMDB)
> **⚠️ Unofficial mirror.** This is an **unofficial, community-provided** pre-conversion of the BigEarthNet v2.0 (reBEN) dataset into LMDB format. It is provided as a convenience for researchers who wish to get started quickly without running the full conversion pipeline. In case of any discrepancy, **the original publication and the original files always take precedence**. Please refer to the authoritative sources listed below.
---
## Overview
This dataset card describes a pre-converted [LMDB](https://lmdb.readthedocs.io/en/release/) version of **BigEarthNet v2.0** (also known as **reBEN** — *Refined BigEarthNet*), a large-scale, multi-label remote sensing benchmark dataset. The dataset was converted to LMDB format using [rico-HDL](https://github.com/kai-tub/rico-hdl), which is the recommended conversion tool for reBEN. The LMDB file stores Sentinel-1 and Sentinel-2 patches as serialized [SafeTensors](https://github.com/huggingface/safetensors) entries, keyed by patch ID.
The accompanying `metadata.parquet` file provides all patch-level metadata (labels, split assignments, geographic information, etc.) for the included patches, i.e. those _without seasonal snow and cloud shadows_.
These patches are recommended for most settings. The metadata file is identical to the one that can be downloaded from the official website.
---
## Authoritative Sources
Please always consult the following primary resources:
| Resource | Link |
|:---|:---|
| BigEarthNet project page | [bigearth.net](https://bigearth.net/) |
| BigEarthNet image–text dataset (txt.bigearth.net) | [txt.bigearth.net](https://txt.bigearth.net/) |
| Original reBEN files (Zenodo) | [zenodo.org/records/10891137](https://zenodo.org/records/10891137) |
| reBEN training scripts (official repository) | [git.tu-berlin.de/rsim/reben-training-scripts](https://git.tu-berlin.de/rsim/reben-training-scripts) |
| Pretrained model weights | [BIFOLD-BigEarthNetv2-0 on Hugging Face](https://huggingface.co/BIFOLD-BigEarthNetv2-0) |
---
## Dataset Details
### LMDB Structure
Each entry in the LMDB file is a [SafeTensors](https://github.com/huggingface/safetensors)-serialized object, keyed by either the Sentinel-2 patch ID or the corresponding Sentinel-1 patch name (`s1_name`). This matches the format produced by [rico-HDL](https://github.com/kai-tub/rico-hdl).
- **Sentinel-2 entries** contain bands: `B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12`
- **Sentinel-1 entries** contain bands: `VV, VH`
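Because Sentinel-1 and Sentinel-2 entries live in the same key space, it can be handy to tell them apart by the band names they contain. The following is a minimal sketch based only on the band lists above; the helper name `entry_kind` is hypothetical and not part of rico-HDL or ConfigILM.

```python
# Band names as listed in the LMDB structure description above.
S2_BANDS = ("B01", "B02", "B03", "B04", "B05", "B06",
            "B07", "B08", "B8A", "B09", "B11", "B12")
S1_BANDS = ("VV", "VH")


def entry_kind(band_names):
    """Return "S2" or "S1" depending on which band set the names match.

    `band_names` is any iterable of strings, e.g. the keys of a decoded
    SafeTensors entry. Hypothetical helper, for illustration only.
    """
    names = set(band_names)
    if names == set(S2_BANDS):
        return "S2"
    if names == set(S1_BANDS):
        return "S1"
    raise ValueError(f"unexpected band set: {sorted(names)}")
```

With a decoded entry at hand, this could be called as `entry_kind(tensor.keys())`.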
### Metadata
The `metadata.parquet` file is a direct copy of the full reBEN metadata parquet. It contains all original columns including patch IDs, labels, split assignments, and geographic metadata.
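For multi-label training, the label lists from the metadata usually have to be converted into fixed-length multi-hot vectors. A minimal sketch follows, assuming each patch's labels are available as a list of class-name strings. The sample rows and the helper names `build_label_index` and `multi_hot` are made up for illustration, and the class order used here is simply alphabetical, not any official ordering.

```python
def build_label_index(label_lists):
    """Map each distinct class name to a column index (alphabetical order)."""
    classes = sorted({name for labels in label_lists for name in labels})
    return {name: i for i, name in enumerate(classes)}


def multi_hot(labels, label_index):
    """Encode one patch's label list as a 0/1 vector."""
    vec = [0] * len(label_index)
    for name in labels:
        vec[label_index[name]] = 1
    return vec


# Illustrative rows only, not taken from metadata.parquet.
sample_labels = [
    ["Arable land", "Pastures"],
    ["Mixed forest"],
    ["Arable land", "Mixed forest"],
]
index = build_label_index(sample_labels)
vectors = [multi_hot(labels, index) for labels in sample_labels]
```

In practice a fixed, documented class order is preferable to one derived from the data, so that vectors stay comparable across subsets and runs.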
---
## Usage
The recommended way to use the dataset is with [ConfigILM](https://github.com/lhackel-tub/ConfigILM), which directly supports PyTorch DataLoaders and Lightning DataModules with optimized multi-threaded loading.
See the ConfigILM documentation for details.
Alternatively, to load individual patches directly from the LMDB file, you need `lmdb` and `safetensors`, plus `pandas` and `pyarrow` to read the metadata:
```bash
pip install lmdb safetensors pandas pyarrow
```
```python
import lmdb
import pandas as pd
from safetensors.numpy import load as safetensor_load

lmdb_path = "path/to/BENv2.lmdb"
metadata_path = "path/to/metadata.parquet"

metadata = pd.read_parquet(metadata_path)
# Open the LMDB environment read-only
lmdb_env = lmdb.open(lmdb_path, map_size=1024, max_dbs=0, readonly=True)

# Load a Sentinel-2 patch (keyed by its patch ID)
patch_id = metadata.patch_id.iloc[0]
with lmdb_env.begin(write=False) as txn:
    data = txn.get(patch_id.encode())
    tensor = safetensor_load(data)

# Access individual bands
r, g, b = tensor["B04"], tensor["B03"], tensor["B02"]

# Load the corresponding Sentinel-1 patch (keyed by s1_name)
s1_name = metadata.s1_name.iloc[0]
with lmdb_env.begin(write=False) as txn:
    data = txn.get(s1_name.encode())
    tensor = safetensor_load(data)
vv, vh = tensor["VV"], tensor["VH"]
```
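Note that `txn.get` returns `None` when a key is not present, which would only surface as an error inside the SafeTensors decoder. A minimal sketch of a guarded lookup follows. `fetch_entry` and the dict-backed `FakeTxn` stand-in are hypothetical names, used here so the example runs without the dataset; in real use, pass an LMDB transaction and `safetensors.numpy.load` as `decode`.

```python
def fetch_entry(txn, key, decode):
    """Look up `key` (a str) in `txn` and decode the stored bytes.

    Raises KeyError instead of handing None to the decoder.
    """
    raw = txn.get(key.encode())
    if raw is None:
        raise KeyError(f"no LMDB entry for key {key!r}")
    return decode(raw)


class FakeTxn:
    """Dict-backed stand-in for an LMDB transaction (illustration only)."""

    def __init__(self, store):
        self._store = store

    def get(self, key):
        return self._store.get(key)


txn = FakeTxn({b"patch_a": b"payload"})
entry = fetch_entry(txn, "patch_a", decode=bytes.decode)
```

Failing early with the offending key makes it easier to spot mismatches between `metadata.parquet` and the LMDB file.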
---
## Conversion Details
The LMDB was generated from the original reBEN dataset files (downloaded from [Zenodo](https://zenodo.org/records/10891137)) using [rico-HDL](https://github.com/kai-tub/rico-hdl), which is the officially recommended conversion tool for reBEN.
---
## License
The underlying data is licensed under the **[Community Data License Agreement — Permissive, Version 1.0 (CDLA-Permissive-1.0)](https://cdla.dev/permissive-1-0/)**, consistent with the license of the original BigEarthNet v2.0 dataset. This pre-converted version inherits the same license.
---
## Citation
If you use this dataset in your research, please cite the original reBEN publication and the ConfigILM library:
```bibtex
@inproceedings{clasen2025refinedbigearthnet,
title={{reBEN}: Refined BigEarthNet Dataset for Remote Sensing Image Analysis},
author={Clasen, Kai Norman and Hackel, Leonard and Burgert, Tom and Sumbul, Gencer and Demir, Beg{\"u}m and Markl, Volker},
year={2025},
booktitle={IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},
}
```
```bibtex
@article{hackel2024configilm,
title={ConfigILM: A general purpose configurable library for combining image and language models for visual question answering},
author={Hackel, Leonard and Clasen, Kai Norman and Demir, Beg{\"u}m},
journal={SoftwareX},
volume={26},
pages={101731},
year={2024},
publisher={Elsevier}
}
```
The preprint for reBEN is also available on arXiv:
> K. Clasen et al., "reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis", arXiv:2407.03653, 2024. [https://arxiv.org/abs/2407.03653](https://arxiv.org/abs/2407.03653)
Provided by: hackelle



