hackelle/BigEarthNetV2-Lithuania-Summer-LMDB
Hugging Face · updated 2026-04-01 · indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/hackelle/BigEarthNetV2-Lithuania-Summer-LMDB
Resource description:
---
license: cdla-permissive-1.0
task_categories:
- image-classification
tags:
- remote sensing
- classification
- multi-label
- sentinel-1
- sentinel-2
- multispectral
- multimodal
- SAR
- BigEarthNet
- reBEN
- LMDB
pretty_name: reBEN (pre-converted to LMDB) — Lithuania Summer Subset
configs:
- config_name: default
  data_files:
  - split: all_data
    path: metadata_lithuania_summer.parquet
  default: true
size_categories:
- 1K<n<10K
---
[TU Berlin](https://www.tu.berlin/) | [RSiM](https://rsim.berlin/) | [DIMA](https://www.dima.tu-berlin.de/menue/database_systems_and_information_management_group/) | [BigEarth](http://www.bigearth.eu/) | [BIFOLD](https://bifold.berlin/)
:---:|:---:|:---:|:---:|:---:
<a href="https://www.tu.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/tu-berlin-logo-long-red.svg" style="font-size: 1rem; height: 2em; width: auto" alt="TU Berlin Logo"/></a> | <a href="https://rsim.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/RSiM_Logo_1.png" style="font-size: 1rem; height: 2em; width: auto" alt="RSiM Logo"></a> | <a href="https://www.dima.tu-berlin.de/menue/database_systems_and_information_management_group/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/DIMA.png" style="font-size: 1rem; height: 2em; width: auto" alt="DIMA Logo"></a> | <a href="http://www.bigearth.eu/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/BigEarth.png" style="font-size: 1rem; height: 2em; width: auto" alt="BigEarth Logo"></a> | <a href="https://bifold.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/BIFOLD_Logo_farbig.png" style="font-size: 1rem; height: 2em; width: auto; margin-right: 1em" alt="BIFOLD Logo"></a>
---
# reBEN — Lithuania Summer Subset (pre-converted to LMDB)
> **⚠️ Unofficial mirror.** This is an **unofficial, community-provided** pre-conversion of a subset of the BigEarthNet v2.0 (reBEN) dataset into LMDB format. It is provided as a convenience for researchers who wish to get started quickly without running the full conversion pipeline. In case of any discrepancy, **the original publication and the original files always take precedence**. Please refer to the authoritative sources listed below.
---
## Overview
This dataset card describes a pre-converted [LMDB](https://lmdb.readthedocs.io/en/release/) version of a subset of **BigEarthNet v2.0** (also known as **reBEN** — *Refined BigEarthNet*), a large-scale, multi-label remote sensing benchmark dataset. The subset contains patches from **Lithuania** acquired during the **summer months (June, July, August)**, identified by filtering on the acquisition month index embedded in the patch ID.
The original dataset was converted to LMDB format using [rico-HDL](https://github.com/kai-tub/rico-hdl), which is the recommended conversion tool for reBEN. The LMDB file stores Sentinel-1 and Sentinel-2 patches as serialized [SafeTensors](https://github.com/huggingface/safetensors) entries, keyed by patch ID.
The accompanying `metadata_lithuania_summer.parquet` file provides all patch-level metadata (labels, split assignments, geographic information, etc.) for the included patches.
---
## Authoritative Sources
Please always consult the following primary resources:
| Resource | Link |
|:---|:---|
| BigEarthNet project page | [bigearth.net](https://bigearth.net/) |
| BigEarthNet image–text dataset (txt.bigearth.net) | [txt.bigearth.net](https://txt.bigearth.net/) |
| Original reBEN files (Zenodo) | [zenodo.org/records/10891137](https://zenodo.org/records/10891137) |
| reBEN training scripts (official repository) | [git.tu-berlin.de/rsim/reben-training-scripts](https://git.tu-berlin.de/rsim/reben-training-scripts) |
| Pretrained model weights | [BIFOLD-BigEarthNetv2-0 on Hugging Face](https://huggingface.co/BIFOLD-BigEarthNetv2-0) |
---
## Dataset Details
### Subset Criteria
Patches were selected from the full reBEN dataset using two criteria applied to the metadata:
1. **Country:** `Lithuania` (i.e., `metadata.country == 'Lithuania'`)
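2. **Season:** Summer, determined by the acquisition month encoded in the patch ID. The third underscore-separated token of the patch ID is the acquisition timestamp (`YYYYMMDDThhmmss`); a patch is kept when the character at zero-based index 5 of that token, i.e. the second digit of the zero-padded month, is one of `{'6', '7', '8'}` (June, July, and August). Because the month is always two digits (`01` to `12`), these digits cannot match any other month.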
In code:
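```python
import pandas as pd

metadata = pd.read_parquet("path/to/metadata.parquet")  # full reBEN metadata (placeholder path)
# Token 2 of the patch ID is the timestamp YYYYMMDDThhmmss; index 5 is the month's second digit
metadata_lithuania_summer = metadata[
    (metadata.country == 'Lithuania') &
    (metadata.patch_id.apply(lambda x: x.split('_')[2][5] in {'6', '7', '8'}))
]
```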
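The single-character check above relies on the fixed layout of the timestamp token. An equivalent, more explicit filter parses the acquisition date instead. This is a minimal sketch assuming the third token always has the form `YYYYMMDDThhmmss`; `acquisition_date` and `is_summer` are hypothetical helpers, and the example ID is illustrative, built to match that pattern rather than taken from the dataset.

```python
from datetime import datetime

SUMMER_MONTHS = {6, 7, 8}  # June, July, August


def acquisition_date(patch_id: str) -> datetime:
    """Parse the acquisition timestamp from a reBEN-style patch ID.

    Assumption: the third '_'-separated token is formatted as YYYYMMDDThhmmss.
    """
    return datetime.strptime(patch_id.split("_")[2], "%Y%m%dT%H%M%S")


def is_summer(patch_id: str) -> bool:
    """True if the patch was acquired in June, July, or August."""
    return acquisition_date(patch_id).month in SUMMER_MONTHS


# Illustrative ID following the assumed pattern (not necessarily a real patch)
example = "S2A_MSIL2A_20170613T101031_N9999_R022_T34VFM_45_15"
print(acquisition_date(example).month, is_summer(example))  # 6 True
```

Applied as `metadata.patch_id.apply(is_summer)`, it should select the same rows as the one-character check for IDs that follow this pattern, and it raises a `ValueError` on malformed IDs instead of silently misclassifying them.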
### LMDB Structure
Each entry in the LMDB file is a [SafeTensors](https://github.com/huggingface/safetensors)-serialized object, keyed by either the Sentinel-2 patch ID or the corresponding Sentinel-1 patch name (`s1_name`). This matches the format produced by [rico-HDL](https://github.com/kai-tub/rico-hdl).
- **Sentinel-2 entries** contain bands: `B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12`
- **Sentinel-1 entries** contain bands: `VV, VH`
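To feed a whole Sentinel-2 entry to a model, the twelve bands are usually stacked into one array. BigEarthNet patches cover 1200 m x 1200 m, i.e. 120 x 120 pixels for the 10 m bands (B02, B03, B04, B08), 60 x 60 for the 20 m bands, and 20 x 20 for the 60 m bands (B01, B09). If the arrays in your copy keep these native shapes (check `tensor[name].shape`; this is an assumption, not confirmed here), they must be brought to a common grid first. The sketch below uses a hypothetical helper `stack_s2` that upsamples by nearest-neighbour pixel repetition; bilinear or bicubic interpolation are common alternatives.

```python
import numpy as np

# Band order as listed above
S2_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06",
            "B07", "B08", "B8A", "B09", "B11", "B12"]


def stack_s2(tensor: dict, size: int = 120) -> np.ndarray:
    """Stack Sentinel-2 bands into a (12, size, size) float32 array.

    Bands on a coarser grid are upsampled by repeating each pixel
    (nearest neighbour). Assumes square bands whose side divides `size`.
    """
    out = []
    for name in S2_BANDS:
        band = np.asarray(tensor[name], dtype=np.float32)
        side = band.shape[0]
        factor = size // side
        if band.ndim != 2 or band.shape[1] != side or factor * side != size:
            raise ValueError(f"{name}: cannot upsample {band.shape} to {size}x{size}")
        out.append(np.repeat(np.repeat(band, factor, axis=0), factor, axis=1))
    return np.stack(out)
```

With a loaded entry, `stack_s2(tensor)` returns channels in the order of `S2_BANDS`. Pixel repetition keeps the original values unchanged, which makes it easy to verify, at the cost of blocky coarse bands.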
### Metadata
The `metadata_lithuania_summer.parquet` file is a direct subset of the full reBEN metadata parquet, filtered to the patches described above. It retains all original columns including patch IDs, labels, split assignments, and geographic metadata.
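For multi-label training, the `labels` column is typically turned into a multi-hot matrix and the rows are grouped by `split`. The sketch below is hedged: `multi_hot` is a hypothetical helper, the split values `"train"`/`"validation"`/`"test"` are assumed (check `metadata["split"].unique()`), and the toy rows use made-up IDs. Building the class list from all rows, or passing the full reBEN class list, keeps the column order identical across splits.

```python
import numpy as np
import pandas as pd


def multi_hot(metadata: pd.DataFrame, classes=None):
    """Encode the `labels` column as a (n_patches, n_classes) multi-hot matrix.

    If `classes` is None, the sorted set of labels present in `metadata` is used.
    """
    if classes is None:
        classes = sorted({lbl for lbls in metadata["labels"] for lbl in lbls})
    index = {c: i for i, c in enumerate(classes)}
    y = np.zeros((len(metadata), len(classes)), dtype=np.float32)
    for row, lbls in enumerate(metadata["labels"]):
        for lbl in lbls:
            y[row, index[lbl]] = 1.0
    return y, list(classes)


# Toy rows with made-up IDs; real rows come from metadata_lithuania_summer.parquet
toy = pd.DataFrame({
    "patch_id": ["p1", "p2", "p3"],
    "labels": [["Urban fabric", "Arable land"], ["Arable land"], ["Coniferous forest"]],
    "split": ["train", "validation", "train"],
})
_, classes = multi_hot(toy)                 # vocabulary from all rows
train = toy[toy["split"] == "train"]        # split value assumed
y_train, _ = multi_hot(train, classes)      # same columns for every split
print(classes)
print(y_train)
```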
---
## Usage
The recommended way to use the dataset is with [ConfigILM](https://github.com/lhackel-tub/ConfigILM), which directly supports PyTorch DataLoaders and Lightning DataModules with optimized multi-threaded loading.
See the ConfigILM documentation for details.
Alternatively, to load individual patches directly from the LMDB file, install `lmdb` and `safetensors` (plus `pandas` and `pyarrow` to read the metadata):
```bash
pip install lmdb safetensors pandas pyarrow
```
```python
import lmdb
import pandas as pd
from safetensors.numpy import load as safetensor_load

lmdb_path = "path/to/BENv2_lithuania_summer.lmdb"
metadata_path = "path/to/metadata_lithuania_summer.parquet"

metadata = pd.read_parquet(metadata_path)
# Read-only access; lock=False skips the lock file, which is safe when nothing writes to the LMDB
lmdb_env = lmdb.open(lmdb_path, readonly=True, lock=False)

row = metadata.iloc[0]
with lmdb_env.begin(write=False) as txn:
    # Sentinel-2 entries are keyed by patch_id
    s2 = safetensor_load(txn.get(row["patch_id"].encode()))
    # The matching Sentinel-1 entry is keyed by s1_name
    s1 = safetensor_load(txn.get(row["s1_name"].encode()))

# Each entry is a dict mapping band name to a NumPy array
r, g, b = s2["B04"], s2["B03"], s2["B02"]
vv, vh = s1["VV"], s1["VH"]
lmdb_env.close()
```
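For a quick visual check of a loaded patch, the red, green, and blue bands can be turned into an 8-bit image. The sketch below is an assumption-laden display aid, not a calibrated product: `rgb_preview` is a hypothetical helper that clips each band to its 2nd to 98th percentile range and rescales it to 0..255 independently.

```python
import numpy as np


def rgb_preview(r, g, b, lower: float = 2.0, upper: float = 98.0) -> np.ndarray:
    """Build an (H, W, 3) uint8 true-colour preview from three equally sized bands.

    Each band is clipped to its [lower, upper] percentile range and scaled to
    0..255 on its own (a display stretch only).
    """
    channels = []
    for band in (r, g, b):
        band = np.asarray(band, dtype=np.float32)
        lo, hi = np.percentile(band, [lower, upper])
        scaled = (band - lo) / max(hi - lo, 1e-6)  # guard against flat bands
        channels.append(np.clip(scaled, 0.0, 1.0))
    return (np.stack(channels, axis=-1) * 255).round().astype(np.uint8)
```

With the bands loaded above, `rgb_preview(r, g, b)` yields an array that image libraries such as Matplotlib can display directly. Per-band stretching exaggerates colour differences; a shared stretch across the three bands preserves relative brightness instead.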
---
## Conversion Details
The LMDB was generated from the original reBEN dataset files (downloaded from [Zenodo](https://zenodo.org/records/10891137)) using [rico-HDL](https://github.com/kai-tub/rico-hdl), which is the officially recommended conversion tool for reBEN. The metadata subset was extracted using `pandas` and saved as a Parquet file.
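Because the LMDB and the metadata subset come from separate steps, it can be worth confirming that every metadata row has both its Sentinel-2 and Sentinel-1 entry. This is a minimal sketch: `missing_keys` is a hypothetical helper that accepts any callable mapping an encoded key to bytes or `None`, so a Python dict can stand in for the LMDB in the toy example (made-up keys).

```python
from typing import Callable, List, Optional

import pandas as pd


def missing_keys(metadata: pd.DataFrame,
                 get: Callable[[bytes], Optional[bytes]]) -> List[str]:
    """Return S2 patch IDs and S1 names from `metadata` that `get` cannot find."""
    keys = list(metadata["patch_id"]) + list(metadata["s1_name"])
    return [k for k in keys if get(k.encode()) is None]


# Toy check with a dict standing in for the LMDB (made-up keys)
toy_store = {b"s2_a": b"...", b"s1_a": b"..."}
toy_meta = pd.DataFrame({"patch_id": ["s2_a", "s2_b"], "s1_name": ["s1_a", "s1_b"]})
print(missing_keys(toy_meta, toy_store.get))  # ['s2_b', 's1_b']
```

Against the real file, pass the transaction's lookup method, e.g. `missing_keys(metadata, txn.get)` inside `with lmdb_env.begin(write=False) as txn:`; `Transaction.get` returns `None` for absent keys, so an empty list means the two files are consistent.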
---
## License
The underlying data is licensed under the **[Community Data License Agreement — Permissive, Version 1.0 (CDLA-Permissive-1.0)](https://cdla.dev/permissive-1-0/)**, consistent with the license of the original BigEarthNet v2.0 dataset. This pre-converted version inherits the same license.
---
## Citation
If you use this dataset in your research, please cite the original reBEN publication and the ConfigILM library:
```bibtex
@inproceedings{clasen2025refinedbigearthnet,
title={{reBEN}: Refined BigEarthNet Dataset for Remote Sensing Image Analysis},
author={Clasen, Kai Norman and Hackel, Leonard and Burgert, Tom and Sumbul, Gencer and Demir, Beg{\"u}m and Markl, Volker},
year={2025},
booktitle={IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},
}
```
```bibtex
@article{hackel2024configilm,
title={ConfigILM: A general purpose configurable library for combining image and language models for visual question answering},
author={Hackel, Leonard and Clasen, Kai Norman and Demir, Beg{\"u}m},
journal={SoftwareX},
volume={26},
pages={101731},
year={2024},
publisher={Elsevier}
}
```
The preprint for reBEN is also available on arXiv:
> K. Clasen et al., "reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis", arXiv:2407.03653, 2024. [https://arxiv.org/abs/2407.03653](https://arxiv.org/abs/2407.03653)
Provided by:
hackelle



