
hackelle/BigEarthNetV2-LMDB

Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/hackelle/BigEarthNetV2-LMDB
Resource description:

---
license: cdla-permissive-1.0
task_categories:
- image-classification
tags:
- remote sensing
- classification
- multi-label
- sentinel-1
- sentinel-2
- multispectral
- multimodal
- SAR
- BigEarthNet
- reBEN
- LMDB
pretty_name: reBEN (pre-converted to LMDB)
configs:
- config_name: default
  data_files:
  - split: all_data
    path: metadata.parquet
  default: true
size_categories:
- 100K<n<1M
---

[TU Berlin](https://www.tu.berlin/) | [RSiM](https://rsim.berlin/) | [DIMA](https://www.dima.tu-berlin.de/menue/database_systems_and_information_management_group/) | [BigEarth](http://www.bigearth.eu/) | [BIFOLD](https://bifold.berlin/)
:---:|:---:|:---:|:---:|:---:
<a href="https://www.tu.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/tu-berlin-logo-long-red.svg" style="font-size: 1rem; height: 2em; width: auto" alt="TU Berlin Logo"/></a> | <a href="https://rsim.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/RSiM_Logo_1.png" style="font-size: 1rem; height: 2em; width: auto" alt="RSiM Logo"></a> | <a href="https://www.dima.tu-berlin.de/menue/database_systems_and_information_management_group/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/DIMA.png" style="font-size: 1rem; height: 2em; width: auto" alt="DIMA Logo"></a> | <a href="http://www.bigearth.eu/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/BigEarth.png" style="font-size: 1rem; height: 2em; width: auto" alt="BigEarth Logo"></a> | <a href="https://bifold.berlin/"><img src="https://raw.githubusercontent.com/wiki/lhackel-tub/ConfigILM/static/imgs/BIFOLD_Logo_farbig.png" style="font-size: 1rem; height: 2em; width: auto; margin-right: 1em" alt="BIFOLD Logo"></a>

---

# reBEN (pre-converted to LMDB)

> **⚠️ Unofficial mirror.** This is an **unofficial, community-provided** pre-conversion of the BigEarthNet v2.0 (reBEN) dataset into LMDB format.
> It is provided as a convenience for researchers who wish to get started quickly without running the full conversion pipeline. In case of any discrepancy, **the original publication and the original files always take precedence**. Please refer to the authoritative sources listed below.

---

## Overview

This dataset card describes a pre-converted [LMDB](https://lmdb.readthedocs.io/en/release/) version of **BigEarthNet v2.0** (also known as **reBEN**, short for *Refined BigEarthNet*), a large-scale, multi-label remote sensing benchmark dataset.

The dataset was converted to LMDB format using [rico-HDL](https://github.com/kai-tub/rico-hdl), which is the recommended conversion tool for reBEN. The LMDB file stores Sentinel-1 and Sentinel-2 patches as serialized [SafeTensors](https://github.com/huggingface/safetensors) entries, keyed by patch ID.

The accompanying `metadata.parquet` file provides all patch-level metadata (labels, split assignments, geographic information, etc.) for the included patches _without seasonal snow and cloud shadows_. These are the patches that are recommended for most settings. It is the same file that can be downloaded from the official website.

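
As a rough illustration of how the patch-level metadata described above might be turned into training targets, the sketch below selects one split and encodes each patch's labels as a multi-hot vector. It runs on a small invented stand-in table rather than the real file. The `patch_id` and `s1_name` columns are used elsewhere in this card, but the `split` and `labels` column names, the split values, and the helper functions `select_split` and `multi_hot` are assumptions made for illustration; check them against the actual `metadata.parquet` before relying on them.

```python
import pandas as pd

# Toy stand-in for metadata.parquet. The rows are invented for illustration.
# ASSUMPTION: the `split` and `labels` column names and the split values are
# guesses at the real schema; verify them with `metadata.columns`.
toy_metadata = pd.DataFrame(
    {
        "patch_id": ["s2_patch_a", "s2_patch_b", "s2_patch_c"],
        "s1_name": ["s1_patch_a", "s1_patch_b", "s1_patch_c"],
        "split": ["train", "validation", "train"],
        "labels": [["Pastures", "Arable land"], ["Marine waters"], ["Pastures"]],
    }
)


def select_split(metadata: pd.DataFrame, split: str) -> pd.DataFrame:
    """Return the rows assigned to one split, with a fresh 0..n-1 index."""
    return metadata[metadata["split"] == split].reset_index(drop=True)


def multi_hot(labels, classes):
    """Encode one patch's label list as a 0/1 vector ordered like `classes`."""
    present = set(labels)
    return [1 if name in present else 0 for name in classes]


train = select_split(toy_metadata, "train")

# Build the class vocabulary from the data itself instead of hard-coding it.
classes = sorted({name for labels in toy_metadata["labels"] for name in labels})

# One multi-hot target vector per training patch.
targets = [multi_hot(labels, classes) for labels in train["labels"]]
```

With the real file, `toy_metadata` would be replaced by `pd.read_parquet("path/to/metadata.parquet")`. Deriving `classes` from the table keeps the sketch independent of any particular label nomenclature.
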

---

## Authoritative Sources

Please always consult the following primary resources:

| Resource | Link |
|:---|:---|
| BigEarthNet project page | [bigearth.net](https://bigearth.net/) |
| BigEarthNet image–text dataset (txt.bigearth.net) | [txt.bigearth.net](https://txt.bigearth.net/) |
| Original reBEN files (Zenodo) | [zenodo.org/records/10891137](https://zenodo.org/records/10891137) |
| reBEN training scripts (official repository) | [git.tu-berlin.de/rsim/reben-training-scripts](https://git.tu-berlin.de/rsim/reben-training-scripts) |
| Pretrained model weights | [BIFOLD-BigEarthNetv2-0 on Hugging Face](https://huggingface.co/BIFOLD-BigEarthNetv2-0) |

---

## Dataset Details

### LMDB Structure

Each entry in the LMDB file is a [SafeTensors](https://github.com/huggingface/safetensors)-serialized object, keyed by either the Sentinel-2 patch ID or the corresponding Sentinel-1 patch name (`s1_name`). This matches the format produced by [rico-HDL](https://github.com/kai-tub/rico-hdl).

- **Sentinel-2 entries** contain bands: `B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12`
- **Sentinel-1 entries** contain bands: `VV, VH`

### Metadata

The `metadata.parquet` file is a direct copy of the full reBEN metadata parquet. It contains all original columns, including patch IDs, labels, split assignments, and geographic metadata.

---

## Usage

The recommended way to use the dataset is with [configILM](https://github.com/lhackel-tub/ConfigILM), which directly supports PyTorch dataloaders and Lightning datamodules with optimized multi-threaded loading. Please follow its documentation for details.


Alternatively, to load individual patches from the LMDB file, you need `lmdb` and `safetensors` (plus `pandas` and `pyarrow` to read the metadata file):

```bash
pip install lmdb safetensors pandas pyarrow
```

```python
import lmdb
import pandas as pd
from safetensors.numpy import load as safetensor_load

lmdb_path = "path/to/BENv2.lmdb"
metadata_path = "path/to/metadata.parquet"

metadata = pd.read_parquet(metadata_path)
lmdb_env = lmdb.open(lmdb_path, map_size=1024, max_dbs=False, readonly=True)

# Load a Sentinel-2 patch
patch_id = metadata.patch_id.iloc[0]
with lmdb_env.begin(write=False) as txn:
    data = txn.get(patch_id.encode())
    tensor = safetensor_load(data)

# Access individual bands
r, g, b = tensor["B04"], tensor["B03"], tensor["B02"]

# Load the corresponding Sentinel-1 patch
s1_name = metadata.s1_name.iloc[0]
with lmdb_env.begin(write=False) as txn:
    data = txn.get(s1_name.encode())
    tensor = safetensor_load(data)
vv, vh = tensor["VV"], tensor["VH"]
```

---

## Conversion Details

The LMDB was generated from the original reBEN dataset files (downloaded from [Zenodo](https://zenodo.org/records/10891137)) using [rico-HDL](https://github.com/kai-tub/rico-hdl), which is the officially recommended conversion tool for reBEN.

---

## License

The underlying data is licensed under the **[Community Data License Agreement - Permissive, Version 1.0 (CDLA-Permissive-1.0)](https://cdla.dev/permissive-1-0/)**, consistent with the license of the original BigEarthNet v2.0 dataset. This pre-converted version inherits the same license.


---

## Citation

If you use this dataset in your research, please cite the original reBEN publication and the ConfigILM library:

```bibtex
@inproceedings{clasen2025refinedbigearthnet,
  title={{reBEN}: Refined BigEarthNet Dataset for Remote Sensing Image Analysis},
  author={Clasen, Kai Norman and Hackel, Leonard and Burgert, Tom and Sumbul, Gencer and Demir, Beg{\"u}m and Markl, Volker},
  year={2025},
  booktitle={IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},
}
```

```bibtex
@article{hackel2024configilm,
  title={ConfigILM: A general purpose configurable library for combining image and language models for visual question answering},
  author={Hackel, Leonard and Clasen, Kai Norman and Demir, Beg{\"u}m},
  journal={SoftwareX},
  volume={26},
  pages={101731},
  year={2024},
  publisher={Elsevier}
}
```

The preprint for reBEN is also available on arXiv:

> K. Clasen et al., "reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis", arXiv:2407.03653, 2024. [https://arxiv.org/abs/2407.03653](https://arxiv.org/abs/2407.03653)

Provider:
hackelle