Core-S1RTC-SSL4EO
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link: https://modelscope.cn/datasets/Major-TOM/Core-S1RTC-SSL4EO

# Core-S1RTC-SSL4EO 📡⚡🛰️
| Dataset | Modality | Number of Embeddings | Sensing Type | Comments | Source Dataset | Source Model | Size |
|:--------:|:--------------:|:-------------------:|:------------:|:--------------:|:--------------:|:--------------:|:--------------:|
|Core-S1RTC-SSL4EO|Sentinel-1 RTC|36,748,875|SAR|General-Purpose Global|[Core-S1RTC](https://huggingface.co/datasets/Major-TOM/Core-S1RTC)|[SSL4EO-ResNet50-MOCO](https://github.com/zhu-xlab/SSL4EO-S12)|332.5 GB|
## Content
| Field | Type | Description |
|:-----------------:|:--------:|-----------------------------------------------------------------------------|
| unique_id | string | hash generated from geometry, time, product_id, and embedding model |
| embedding | array | raw embedding array |
| grid_cell | string | Major TOM cell |
| grid_row_u | int | Major TOM cell row |
| grid_col_r | int | Major TOM cell column |
| product_id | string | ID of the original product |
| timestamp | string | Timestamp of the sample |
| centre_lat | float | Latitude of the fragment centre |
| centre_lon | float | Longitude of the fragment centre |
| geometry | geometry | Polygon footprint (WGS84) of the fragment |
| utm_footprint | string | Polygon footprint (image UTM) of the fragment |
| utm_crs | string | CRS of the original product |
| pixel_bbox | bbox | Bounding box of the fragment (pixels) |
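
The `centre_lat` and `centre_lon` fields make it straightforward to restrict the embeddings to a region of interest. The following is a minimal sketch, assuming each row is available as a dictionary keyed by the field names in the table above; the helper names `in_window` and `select_window` are hypothetical and not part of any Major TOM package.

```python
from typing import Any, Iterable, Iterator, Mapping

Row = Mapping[str, Any]


def in_window(row: Row, lat_min: float, lat_max: float,
              lon_min: float, lon_max: float) -> bool:
    """Return True if the fragment centre lies inside the lat/lon window.

    Assumption: the window does not cross the antimeridian (lon_min <= lon_max).
    """
    return (lat_min <= row["centre_lat"] <= lat_max
            and lon_min <= row["centre_lon"] <= lon_max)


def select_window(rows: Iterable[Row], lat_min: float, lat_max: float,
                  lon_min: float, lon_max: float) -> Iterator[Row]:
    """Lazily yield the rows whose fragment centre falls inside the window."""
    for row in rows:
        if in_window(row, lat_min, lat_max, lon_min, lon_max):
            yield row


# Hypothetical rows, for illustration only (not real dataset values).
example_rows = [
    {"unique_id": "a", "centre_lat": 52.2, "centre_lon": 21.0},
    {"unique_id": "b", "centre_lat": -33.9, "centre_lon": 151.2},
]
selected = [r["unique_id"] for r in select_window(example_rows, 50, 55, 15, 25)]
```

Because `select_window` is a generator, it can be applied to a streamed dataset without holding all rows in memory.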
## Input Data
* Sentinel-1 RTC radar data with global coverage
* All samples from [**MajorTOM Core-S1RTC**](https://huggingface.co/datasets/Major-TOM/Core-S1RTC)
* Image input size: **224 x 224** pixels, target overlap: 10%, border_shift: True
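
To illustrate how a fragment size, a target overlap, and a border shift can interact, here is a minimal sketch of one possible one-dimensional fragmentation rule. It is an assumption made for illustration, not the logic of the Major TOM embedder: the function name `fragment_offsets`, its parameters, and the rule of adding a final fragment aligned to the tile border are all hypothetical.

```python
def fragment_offsets(tile_size: int,
                     fragment_size: int = 224,
                     target_overlap: float = 0.10,
                     border_shift: bool = True) -> list[int]:
    """Start offsets (in pixels) of fragments along one image axis.

    Assumed rule: step by fragment_size * (1 - target_overlap), and, when
    border_shift is set and pixels remain uncovered at the far edge, add one
    more fragment shifted back so that it ends exactly at the tile border.
    """
    if fragment_size > tile_size:
        raise ValueError("fragment_size must not exceed tile_size")
    stride = int(fragment_size * (1.0 - target_overlap))
    offsets = list(range(0, tile_size - fragment_size + 1, stride))
    if border_shift and offsets[-1] + fragment_size < tile_size:
        offsets.append(tile_size - fragment_size)
    return offsets


# Hypothetical 1000-pixel axis, chosen only to make the arithmetic easy to follow.
print(fragment_offsets(1000))
```

Under this rule, every fragment except possibly the last overlaps its predecessor by about 10%, and the last one overlaps more so that no border pixels are dropped.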
## Model
The image encoder of the [**SSL4EO-ResNet50-MOCO model**](https://github.com/zhu-xlab/SSL4EO-S12) was used to extract embeddings.
## Example Use
The embeddings can be loaded with the Hugging Face `datasets` library:
```python
from datasets import load_dataset
dataset = load_dataset("Major-TOM/Core-S1RTC-SSL4EO")
```
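
Since the full dataset is 332.5 GB, it can be convenient to inspect a few rows before downloading everything. The sketch below uses the `streaming=True` option of `load_dataset`; it assumes that the repository can be streamed in this way and simply takes the first split it finds, because the split names are not stated here. The helper names `peek` and `stream_first_rows` are hypothetical.

```python
from itertools import islice
from typing import Any, Iterable


def peek(rows: Iterable[Any], n: int = 3) -> list[Any]:
    """Return the first n items of an iterable without consuming the rest."""
    return list(islice(rows, n))


def stream_first_rows(n: int = 3) -> list[Any]:
    """Stream the dataset and return its first n rows.

    Assumption: the repository is streamable and has at least one split.
    """
    # Imported lazily so that `peek` stays usable without `datasets` installed.
    from datasets import load_dataset

    splits = load_dataset("Major-TOM/Core-S1RTC-SSL4EO", streaming=True)
    first_split = next(iter(splits.values()))
    return peek(first_split, n)
```

Calling `stream_first_rows()` requires network access; each returned row should be a dictionary keyed by the fields listed in the Content table.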
## Generate Your Own Major TOM Embeddings
The [**embedder**](https://github.com/ESA-PhiLab/Major-TOM/tree/main/src/embedder) subpackage of Major TOM provides tools for generating embeddings like these. A dedicated notebook shows an example: https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb.
---
## Major TOM Global Embeddings Project 🏭
This dataset is the result of a collaboration between [**CloudFerro**](https://cloudferro.com/) 🔶 and [**Φ-lab, European Space Agency (ESA)**](https://philab.esa.int/) 🛰️, set up to provide open and free vectorised expansions of Major TOM datasets and to define a standardised manner for releasing Major TOM embedding expansions.
The embeddings extracted from common AI models make it possible to browse and navigate large datasets like Major TOM with reduced storage and computational demand.
The datasets were computed on the [**GPU-accelerated instances**](https://cloudferro.com/ai/ai-computing-services/)⚡ provided by [**CloudFerro**](https://cloudferro.com/) 🔶 on the [**CREODIAS**](https://creodias.eu/) cloud service platform 💻☁️.
Discover more at [**CloudFerro AI services**](https://cloudferro.com/ai/).
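
One common way to browse a dataset through its embeddings is nearest-neighbour search by cosine similarity. The following is a minimal pure-Python sketch of that idea, assuming rows are dictionaries with the `unique_id` and `embedding` fields described in the Content table; the function names `cosine_similarity` and `top_k` are hypothetical, and for the full dataset a vectorised or approximate index would be far more practical.

```python
import math
from typing import Any, Iterable, Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          rows: Iterable[Mapping[str, Any]],
          k: int = 5) -> list[tuple[float, str]]:
    """Return the k (similarity, unique_id) pairs most similar to the query."""
    scored = [(cosine_similarity(query, row["embedding"]), row["unique_id"])
              for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors, for illustration only (not real embeddings).
toy_rows = [
    {"unique_id": "a", "embedding": [1.0, 0.0, 0.0]},
    {"unique_id": "b", "embedding": [0.0, 1.0, 0.0]},
    {"unique_id": "c", "embedding": [1.0, 1.0, 0.0]},
]
nearest = top_k([1.0, 0.0, 0.0], toy_rows, k=2)
```

Here `nearest` ranks row `a` first and row `c` second, since `b` is orthogonal to the query.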
## Authors
[**Mikolaj Czerkawski**](https://mikonvergence.github.io) (Φ-lab, European Space Agency), [**Marcin Kluczek**](https://www.linkedin.com/in/marcin-kluczek-03852a1a8/) (CloudFerro), [**Jędrzej S. Bojanowski**](https://www.linkedin.com/in/j%C4%99drzej-s-bojanowski-a5059872/) (CloudFerro)
## Open Access Manuscript
This dataset is an output from the embedding expansion project outlined in: [https://arxiv.org/abs/2412.05600/](https://arxiv.org/abs/2412.05600/).
<details>
<summary>Read Abstract</summary>
> With the ever-increasing volumes of the Earth observation data present in the archives of large programmes such as Copernicus, there is a growing need for efficient vector representations of the underlying raw data. The approach of extracting feature representations from pretrained deep neural networks is a powerful approach that can provide semantic abstractions of the input data. However, the way this is done for imagery archives containing geospatial data has not yet been defined. In this work, an extension is proposed to an existing community project, Major TOM, focused on the provision and standardization of open and free AI-ready datasets for Earth observation. Furthermore, four global and dense embedding datasets are released openly and for free along with the publication of this manuscript, resulting in the most comprehensive global open dataset of geospatial visual embeddings in terms of covered Earth's surface.

</details>
If this dataset was useful for your work, it can be cited as:
```bibtex
@misc{EmbeddedMajorTOM,
title={Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space},
author={Mikolaj Czerkawski and Marcin Kluczek and Jędrzej S. Bojanowski},
year={2024},
eprint={2412.05600},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.05600},
}
```
Powered by [Φ-lab, European Space Agency (ESA) 🛰️](https://philab.esa.int/) in collaboration with [CloudFerro 🔶](https://cloudferro.com/)

Provider: maas
Created: 2025-08-26



