
Core-S1RTC-DeCUR

ModelScope community · updated 2025-12-05 · indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/Major-TOM/Core-S1RTC-DeCUR
Description:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6304c06eeb6d777a838eab63/fqJBPiWGkViYLsICd5BRd.png)

# Core-S1RTC-DeCUR 📡⚡🛰️

| Dataset | Modality | Number of Embeddings | Sensing Type | Comments | Source Dataset | Source Model | Size |
|:--------:|:--------------:|:-------------------:|:------------:|:--------------:|:--------------:|:--------------:|:--------------:|
|Core-S1RTC-DeCUR|Sentinel-1 RTC|36,748,875|SAR|General-Purpose Global|[Core-S1RTC](https://huggingface.co/datasets/Major-TOM/Core-S1RTC)|[DeCUR](https://github.com/zhu-xlab/DeCUR)|GB|

## Content

| Field | Type | Description |
|:-----------------:|:--------:|-----------------------------------------------------------------------------|
| unique_id | string | hash generated from geometry, time, product_id, and embedding model |
| embedding | array | raw embedding array |
| grid_cell | string | Major TOM cell |
| grid_row_u | int | Major TOM cell row |
| grid_col_r | int | Major TOM cell column |
| product_id | string | ID of the original product |
| timestamp | string | Timestamp of the sample |
| centre_lat | float | Latitude of the fragment centre |
| centre_lon | float | Longitude of the fragment centre |
| geometry | geometry | Polygon footprint (WGS84) of the fragment |
| utm_footprint | string | Polygon footprint (image UTM) of the fragment |
| utm_crs | string | CRS of the original product |
| pixel_bbox | bbox | Bounding box of the fragment (pixels) |

## Input Data

* Sentinel-1 RTC radar data with global coverage
* All samples from [**MajorTOM Core-S1RTC**](https://huggingface.co/datasets/Major-TOM/Core-S1RTC)
* Image input size: **224 x 224** pixels, target overlap: 10%, border_shift: True

## Model

The image encoder of the [**DeCUR model**](https://github.com/zhu-xlab/DeCUR) was used to extract embeddings.
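The Input Data settings describe cutting each image into 224 x 224 pixel fragments with a 10% target overlap and `border_shift` enabled. The sketch below shows one way such fragment windows could be laid out. It is not the Major TOM embedder's code: the fixed-stride rule, the reading of `border_shift` as "add a last window shifted to end exactly at the image border", the `(x_min, y_min, x_max, y_max)` box order, and the helper names `window_starts` and `pixel_bboxes` are all assumptions made for illustration.

```python
def window_starts(size: int, patch: int = 224, overlap: float = 0.10,
                  border_shift: bool = True) -> list:
    """Start offsets of square windows along one image axis (hypothetical helper).

    Assumptions (not taken from the Major TOM embedder): windows advance by a
    fixed stride of floor(patch * (1 - overlap)) pixels, and border_shift=True
    adds one final window shifted so that it ends exactly at the image border.
    """
    if patch > size:
        raise ValueError("patch size exceeds image size")
    stride = max(1, int(patch * (1 - overlap)))  # 224 * 0.9 = 201.6 -> 201
    starts = list(range(0, size - patch + 1, stride))
    if border_shift and starts[-1] + patch < size:
        starts.append(size - patch)  # cover the strip left uncovered at the border
    return starts


def pixel_bboxes(height: int, width: int, patch: int = 224,
                 overlap: float = 0.10, border_shift: bool = True) -> list:
    """(x_min, y_min, x_max, y_max) pixel boxes for every fragment, row by row."""
    rows = window_starts(height, patch, overlap, border_shift)
    cols = window_starts(width, patch, overlap, border_shift)
    return [(x, y, x + patch, y + patch) for y in rows for x in cols]


# Illustrative image size only; not a statement about Core-S1RTC product dimensions.
print(window_starts(1068))            # [0, 201, 402, 603, 804, 844]
print(len(pixel_bboxes(1068, 1068)))  # 36
```

With a fixed stride, the border-shifted window overlaps its neighbour by more than the 10% target (windows starting at 804 and 844 share 184 of 224 pixels). An implementation could instead spread the windows evenly across the axis so every overlap stays close to the target.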
## Example Use

The dataset can be loaded with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Downloads the full embedding set; load_dataset(..., streaming=True) iterates without a full download
dataset = load_dataset("Major-TOM/Core-S1RTC-DeCUR")
```

## Generate Your Own Major TOM Embeddings

The [**embedder**](https://github.com/ESA-PhiLab/Major-TOM/tree/main/src/embedder) subpackage of Major TOM provides tools for generating embeddings like these. An example is available in a dedicated notebook at https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb.

[![GitHub](https://img.shields.io/badge/GitHub-Generate%20Your%20Own%20Embeddings-blue?logo=github&style=flat-square)](https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb)

---

## Major TOM Global Embeddings Project 🏭

This dataset is the result of a collaboration between [**CloudFerro**](https://cloudferro.com/) 🔶, [asterisk labs](https://asterisk.coop/) and [**Φ-lab, European Space Agency (ESA)**](https://philab.esa.int/) 🛰️, set up to provide open and free vectorised expansions of Major TOM datasets and to define a standardised way of releasing Major TOM embedding expansions.

Embeddings extracted from common AI models make it possible to browse and navigate large datasets like Major TOM with reduced storage and computational demand.

The datasets were computed on the [**GPU-accelerated instances**](https://cloudferro.com/ai/ai-computing-services/)⚡ provided by [**CloudFerro**](https://cloudferro.com/) 🔶 on the [**CREODIAS**](https://creodias.eu/) cloud service platform 💻☁️. Discover more at [**CloudFerro AI services**](https://cloudferro.com/ai/).

## Authors

[**Mikolaj Czerkawski**](https://mikonvergence.github.io) (Φ-lab, European Space Agency), [**Marcin Kluczek**](https://www.linkedin.com/in/marcin-kluczek-03852a1a8/) (CloudFerro), [**Jędrzej S. Bojanowski**](https://www.linkedin.com/in/j%C4%99drzej-s-bojanowski-a5059872/) (CloudFerro)

## Open Access Manuscript

This dataset is an output of the embedding expansion project outlined in [https://arxiv.org/abs/2412.05600/](https://arxiv.org/abs/2412.05600/).

[![arXiv](https://img.shields.io/badge/arXiv-10.48550/arXiv.2412.05600-B31B1B.svg)](https://doi.org/10.48550/arXiv.2412.05600)

<details>
<summary>Read Abstract</summary>

> With the ever-increasing volumes of the Earth observation data present in the archives of large programmes such as Copernicus, there is a growing need for efficient vector representations of the underlying raw data. The approach of extracting feature representations from pretrained deep neural networks is a powerful approach that can provide semantic abstractions of the input data. However, the way this is done for imagery archives containing geospatial data has not yet been defined. In this work, an extension is proposed to an existing community project, Major TOM, focused on the provision and standardization of open and free AI-ready datasets for Earth observation. Furthermore, four global and dense embedding datasets are released openly and for free along with the publication of this manuscript, resulting in the most comprehensive global open dataset of geospatial visual embeddings in terms of covered Earth's surface.

</details>

If this dataset was useful for your work, it can be cited as:

```latex
@misc{EmbeddedMajorTOM,
      title={Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space},
      author={Mikolaj Czerkawski and Marcin Kluczek and Jędrzej S. Bojanowski},
      year={2024},
      eprint={2412.05600},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05600},
}
```

Powered by [Φ-lab, European Space Agency (ESA) 🛰️](https://philab.esa.int/) in collaboration with [CloudFerro 🔶](https://cloudferro.com/) & [asterisk labs](https://asterisk.coop/)
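The project description says these embeddings make it cheaper to browse large archives such as Major TOM. One common way to browse them is a nearest-neighbour search over the `embedding` column by cosine similarity. The card does not prescribe a search method, so this is a generic brute-force NumPy sketch, not part of the Major TOM tooling; `top_k_similar` and the toy `grid_cell` labels and vectors are hypothetical. For tens of millions of rows, an approximate nearest-neighbour index would likely be more practical.

```python
import numpy as np


def top_k_similar(query, embeddings, k=5):
    """Indices and cosine similarities of the k rows most similar to `query`.

    `embeddings` is an (n, d) array, e.g. built by stacking the `embedding`
    column; `query` is a length-d vector. Hypothetical helper for illustration.
    """
    q = np.asarray(query, dtype=np.float64)
    e = np.asarray(embeddings, dtype=np.float64)
    q = q / np.linalg.norm(q)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # L2-normalise each row
    sims = e @ q                                       # cosine similarity per row
    order = np.argsort(-sims)[:k]                      # highest similarity first
    return order, sims[order]


# Toy rows using the card's field names; the values are made up for illustration.
rows = [
    {"grid_cell": "cell_a", "embedding": [1.0, 0.0, 0.0]},
    {"grid_cell": "cell_b", "embedding": [0.9, 0.1, 0.0]},
    {"grid_cell": "cell_c", "embedding": [0.0, 1.0, 0.0]},
]
matrix = np.stack([r["embedding"] for r in rows])
idx, scores = top_k_similar(rows[0]["embedding"], matrix, k=2)
print([rows[i]["grid_cell"] for i in idx])  # ['cell_a', 'cell_b']
```

Normalising once and using a single matrix-vector product keeps the search to one pass over the data; the matching rows' `grid_cell`, `centre_lat` and `centre_lon` fields can then locate the retrieved fragments on the map.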

Provider:
maas
Created:
2025-08-27