
Core-S2L1C-SSL4EO

ModelScope community · updated 2025-12-05 · indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/Major-TOM/Core-S2L1C-SSL4EO
Resource description:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6304c06eeb6d777a838eab63/JDxneZWkFfnfz6r_32zuF.png)

# Core-S2L1C-SSL4EO 🟥🟩🟦🟧🟨🟪 🛰️

| Dataset | Modality | Number of Embeddings | Sensing Type | Comments | Source Dataset | Source Model | Size |
|:--------:|:--------------:|:-------------------:|:------------:|:--------------:|:--------------:|:--------------:|:--------------:|
|Core-S2L1C-SSL4EO|Sentinel-2 (Level 1C)|56,147,150|Multi-Spectral|General-Purpose Global|[Core-S2L1C](https://huggingface.co/datasets/Major-TOM/Core-S2L1C)|[SSL4EO-ResNet50-DINO](https://github.com/zhu-xlab/SSL4EO-S12)|252.9 GB|

## Content

| Field | Type | Description |
|:-----------------:|:--------:|-----------------------------------------------------------------------------|
| unique_id | string | Hash generated from geometry, time, product_id, and embedding model |
| embedding | array | Raw embedding array |
| grid_cell | string | Major TOM cell |
| grid_row_u | int | Major TOM cell row |
| grid_col_r | int | Major TOM cell column |
| product_id | string | ID of the original product |
| timestamp | string | Timestamp of the sample |
| centre_lat | float | Latitude of the fragment centre |
| centre_lon | float | Longitude of the fragment centre |
| geometry | geometry | Polygon footprint (WGS84) of the fragment |
| utm_footprint | string | Polygon footprint (image UTM) of the fragment |
| utm_crs | string | CRS of the original product |
| pixel_bbox | bbox | Bounding box of the fragment (pixels) |

## Input data

* Sentinel-2 (Level 1C) multispectral dataset with global coverage
* All samples from [**MajorTOM Core-S2L1C**](https://huggingface.co/datasets/Major-TOM/Core-S2L1C)
* Image input size: **224 x 224** pixels, target overlap: 10%, border_shift: True

## Model

The image encoder of the [**SSL4EO-ResNet50-DINO model**](https://github.com/zhu-xlab/SSL4EO-S12) was used to extract embeddings.
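Because every row carries the fragment centre (`centre_lat`, `centre_lon`) next to its `embedding`, a regional subset can be cut out before any further processing. The sketch below assumes rows arrive as Python dicts keyed by the field names in the Content table and that `embedding` decodes to a list of floats; `select_region` is a hypothetical helper, not part of the Major TOM tooling.

```python
import numpy as np


def select_region(rows, lat_min: float, lat_max: float, lon_min: float, lon_max: float):
    """Keep rows whose fragment centre lies inside a lat/lon box (hypothetical helper).

    `rows` is any iterable of dicts with the fields `unique_id`, `centre_lat`,
    `centre_lon` and `embedding`. Returns the matching ids and a 2-D array
    holding one embedding per matching row.
    Note: boxes crossing the antimeridian (lon_min > lon_max) are not handled.
    """
    ids, vectors = [], []
    for row in rows:
        inside_lat = lat_min <= row["centre_lat"] <= lat_max
        inside_lon = lon_min <= row["centre_lon"] <= lon_max
        if inside_lat and inside_lon:
            ids.append(row["unique_id"])
            vectors.append(np.asarray(row["embedding"], dtype=np.float32))
    if not vectors:
        # No match: return an empty matrix rather than failing in np.stack.
        return ids, np.empty((0, 0), dtype=np.float32)
    return ids, np.stack(vectors)
```

Iterating over `load_dataset("Major-TOM/Core-S2L1C-SSL4EO", split="train", streaming=True)` should yield dicts of this shape without downloading the full 252.9 GB first; the `train` split name is an assumption here, and scanning all 56 million rows this way would still take a long time.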
## Example Use

Interface scripts are available at

```python
from datasets import load_dataset

# Without streaming, this downloads every file of the dataset (252.9 GB) before returning
dataset = load_dataset("Major-TOM/Core-S2L1C-SSL4EO")
```

## Generate Your Own Major TOM Embeddings

The [**embedder**](https://github.com/ESA-PhiLab/Major-TOM/tree/main/src/embedder) subpackage of Major TOM provides tools for generating embeddings like these. An example is available in a dedicated notebook at https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb.

[![GitHub](https://img.shields.io/badge/GitHub-Generate%20Your%20Own%20Embeddings-blue?logo=github&style=flat-square)](https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb)

---

## Major TOM Global Embeddings Project 🏭

This dataset is the result of a collaboration between [**CloudFerro**](https://cloudferro.com/) 🔶 and [**Φ-lab, European Space Agency (ESA)**](https://philab.esa.int/) 🛰️, set up to provide open and free vectorised expansions of Major TOM datasets and to define a standardised way of releasing Major TOM embedding expansions.

The embeddings extracted from common AI models make it possible to browse and navigate large datasets like Major TOM with reduced storage and computational demand.

The datasets were computed on the [**GPU-accelerated instances**](https://cloudferro.com/ai/ai-computing-services/)⚡ provided by [**CloudFerro**](https://cloudferro.com/) 🔶 on the [**CREODIAS**](https://creodias.eu/) cloud service platform 💻☁️. Discover more at [**CloudFerro AI services**](https://cloudferro.com/ai/).

## Authors

[**Mikolaj Czerkawski**](https://mikonvergence.github.io) (Φ-lab, European Space Agency), [**Marcin Kluczek**](https://www.linkedin.com/in/marcin-kluczek-03852a1a8/) (CloudFerro), [**Jędrzej S. Bojanowski**](https://www.linkedin.com/in/j%C4%99drzej-s-bojanowski-a5059872/) (CloudFerro)

## Open Access Manuscript

This dataset is an output of the embedding expansion project outlined in: [https://arxiv.org/abs/2412.05600/](https://arxiv.org/abs/2412.05600/).

[![arXiv](https://img.shields.io/badge/arXiv-10.48550/arXiv.2412.05600-B31B1B.svg)](https://doi.org/10.48550/arXiv.2412.05600)

<details>
<summary>Read Abstract</summary>

> With the ever-increasing volumes of the Earth observation data present in the archives of large programmes such as Copernicus, there is a growing need for efficient vector representations of the underlying raw data. The approach of extracting feature representations from pretrained deep neural networks is a powerful approach that can provide semantic abstractions of the input data. However, the way this is done for imagery archives containing geospatial data has not yet been defined. In this work, an extension is proposed to an existing community project, Major TOM, focused on the provision and standardization of open and free AI-ready datasets for Earth observation. Furthermore, four global and dense embedding datasets are released openly and for free along with the publication of this manuscript, resulting in the most comprehensive global open dataset of geospatial visual embeddings in terms of covered Earth's surface.

</details>

If this dataset was useful for your work, it can be cited as:

```bibtex
@misc{EmbeddedMajorTOM,
      title={Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space},
      author={Mikolaj Czerkawski and Marcin Kluczek and Jędrzej S. Bojanowski},
      year={2024},
      eprint={2412.05600},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05600},
}
```

Powered by [Φ-lab, European Space Agency (ESA) 🛰️](https://philab.esa.int/) in collaboration with [CloudFerro 🔶](https://cloudferro.com/)
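The project description says these embeddings make it possible to browse and navigate Major TOM with reduced storage and compute. One common way to do that, assumed here rather than prescribed by the source, is a cosine-similarity search: take the embedding of one fragment and rank the other fragments by how close their vectors are. `top_k_similar` is a hypothetical helper, and the random vectors in the demo only stand in for real `embedding` values.

```python
import numpy as np


def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int = 5):
    """Return indices and cosine similarities of the k rows most similar to `query`.

    `embeddings` is a 2-D array with one embedding per row; `query` is 1-D.
    """
    eps = 1e-12  # guards against division by zero for all-zero vectors
    q = query / max(np.linalg.norm(query), eps)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    e = embeddings / np.maximum(norms, eps)
    sims = e @ q  # cosine similarity of every row with the query
    k = min(k, len(sims))
    # argpartition collects the k largest scores without sorting everything...
    idx = np.argpartition(-sims, k - 1)[:k]
    # ...and only those k are then sorted, best match first.
    idx = idx[np.argsort(-sims[idx])]
    return idx, sims[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(1000, 64)).astype(np.float32)  # synthetic stand-in data
    idx, scores = top_k_similar(emb[42], emb, k=5)
    print(idx[0])  # 42: a vector is its own closest match
```

Using `np.argpartition` avoids a full sort of the similarity scores, which matters once millions of rows are scored; for a search over all 56,147,150 embeddings, an approximate nearest-neighbour index would likely be a better fit than this brute-force scan.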

Provider: maas
Created: 2025-08-27