WUqiuping/Core-DEM
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/WUqiuping/Core-DEM
Resource description:
---
license: cc-by-sa-4.0
size_categories:
- 1M<n<10M
task_categories:
- other
tags:
- earth-observation
- remote-sensing
- elevation
- satellite
- geospatial
dataset_info:
- config_name: default
  features:
  - name: grid_cell
    dtype: string
  - name: thumbnail
    dtype: image
  - name: compressed
    dtype: image
  - name: DEM
    dtype: binary
configs:
- config_name: default
  data_files: images/*.parquet
- config_name: metadata
  data_files: metadata.parquet
---

# Major TOM Core-DEM
Major TOM Core-DEM provides global coverage of [Copernicus DEM](https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model), split into patches of 356 x 356 pixels each.
This dataset was created to support the development of the [MESA terrain generation model](https://paulbornep.github.io/mesa-terrain/). It is also featured in the paper [EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images](https://huggingface.co/papers/2603.29441) and is part of the [Major TOM: Expandable Datasets for Earth Observation](https://arxiv.org/abs/2402.12095) ecosystem.
- **Official Viewer App:** [Major TOM Viewer](https://huggingface.co/spaces/Major-TOM/MajorTOM-Core-Viewer)
- **Major TOM GitHub:** [ESA-PhiLab/Major-TOM](https://github.com/ESA-PhiLab/Major-TOM)
- **EarthEmbeddingExplorer:** [ModelScope App](https://modelscope.ai/studios/Major-TOM/EarthEmbeddingExplorer)
| Source | Modality Type | Number of Patches | Patch Size | Total Pixels |
|:-------|:-------------:|:-----------------:|:----------:|:------------:|
|Copernicus DEM 30 | Digital Surface Model (DSM) |1,837,843| 356 x 356 (30 m) | > 1.654 Billion |
## Dataset Content
| Column | Details | Resolution |
|:-------|:--------|:-----------|
| DEM | Original data | 30 m |
| thumbnail | Compressed hillshade visualisation | 30 m |
| compressed | Compressed PNG of the original data | 30 m |
## Spatial Coverage
This is a global monotemporal dataset that contains nearly the entire COP-DEM dataset.
The following figure demonstrates the spatial coverage (only black pixels are absent):

In this first version, all available DEM data were included except for the Major TOM cells below the 89th latitude and the cells two degrees west of the date change line. Azerbaijan and Armenia were not included either, as they are unavailable on the Creodias platform used to create this dataset.
## Example Use
Interface scripts are available at https://github.com/ESA-PhiLab/Major-TOM
Here is an example that reads a DEM patch directly over HTTP from Hugging Face:
```python
from fsspec.parquet import open_parquet_file
import pyarrow.parquet as pq
from rasterio.io import MemoryFile

PARQUET_FILE = 'part_00390'  # parquet file name
ROW_INDEX = 42  # row number (about 500 per parquet)

url = "https://huggingface.co/datasets/Major-TOM/Core-DEM/resolve/main/images/{}.parquet".format(PARQUET_FILE)

# Fetch only the DEM column over HTTP
with open_parquet_file(url, columns=["DEM"]) as f:
    with pq.ParquetFile(f) as pf:
        # ROW_INDEX is passed to read_row_group as a row-group index
        row_group = pf.read_row_group(ROW_INDEX, columns=['DEM'])

# Decode the stored bytes in memory as a GeoTIFF
with MemoryFile(row_group['DEM'][0].as_py()) as mem_f:
    with mem_f.open(driver='GTiff') as src:
        dem = src.read()  # array of shape (bands, height, width)
```
And here's an example with a thumbnail image:
```python
from fsspec.parquet import open_parquet_file
import pyarrow.parquet as pq
from io import BytesIO
from PIL import Image

PARQUET_FILE = 'part_00390'  # parquet file name
ROW_INDEX = 42  # row number (about 500 per parquet)

url = "https://huggingface.co/datasets/Major-TOM/Core-DEM/resolve/main/images/{}.parquet".format(PARQUET_FILE)

# Fetch only the thumbnail column over HTTP
with open_parquet_file(url, columns=["thumbnail"]) as f:
    with pq.ParquetFile(f) as pf:
        # ROW_INDEX is passed to read_row_group as a row-group index
        row_group = pf.read_row_group(ROW_INDEX, columns=['thumbnail'])

# Wrap the stored bytes in a stream and open them with PIL
stream = BytesIO(row_group['thumbnail'][0].as_py())
image = Image.open(stream)
```
### Reprojection Details
Contrary to the [S1 RTC](https://huggingface.co/datasets/Major-TOM/Core-S1RTC) and S2 ([L1C](https://huggingface.co/datasets/Major-TOM/Core-S2L1C) & [L2A](https://huggingface.co/datasets/Major-TOM/Core-S2L2A)) products, which are kept in their native projection to create their respective Major TOM Core datasets, Copernicus DEM, natively in EPSG:4326, was reprojected to a carefully chosen projection. To guarantee uniformity across Major TOM sources, it was reprojected to the UTM zone corresponding to each cell. This leads to inconsistency between the projections of Sentinel-2 and COP-DEM cells in some cases. For the S2-L2A product, this is estimated to affect 2.5% of the cells where both COP-DEM and S2-L2A are available (41,998 out of 1,679,898 cells).
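As a rough illustration of what a cell's corresponding UTM zone means, the sketch below maps a longitude/latitude pair to a UTM EPSG code using the standard 6-degree zone rule. The helper name `utm_epsg_for` is hypothetical, the special zones around Norway and Svalbard are ignored, and the Major TOM code may select zones differently, so this is an assumption rather than the dataset's actual procedure.

```python
import math

def utm_epsg_for(lon: float, lat: float) -> int:
    """Return the EPSG code of the standard UTM zone containing (lon, lat).

    Hypothetical helper: applies the plain 6-degree zone rule and ignores
    the special zones around Norway and Svalbard.
    """
    # Zones are numbered 1..60, starting at 180 degrees west
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    # EPSG:326xx covers the northern hemisphere, EPSG:327xx the southern
    base = 32600 if lat >= 0 else 32700
    return base + zone

print(utm_epsg_for(2.35, 48.85))   # Paris: 32631
print(utm_epsg_for(-70.0, -33.0))  # central Chile: 32719
```

Two neighbouring products can disagree whenever one of them uses a different zone for the same location, for instance near a zone boundary.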
Large DEM tiles were projected and resampled to 30 m using bilinear interpolation. Smaller Major TOM cells were then cropped from them, using nearest-neighbor interpolation if needed. Some tiles over water and around Armenia and Azerbaijan may exhibit missing pixels, whose values were set to -32767.
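Because missing pixels carry the value -32767, it is usually worth masking them before computing statistics. Here is a minimal sketch with NumPy, using a small synthetic array in place of a real patch; the array values and the helper name `mask_nodata` are made up for illustration.

```python
import numpy as np

NODATA = -32767  # value assigned to missing pixels, per the description above

def mask_nodata(dem: np.ndarray, nodata: float = NODATA) -> np.ma.MaskedArray:
    """Return a masked array in which nodata pixels are hidden."""
    return np.ma.masked_equal(dem, nodata)

# Synthetic 2 x 3 patch standing in for a decoded DEM band
dem = np.array([[12.0, 15.5, NODATA],
                [NODATA, 20.0, 18.5]], dtype=np.float32)

masked = mask_nodata(dem)
print(masked.count())  # number of valid pixels
print(masked.mean())   # mean elevation over valid pixels only
```

Masked arrays skip the hidden pixels in reductions such as `mean`, `min`, and `max`, which avoids the large negative bias that the sentinel value would otherwise introduce.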

### Credits
This dataset is the product of a collaboration between [Φ-lab, European Space Agency (ESA)](https://huggingface.co/ESA-philab) and [Adobe Research (Paris, France)](https://research.adobe.com/careers/paris/). The dataset was put together by [Paul Borne--Pons](https://www.linkedin.com/in/paul-bp-cs/) under the supervision of Mikolaj Czerkawski and Alistair Francis (the original authors of the Major TOM project) as part of his stay at ESA Φ-lab. The idea behind this collaboration is to explore the synergies between Sentinel-2 products and DEM data, notably for terrain generation.
### Citation
```latex
@inproceedings{mesa2025,
  title={MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data},
  author={Paul Borne--Pons and Mikolaj Czerkawski and Rosalie Martin and Romain Rouffet},
  year={2025},
  booktitle={MORSE Workshop at CVPR 2025},
  eprint={2504.07210},
  url={https://arxiv.org/abs/2504.07210}
}
```
---
Produced using Copernicus WorldDEM-30 © DLR e.V. 2010-2014 and © Airbus Defence and Space GmbH
2014-2018 provided under COPERNICUS by the European Union and ESA; all rights reserved
Provider: WUqiuping



