DGGS Benchmark Replication Study - Results Dataset

Zenodo · Updated 2026-03-07 · Indexed 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.18904135
Resource description:
Results Dataset for the DGGS Benchmark Replication Study

Description

This dataset contains the results of a reproducible replication study of the benchmarks presented in:

Law, R.M. & Ardo, J. (2024). "Using a discrete global grid system for a scalable, interoperable, and reproducible system of land-use mapping." Big Earth Data, 9(1), 29-46. DOI: 10.1080/20964471.2024.2429847

The replication validates the paper's two central claims:

- Vector benchmark: DGGS provides "orders of magnitude" performance improvement over traditional vector overlay operations.
- Raster benchmark: DGGS and raster methods show "roughly equivalent performance" for classification tasks.

What's New in Version 3.0.0

Version 3.0.0 extends the replication to include HEALPix benchmarks using the healpix-geo library (v0.0.11), which supports both sphere and WGS84 ellipsoid reference surfaces. This is the first version to provide a unified cross-DGGS comparison: H3 vs HEALPix/sphere vs HEALPix/WGS84.

Key new finding: The choice of reference surface (sphere vs WGS84 ellipsoid) has a negligible effect on performance but a large effect on cell assignment accuracy. At mid-latitudes (+48°, e.g. Mediterranean) and high latitudes (+62°, e.g. Scandinavia), 98% and 91% of pixels respectively are assigned to different HEALPix cells depending on whether a sphere or the WGS84 ellipsoid is used. This is directly relevant to European EO data (Sentinel/Copernicus, 45–65°N) and provides a strong scientific argument for using geodetically correct indexing in production workflows.

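To give a feel for why the reference surface can move a pixel into a neighbouring cell, the sketch below compares two latitude conventions on the WGS84 ellipsoid with the approximate edge length of a depth-9 HEALPix cell. It is an illustration under stated assumptions, not the method used by healpix-geo: the source does not say which latitude conversion the library applies, so geocentric latitude is used here purely as a stand-in, and the function names are hypothetical.

```python
import math

# WGS84 defining constants (standard published values).
WGS84_A = 6378137.0                     # semi-major axis in metres
WGS84_F = 1.0 / 298.257223563           # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)    # first eccentricity squared
MEAN_EARTH_RADIUS_KM = 6371.0           # assumption: mean radius for the sphere


def geocentric_latitude(geodetic_deg):
    """Geocentric latitude (degrees) of a surface point given its geodetic latitude."""
    phi = math.radians(geodetic_deg)
    return math.degrees(math.atan((1.0 - WGS84_E2) * math.tan(phi)))


def latitude_shift_km(geodetic_deg):
    """Approximate north-south ground distance between the two conventions."""
    delta = math.radians(geodetic_deg - geocentric_latitude(geodetic_deg))
    return abs(delta) * WGS84_A / 1000.0


def healpix_cell_edge_km(depth):
    """Square root of the mean cell area for a HEALPix grid with 12 * 4**depth cells."""
    n_cells = 12 * 4 ** depth
    sphere_area_km2 = 4.0 * math.pi * MEAN_EARTH_RADIUS_KM ** 2
    return math.sqrt(sphere_area_km2 / n_cells)


# Latitudes taken from the regions named in this dataset description.
for lat in (0.0, 48.0, 62.0, 78.0):
    print(f"lat {lat:5.1f}: shift {latitude_shift_km(lat):6.2f} km, "
          f"depth-9 cell edge {healpix_cell_edge_km(9):5.2f} km")
```

Under these assumptions the shift is about 21 km at 48° against a cell edge near 13 km, which is at least consistent in order of magnitude with most pixels changing cell at mid-latitudes. The sketch does not reproduce the percentages reported in this dataset.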
Files Included

Version 2.0.0 files (H3 / xdggs replication)

| File | Description |
|---|---|
| vector_benchmark.csv | Timing results for vector overlay vs H3 DGGS comparison |
| raster_benchmark.csv | Timing results for raster vs H3 DGGS comparison |
| indexing_benchmark.json | Comparison of H3 loop vs xdggs vectorized indexing |
| system_info.json | Hardware/software environment details for reproducibility |
| summary.json | Structured summary of results and validation status |
| benchmark_unified.png | Visualization of all benchmark results (PNG format) |
| benchmark_unified.pdf | Visualization of all benchmark results (PDF format) |

Version 3.0.0 files (HEALPix / healpix-geo extension)

| File | Description |
|---|---|
| vector_benchmark_healpix_geo.csv | Timing results for vector overlay vs HEALPix (sphere and WGS84) |
| raster_benchmark_healpix_geo.csv | Timing results for raster vs HEALPix (sphere and WGS84) |
| ellipsoid_analysis.json | Pixel-assignment difference between sphere and WGS84 by latitude band |
| comparison_table.csv | Unified cross-DGGS comparison table (H3, HEALPix/sphere, HEALPix/WGS84) |
| comparison_summary.json | Structured summary of cross-DGGS comparison results |
| comparison.png | Cross-DGGS comparison visualization (PNG format) |
| comparison.pdf | Cross-DGGS comparison visualization (PDF format) |
| run_healpix_geo_replication.py | Benchmark script for HEALPix/sphere and HEALPix/WGS84 |
| run_comparison.py | Cross-DGGS unified comparison script (reads all result CSVs) |

Benchmark Configuration

Vector Benchmark (Figure 6 replication)

- Layers tested: 5, 10, 20, 50
- H3 resolution: 14 (matching paper)
- HEALPix depth: 9
- Method: Voronoi polygons with random point distribution, dissolved by binary value, then overlaid

Raster Benchmark (Figure 7 replication)

- Layers tested: 10, 50, 100, 500, 1,000, 5,000, 10,000
- H3 / HEALPix resolution: 9
- Raster size: 100 × 100 pixels per layer
- Method: Neutral Landscape Model (mid-point displacement) with Gaussian smoothing

Classification Logic

Following the paper's methodology, classification uses seven mathematical functions applied to summed layer values:

- Prime number test
- Perfect number test
- Triangular number test
- Square number test
- Pentagonal number test
- Hexagonal number test
- Fibonacci number test

The combination of these seven binary outputs produces up to 127 distinct classes.

Key Results

Vector Benchmark (all methods validate the paper's claim)

| Layers | DGGS Time (s) | Vector Time (s) | DGGS Speedup |
|---|---|---|---|
| 5 | ~0.02 | ~0.4 | 22× |
| 10 | ~0.03 | ~2.5 | 105× |
| 20 | ~0.05 | ~27 | 541× |
| 50 | ~0.13 | ~780 | 5,999× |

Conclusion: DGGS is orders of magnitude faster than vector overlay. The speedup increases with layer count because vector overlay creates exponentially more sliver polygons, while the DGGS cell count remains fixed.

Cross-DGGS Comparison (new in v3.0.0)

| Method | Max speedup vs vector | Crossover point |
|---|---|---|
| H3 (sphere) | ~5,800× | ~5 layers |
| HEALPix / sphere | ~5,691× | ~5 layers |
| HEALPix / WGS84 | ~5,603× | ~5 layers |

All three implementations validate the paper's claim. The near-identical speedups and crossover points confirm that the performance advantage is a property of the DGGS paradigm itself (join-on-cell-ID), not of any specific implementation.

Sphere vs WGS84 Ellipsoid Indexing Difference (new in v3.0.0)

| Region | Center latitude | Pixels in different cell | Jaccard similarity |
|---|---|---|---|
| Equatorial | 0° | 27% | 0.9951 |
| Mid-latitude (Mediterranean) | +48° | 98% | 0.9843 |
| High-latitude (Scandinavia) | +62° | 91% | 0.9868 |
| Arctic | +78° | 53% | 0.9908 |

Conclusion: For European EO data (Copernicus/Sentinel, 45–65°N), sphere-based HEALPix indexing assigns almost every pixel (91–98% in the two European bands above) to a different cell than WGS84-based indexing. WGS84 indexing via healpix-geo is strongly recommended for production workflows.

Raster Benchmark

- DGGS pre-indexed vs raster classification: roughly equivalent performance (within 2–3×)
- xdggs vectorized indexing: significantly faster than H3 loop-based indexing for coordinate-to-cell conversion

Conclusion: For pre-indexed data, DGGS classification performance matches raster, validating the paper's claim.

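The seven-test classification described under Classification Logic can be sketched as a 7-bit code built from the summed layer value. This is an illustrative reconstruction, not the code from the paper or the replication repository: the function names, the bit order, and the restriction of every test to positive integers are assumptions made here.

```python
import math


def _is_perfect_square(x):
    """True if the non-negative integer x is a perfect square."""
    return x >= 0 and math.isqrt(x) ** 2 == x


def is_prime(n):
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    return all(n % d for d in range(3, math.isqrt(n) + 1, 2))


def is_perfect(n):
    """True if n equals the sum of its proper divisors (6, 28, 496, ...)."""
    if n < 2:
        return False
    total = 1
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total == n


def is_triangular(n):
    # n = k(k+1)/2 for some k >= 1  <=>  8n + 1 is a perfect square
    return n >= 1 and _is_perfect_square(8 * n + 1)


def is_square(n):
    return n >= 1 and _is_perfect_square(n)


def is_pentagonal(n):
    # n = k(3k-1)/2  <=>  24n + 1 is a square whose root is 5 mod 6
    if n < 1:
        return False
    r = math.isqrt(24 * n + 1)
    return r * r == 24 * n + 1 and r % 6 == 5


def is_hexagonal(n):
    # n = k(2k-1)  <=>  8n + 1 is a square whose root is 3 mod 4
    if n < 1:
        return False
    r = math.isqrt(8 * n + 1)
    return r * r == 8 * n + 1 and r % 4 == 3


def is_fibonacci(n):
    # n is a Fibonacci number  <=>  5n^2 + 4 or 5n^2 - 4 is a perfect square
    return n >= 1 and (_is_perfect_square(5 * n * n + 4)
                       or _is_perfect_square(5 * n * n - 4))


# Assumed bit order: bit 0 is the prime test, bit 6 the Fibonacci test.
TESTS = (is_prime, is_perfect, is_triangular, is_square,
         is_pentagonal, is_hexagonal, is_fibonacci)


def classify(summed_value):
    """Pack the seven boolean test results into one integer in the range 0 to 127."""
    return sum(int(test(summed_value)) << bit for bit, test in enumerate(TESTS))
```

Applying `classify` to the per-cell (or per-pixel) sum of layer values yields one class code per location; because the tests depend only on that integer, the same function serves both the raster and the DGGS variants of the benchmark.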
Understanding the Raster Benchmark Plot

The raster benchmark plot shows four methods:

| Line | Method | What it measures |
|---|---|---|
| 🟠 Raster (baseline) | NumPy array operations | Traditional raster stacking and classification |
| 🟦 DGGS+H3 (reproduction) | H3 loop indexing | Paper's original approach: index each layer with H3, then classify |
| 🟣 DGGS+xdggs (replication) | xdggs vectorized indexing | Alternative approach: index with xdggs, then classify |
| 🟢 DGGS pre-indexed | Read from Parquet | Paper's target scenario: data already indexed to DGGS |

Key interpretation:

- The DGGS pre-indexed line (green) represents the paper's main use case: data is indexed to DGGS once, then queried many times.
- The Classification Only subplot (bottom-right) isolates this comparison, showing DGGS and raster are roughly equivalent.
- The gap between DGGS+H3 and DGGS+xdggs demonstrates the indexing speedup from vectorization.

Methodology

What is a DGGS?

A Discrete Global Grid System (DGGS) is a spatial reference system that partitions the Earth's surface into a hierarchical sequence of equal-area cells. Unlike traditional coordinate systems, DGGS provides:

- Fixed discretization: Space is divided into a finite number of cells at each resolution level.
- Hierarchical structure: Cells nest within parent cells, enabling multi-resolution analysis.
- Unique cell identifiers: Each cell has a unique ID that implicitly encodes its location.

This study uses H3 (Uber's hexagonal hierarchical spatial index) and HEALPix (Hierarchical Equal Area isoLatitude Pixelization), widely used in astrophysics and increasingly adopted in Earth observation.

Key insight: When data is indexed to a DGGS, spatial joins become simple attribute joins on cell IDs, avoiding expensive geometric intersection computations.

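The key insight above, that overlaying layers reduces to a join on cell IDs, can be sketched in a few lines. This is a minimal plain-Python illustration and not the benchmark implementation: the cell IDs are made-up strings rather than real H3 or HEALPix identifiers, and the function name is hypothetical. In a DataFrame library the same step would be a join or group-by on a cell-ID column.

```python
from collections import defaultdict


def sum_layers_by_cell(layers):
    """Combine layers by cell ID.

    Each layer is a dict mapping cell ID -> value. No geometry is touched:
    two values belong together simply because they share the same key.
    """
    totals = defaultdict(int)
    for layer in layers:
        for cell_id, value in layer.items():
            totals[cell_id] += value
    return dict(totals)


# Hypothetical binary layers already indexed to the same grid and resolution.
layer_a = {"cell-001": 1, "cell-002": 0, "cell-003": 1}
layer_b = {"cell-001": 1, "cell-002": 1, "cell-004": 1}

combined = sum_layers_by_cell([layer_a, layer_b])
print(combined)
```

The cost grows with the number of (cell, layer) pairs, whereas a geometric overlay must intersect polygons and produces more fragments with every added layer.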
Reproduction vs Replication

Following established terminology in reproducibility research:

| Term | Definition | Implementation in this study |
|---|---|---|
| Reproduction | Same methodology, same tools | H3 library + Polars (matching the paper's approach) |
| Replication | Same methodology, alternative tools | xdggs for vectorized H3 indexing (v2.0.0); healpix-geo for HEALPix sphere+WGS84 (v3.0.0) |

The Role of xdggs in Replication (v2.0.0)

xdggs is a Python library that provides Xarray extensions for DGGS operations. It offers an alternative implementation for converting geographic coordinates to DGGS cell IDs.

Performance comparison:

| Method | Time per layer | Relative speed |
|---|---|---|
| H3 loop | ~0.15s | 1× (baseline) |
| xdggs vectorized | ~0.001s | ~150× faster |

The Role of healpix-geo in Replication (v3.0.0, new)

healpix-geo is a Python library (built on the cdshealpix Rust crate) that provides HEALPix indexing on both spherical and ellipsoidal (WGS84, GRS80) reference surfaces. Unlike cdshealpix or astropy, it requires no astronomy dependencies and natively supports geodetically correct indexing.

This is the first replication study to test HEALPix with a proper WGS84 ellipsoid, motivated by the observation that Copernicus/Sentinel data is acquired predominantly over Europe (45–65°N), where the WGS84 flattening correction is largest.

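The loop versus vectorized contrast in the xdggs performance comparison above can be illustrated without either library. The sketch below indexes points into a toy equal-angle latitude/longitude grid, which is neither H3 nor HEALPix, once with a per-point Python loop and once with whole-array NumPy operations. The grid, its cell size, and the function names are assumptions chosen only to show the pattern; timings are not claimed to match the table.

```python
import numpy as np

CELL_SIZE_DEG = 0.5                      # toy grid spacing, not a DGGS resolution
N_COLS = int(360 / CELL_SIZE_DEG)        # number of columns around the globe


def index_loop(lats, lons):
    """Per-point indexing: one Python-level iteration per coordinate pair."""
    ids = []
    for lat, lon in zip(lats, lons):
        row = int((lat + 90.0) // CELL_SIZE_DEG)
        col = int((lon + 180.0) // CELL_SIZE_DEG)
        ids.append(row * N_COLS + col)
    return ids


def index_vectorized(lats, lons):
    """Whole-array indexing: the same arithmetic applied to all points at once."""
    rows = np.floor_divide(lats + 90.0, CELL_SIZE_DEG).astype(np.int64)
    cols = np.floor_divide(lons + 180.0, CELL_SIZE_DEG).astype(np.int64)
    return rows * N_COLS + cols


rng = np.random.default_rng(0)
lats = rng.uniform(-90.0, 90.0, size=10_000)
lons = rng.uniform(-180.0, 180.0, size=10_000)

# Both routes must agree on every cell ID; only the execution model differs.
same = index_loop(lats, lons) == index_vectorized(lats, lons).tolist()
print("identical cell IDs:", same)
```

Wrapping each call with `time.perf_counter()` shows the gap on a given machine; the size of that gap depends on the array length and on how expensive the per-point function is.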
Software Environment

- Python 3.11
- H3 v4.x (Uber's hexagonal hierarchical spatial index)
- xdggs (vectorized DGGS operations)
- healpix-geo v0.0.11 (HEALPix with WGS84 ellipsoid support) (new in v3.0.0)
- GeoPandas, Rasterio, NumPy, Pandas, Polars
- SciPy (Voronoi tessellation, Gaussian filtering)

How to Reproduce

The complete replication environment is available at: https://github.com/annefou/dggs_replication_2026

Using Docker (Recommended)

    docker pull ghcr.io/annefou/dggs_replication_2026:latest
    docker run -v $(pwd)/results:/app/results ghcr.io/annefou/dggs_replication_2026:latest

Using Python

    git clone https://github.com/annefou/dggs_replication_2026.git
    cd dggs_replication_2026
    pip install -r requirements.txt

    # H3 replication (v2.0.0)
    python run_replication.py --all --output results_h3

    # HEALPix/healpix-geo replication (v3.0.0, new)
    python run_healpix_geo_replication.py --all --output results_healpix_geo

    # Cross-DGGS comparison (v3.0.0, new)
    python run_comparison.py \
        --h3 results_h3 \
        --healpix-geo results_healpix_geo \
        --output results_comparison

Citation

If you use this dataset, please cite both the original paper and this replication:

Original Paper

    @article{law2024dggs,
      title={Using a discrete global grid system for a scalable, interoperable, and reproducible system of land-use mapping},
      author={Law, Richard M. and Ardo, James},
      journal={Big Earth Data},
      volume={9},
      number={1},
      pages={29--46},
      year={2024},
      publisher={Taylor \& Francis},
      doi={10.1080/20964471.2024.2429847}
    }

This Replication Dataset (v3.0.0)

    @dataset{fouilloux2026dggs_replication,
      author = {Fouilloux, Anne},
      title = {{DGGS Benchmark Replication Study: Results Dataset}},
      year = {2026},
      publisher = {Zenodo},
      version = {3.0.0},
      doi = {10.5281/zenodo.18343025},
      url = {https://doi.org/10.5281/zenodo.18343025}
    }

Original Benchmark Code

The original benchmark code from the paper is available at:

- Repository: https://github.com/manaakiwhenua/dggsBenchmarks
- Version used: v1.1.1

License

This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Author

Anne Fouilloux
ORCID: 0000-0002-1784-2920
Affiliation: LifeWatch ERIC

Acknowledgments

- Richard M. Law and James Ardo (Manaaki Whenua – Landcare Research) for the original research
- Uber Technologies for the H3 library
- The xdggs development team for vectorized DGGS operations
- The healpix-geo development team for WGS84-aware HEALPix indexing (new in v3.0.0)

Related Resources

DGGS and H3

- H3 Documentation: https://h3geo.org/
- H3 Python API: https://uber.github.io/h3-py/
- H3 Resolution Table: https://h3geo.org/docs/core-library/restable/
- OGC DGGS Standard: https://www.ogc.org/standard/dggs/

HEALPix and healpix-geo

- healpix-geo Documentation: https://healpix-geo.readthedocs.io/ (new in v3.0.0)
- healpix-geo PyPI: https://pypi.org/project/healpix-geo/ (new in v3.0.0)
- cdshealpix (Rust backend): https://github.com/cds-astro/cds-healpix-python (new in v3.0.0)

xdggs and Related Tools

- xdggs Documentation: https://xdggs.readthedocs.io/
- xdggs GitHub: https://github.com/xarray-contrib/xdggs
- h3ronpy (used by xdggs): https://github.com/nmandery/h3ronpy

Original Research

- Original paper: https://doi.org/10.1080/20964471.2024.2429847
- Original benchmark code: https://github.com/manaakiwhenua/dggsBenchmarks
- vector2dggs tool: https://github.com/manaakiwhenua/vector2dggs
- raster2dggs tool: https://github.com/manaakiwhenua/raster2dggs

Reproducibility Resources

- FORRT Replication Handbook: https://forrt.org/replication_handbook/
- The Turing Way - Reproducibility: https://the-turing-way.netlify.app/reproducible-research/

Dataset generated: March 2026
Replication framework version: 3.0.0
Provider: Zenodo
Created: 2026-03-07