
Dataset for "Spectroscopic Transformer for Improved EMIT Cloud Masks"

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14607937
Resource description:
Summary

Manuscript in preparation/submitted. This repository contains the dataset used to train and evaluate the Spectroscopic Transformer model for EMIT cloud screening.

spectf_cloud_labelbox.hdf5
    1,841,641 labeled spectra from 221 EMIT scenes.
spectf_cloud_mmgis.hdf5
    1,733,801 labeled spectra from 313 EMIT scenes. These scenes were specifically labeled to correct false detections by an earlier version of the model.
train_fids.csv
    465 EMIT scenes comprising the training set.
test_fids.csv
    69 EMIT scenes comprising the held-out validation set.

Data Description

221 EMIT scenes were initially selected for labeling with diversity in mind. After sparse segmentation labeling of confident regions in Labelbox, up to 10,000 spectra were selected per class per scene to form the spectf_cloud_labelbox dataset.

We deployed a preliminary model trained on these spectra on all EMIT scenes observed in March 2024, then labeled another 313 EMIT scenes using MMGIS's polygonal labeling tool to correct false positive and false negative detections. Spectra were sampled from these scenes in the same way, bringing the total across both datasets to 3,575,442 labeled and sampled spectra.

The train/test split was randomly determined by scene FID to prevent the same EMIT scene from contributing spectra to both the training and validation datasets. Please refer to Section 4.2 in the paper for a complete description, and to our code repository for example usage and a PyTorch dataloader.
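The scene-level split described above can be illustrated with a minimal sketch. This is not the authors' code. The function name `split_by_scene`, the `test_fraction` default (chosen to be close to 69 of 534 scenes) and the `seed` parameter are all assumptions made for illustration. To reproduce the split used in the paper, use the scene lists in train_fids.csv and test_fids.csv instead.

```python
import random


def split_by_scene(fids, test_fraction=0.13, seed=0):
    """Split spectra into train/test so that no scene FID appears in both.

    fids: one scene FID per spectrum (any hashable, sortable values).
    test_fraction and seed are illustrative assumptions, not values
    taken from the dataset description.
    """
    # Work on unique scenes, not on individual spectra.
    scenes = sorted(set(fids))
    rng = random.Random(seed)
    rng.shuffle(scenes)

    # Hold out a fraction of the scenes (at least one).
    n_test = max(1, round(len(scenes) * test_fraction))
    test_scenes = set(scenes[:n_test])

    # Every spectrum follows its scene into exactly one of the two sets.
    train_idx = [i for i, f in enumerate(fids) if f not in test_scenes]
    test_idx = [i for i, f in enumerate(fids) if f in test_scenes]
    return train_idx, test_idx, test_scenes
```

Splitting on scenes instead of on individual spectra matters here because spectra drawn from the same scene share illumination, surface and atmospheric conditions, so a per-spectrum split would tend to overstate validation performance.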
Each hdf5 file contains the following arrays:

'spectra'
    Top-of-Atmosphere reflectance calculated from the EMIT L1B Radiance product
    Float64 of shape (n, 268)
'fids'
    The FID from which each spectrum was sampled
    Binary string of shape (n,)
'indices'
    The (col, row) index from which each spectrum was sampled
    Int64 of shape (n, 2)
'labels'
    Annotation label of each spectrum
    0 - "Clear"
    1 - "Cloud"
    2 - "Cloud Shadow" (Only for the Labelbox dataset, and this class was combined with the clear class for this work: label[label==2] = 0. See paper for details.)
    Int64 of shape (n,2)

Each hdf5 file contains the following attribute:

'bands'
    The band center wavelengths (nm) of the spectrum
    Float64 of shape (268,)

Acknowledgements

The EMIT online mapping tool was developed by the JPL MMGIS team. The High Performance Computing resources used in this investigation were provided by funding from the JPL Information and Technology Solutions Directorate.

This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).

© 2024 California Institute of Technology. Government sponsorship acknowledged.
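As a rough illustration of the hdf5 layout described above, the following sketch reads one file with h5py and applies the cloud-shadow merge mentioned in the label description. It is not the authors' dataloader, which lives in their code repository. The helper names (`load_spectf`, `decode_fids`, `merge_cloud_shadow`, `scene_mask`) are hypothetical, and reading 'bands' from the file-level attributes is an assumption about where that attribute is stored.

```python
import numpy as np


def decode_fids(raw_fids):
    """Convert binary-string FIDs to ordinary Python strings."""
    return np.array(
        [f.decode() if isinstance(f, bytes) else str(f) for f in raw_fids]
    )


def merge_cloud_shadow(labels):
    """Return a copy of labels with class 2 (cloud shadow) mapped to 0 (clear)."""
    return np.where(labels == 2, 0, labels)


def scene_mask(fids, wanted_fids):
    """Boolean mask selecting spectra whose FID is in wanted_fids."""
    return np.isin(fids, list(wanted_fids))


def load_spectf(path):
    """Read all arrays from one file, e.g. spectf_cloud_labelbox.hdf5.

    h5py is imported here so the helpers above can be used without it.
    Assumption: 'bands' is stored as a file-level attribute.
    """
    import h5py

    with h5py.File(path, "r") as f:
        spectra = f["spectra"][:]          # (n, 268) TOA reflectance
        fids = decode_fids(f["fids"][:])   # scene FID per spectrum
        indices = f["indices"][:]          # (col, row) per spectrum
        labels = merge_cloud_shadow(f["labels"][:])
        bands = np.asarray(f.attrs["bands"])  # band centers in nm
    return spectra, fids, indices, labels, bands
```

Note that `f["spectra"][:]` reads the whole array into memory, which for the Labelbox file is close to 4 GB as Float64. Slicing the h5py dataset (for example `f["spectra"][:1000]`) reads only the requested rows. Given a set of scene FIDs taken from train_fids.csv, `scene_mask(fids, train_fids)` would select the training spectra.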
Created:
2025-01-07