
Connecticut Solar PV Semantic Segmentation Dataset

DataCite Commons, updated 2022-01-30, indexed 2024-07-29
Download link:
https://figshare.com/articles/dataset/Connecticut_Solar_PV_Semantic_Segmentation_Dataset/18982199
Description:
<b>Background Information</b><br>Annotated overhead imagery dataset for the paper "<i>SolarMapper: estimating solar array location, size, and capacity using deep learning and overhead imagery</i>".<br>The paper's GitHub page: energydatalab/mrs: Models for Remote Sensing (github.com)<br><br>This dataset is a subset of the very high resolution aerial imagery provided by the Connecticut Department of Energy and Environmental Protection and shared via the University of Connecticut (<i>CT ECO 2016 Imagery &amp; Elevation (uconn.edu)</i>). The original resolution of the imagery is 3 inches (7.62 cm); we downsampled all of the imagery to a resolution of 30 cm, which matches the resolution of most high-resolution satellite imagery.<br>The dataset includes 87 image tiles in total. We manually annotated all visible solar PV panels with polygons for semantic segmentation. We split the dataset into a training set and a validation set at a ratio of roughly 2:1. Details of each subset are shown below:<br><b>Training set</b>: 57 image tiles, 33.12 km<sup>2</sup> of ground area, 608 solar PV arrays, 0.058 km<sup>2</sup> of solar PV panel area.<br><b>Validation set</b>: 30 image tiles, 17.43 km<sup>2</sup> of ground area, 1,003 solar PV arrays, 0.091 km<sup>2</sup> of solar PV panel area.<br><br>Additionally, we compared our sampled imagery with <i>Microsoft's US Building Footprints dataset (microsoft/USBuildingFootprints: Computer generated building footprints for the United States (github.com))</i> to split all image tiles into 3 brackets (high, medium, and low building density) by the number of buildings within an image tile. More details about how we assigned these image tiles can be found in the paper. 
This assignment was applied only to the 30 validation tiles.<br><b>File Description</b><br><b>ct_images_and_labels.zip</b><br>This <i>.zip</i> file contains all aerial image (<i>.jpg</i>) files and the corresponding semantic segmentation annotation mask (<i>.png</i>) files. All <i>.jpg</i> imagery files are 8-bit RGB images, and all <i>.png</i> files are binary arrays in which 1 marks solar PV pixels and 0 marks non-solar PV pixels. An aerial imagery file and its corresponding mask file share the same filename but have different extensions. For example, <i>000795_sw.jpg</i> is the aerial imagery file for tile <i>000795_sw</i>, and <i>000795_sw.png</i> is the corresponding mask file.<br><b>tile_assignments.csv</b><br>This <i>.csv</i> file records whether an image tile is in the training or validation set and whether it is a high, medium, or low building density tile. There are 3 columns in the <i>.csv</i> file:<br><b>- tile_name (String)</b>: The name ID of a tile. This corresponds to the filenames of the <i>.jpg</i> aerial imagery files and <i>.png</i> annotation mask files.<br><b>- training_split (Categorical)</b>: Whether the image tile is in the training or the validation set. Values are either "train" for the training set or "valid" for the validation set.<br><b>- density_split (Categorical)</b>: The building density bracket the image tile belongs to. Since the building density assignment was applied only to the 30 validation tiles, all training tiles have "NA" in this column. For the validation tiles, values are "high", "med", or "low", corresponding to the 3 building density brackets (high, medium, and low).<br><b>ct_labeled_tile_boundaries_gcs.geojson</b><br>This <i>.geojson</i> file records the boundary polygons of the tiles in this dataset. Tile names are stored in the "tile_name" field. Descriptions of the other fields can be found at CRCOG Orthoimagery (uconn.edu).<br><b>sample.pptx</b><br>Samples from each of the other 3 files.
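
The file layout described above (one row per tile in tile_assignments.csv, with the image and mask sharing the tile name) could be read along these lines. This is only a minimal sketch, not the dataset authors' code: it assumes the CSV has a header row with the three documented column names and that the archive extracts to a flat folder. The function name load_tile_pairs, the folder name, and the sample rows (including which split each tile falls in) are made up for illustration.

```python
import csv
import io
from pathlib import Path


def load_tile_pairs(csv_file, data_dir):
    """Map each tile listed in the CSV to its image and mask paths.

    csv_file: an open text file (or any iterable of CSV lines).
    data_dir: folder holding the extracted .jpg/.png files
              (assumed to be flat; adjust if the archive nests them).
    """
    data_dir = Path(data_dir)
    tiles = []
    for row in csv.DictReader(csv_file):
        name = row["tile_name"]
        density = row["density_split"]
        tiles.append({
            "tile_name": name,
            "split": row["training_split"],  # "train" or "valid"
            # Training tiles carry the literal string "NA" in this column.
            "density": None if density == "NA" else density,
            # Image and mask share the tile name and differ only by extension.
            "image": data_dir / f"{name}.jpg",
            "mask": data_dir / f"{name}.png",
        })
    return tiles


# Made-up rows in the documented column layout (not real assignments).
sample_csv = io.StringIO(
    "tile_name,training_split,density_split\n"
    "000795_sw,train,NA\n"
    "000001_ne,valid,high\n"
)
tiles = load_tile_pairs(sample_csv, "ct_images_and_labels")
valid_tiles = [t for t in tiles if t["split"] == "valid"]
print(tiles[0]["image"].name, tiles[0]["density"])
print([t["tile_name"] for t in valid_tiles])
```

If the file is loaded with pandas instead, note that read_csv treats the string "NA" as a missing value by default, so the training tiles would show NaN in density_split unless keep_default_na=False is passed.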

Provider:
figshare
Created:
2022-01-24