Two residential districts datasets from Kielce, Poland for building semantic segmentation task
收藏Mendeley Data2024-01-31 更新2024-06-30 收录
下载链接:
https://www.scidb.cn/en/detail?dataSetId=dea6996c37da431ab88f6d73641177c4
下载链接
链接失效反馈官方服务:
资源简介:
Today, deep neural networks are widely used in many computer vision problems, also for geographic information systems (GIS) data. This type of data is commonly used for urban analyzes and spatial planning. We used orthophotographic images of two residential districts from Kielce, Poland for research including urban sprawl automatic analysis with Transformer-based neural network application.Orthophotomaps were obtained from Kielce GIS portal. Then, the map was manually masked into building and building surroundings classes. Finally, the ortophotomap and corresponding classification mask were simultaneously divided into small tiles. This approach is common in image data preprocessing for machine learning algorithms learning phase. Data contains two original orthophotomaps from Wietrznia and Pod Telegrafem residential districts with corresponding masks and also their tiled version, ready to provide as a training data for machine learning models.Transformed-based neural network has undergone a training process on the Wietrznia dataset, targeted for semantic segmentation of the tiles into buildings and surroundings classes. After that, inference of the models was used to test model's generalization ability on the Pod Telegrafem dataset. The efficiency of the model was satisfying, so it can be used in automatic semantic building segmentation. Then, the process of dividing the images can be reversed and complete classification mask retrieved. This mask can be used for area of the buildings calculations and urban sprawl monitoring, if the research would be repeated for GIS data from wider time horizon.Since the dataset was collected from Kielce GIS portal, as the part of the Polish Main Office of Geodesy and Cartography data resource, it may be used only for non-profit and non-commertial purposes, in private or scientific applications, under the law "Ustawa z dnia 4 lutego 1994 r. o prawie autorskim i prawach pokrewnych (Dz.U. z 2006 r. nr 90 poz 631 z późn. zm.)". There are no other legal or ethical considerations in reuse potential.Data information is presented below.wietrznia_2019.jpg - orthophotomap of Wietrznia districtmodel's - used for training, as an explanatory imagewietrznia_2019.png - classification mask of Wietrznia district - used for model's training, as a target imagewietrznia_2019_validation.jpg - one image from Wietrznia district - used for model's validation during training phasepod_telegrafem_2019.jpg - orthophotomap of Pod Telegrafem district - used for model's evaluation after training phasewietrznia_2019 - folder with wietrznia_2019.jpg (image) and wietrznia_2019.png (annotation) images, divided into 810 tiles (512 x 512 pixels each), tiles with no information were manually removed, so the training data would contain only informative tilestiles presented - used for the model during training (images and annotations for fitting the model to the data)wietrznia_2019_vaidation - folder with wietrznia_2019_validation.jpg image divided into 16 tiles (256 x 256 pixels each) - tiles were presented to the model during training (images for validation model's efficiency); it was not the part of the training datapod_telegrafem_2019 - folder with pod_telegrafem.jpg image divided into 196 tiles (256 x 265 pixels each) - tiles were presented to the model during inference (images for evaluation model's robustness)Dataset was created as described below.Firstly, the orthophotomaps were collected from Kielce Geoportal (https://gis.kielce.eu). Kielce Geoportal offers a .pst recent map from April 2019. It is an orthophotomap with a resolution of 5 x 5 pixels, constructed from a plane flight at 700 meters over ground height, taken with a camera for vertical photos. Downloading was done by WMS in open-source QGIS software (https://www.qgis.org), as a 1:500 scale map, then converted to a 1200 dpi PNG image.Secondly, the map from Wietrznia residential district was manually labelled, also in QGIS, in the same scope, as the orthophotomap. Annotation based on land cover map information was also obtained from Kielce Geoportal. There are two classes - residential building and surrounding. Second map, from Pod Telegrafem district was not annotated, since it was used in the testing phase and imitates situation, where there is no annotation for the new data presented to the model.Next, the images was converted to an RGB JPG images, and the annotation map was converted to 8-bit GRAY PNG image.Finally, Wietrznia data files were tiled to 512 x 512 pixels tiles, in Python PIL library. Tiles with no information or a relatively small amount of information (only white background or mostly white background) were manually removed. So, from the 29113 x 15938 pixels orthophotomap, only 810 tiles with corresponding annotations were left, ready to train the machine learning model for the semantic segmentation task. Pod Telegrafem orthophotomap was tiled with no manual removing, so from the 7168 x 7168 pixels ortophotomap were created 197 tiles with 256 x 256 pixels resolution. There was also image of one residential building, used for model's validation during training phase, it was not the part of the training data, but was a part of Wietrznia residential area. It was 2048 x 2048 pixel ortophotomap, tiled to 16 tiles 256 x 265 pixels each.
创建时间:
2024-01-31



