
Refined Massachusetts Building Dataset

Source: DataCite Commons; updated 2025-05-09; indexed 2025-05-17
Download link:
https://ieee-dataport.org/documents/refined-massachusetts-building-dataset
Resource description:
The Massachusetts dataset was built from OpenStreetMap (OSM) vector data and was found to contain several types of labeling errors. Because volunteer contributors continuously update OSM data, manual data entry introduces a risk of inconsistency and inaccuracy [20]. Image resolution makes labeling errors worse, for example by blurring building boundaries [21]. These errors were analyzed and grouped into six main categories: (1) mislabeling, (2) inclusion of non-building elements, (3) false positive estimates, (4) missing labels, (5) spatial misalignment, and (6) object contamination (Fig. 5). In the figure, red lines mark the building boundaries in the dataset. A thorough search-and-update process was carried out to resolve all of these problems. All images were then cropped into 512×512 patches with 50% (half) overlap, and this cropping produced white (empty) regions. To keep these regions from affecting training, every image patch containing white pixels was removed together with its corresponding label. The final dataset contains 1,495 training, 320 testing, and 320 validation images.
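The cropping and filtering steps described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the authors' preprocessing code. It reads "half overlap" as a stride of 256 pixels for 512-pixel patches. It treats a pixel as white only when every channel equals 255. It also extracts only full windows without padding, because the description does not say how image borders were handled. The helper names (`tile_origins`, `has_white_pixels`, `crop_and_filter`) are hypothetical.

```python
import numpy as np

PATCH = 512          # patch side length stated in the description
STRIDE = PATCH // 2  # assumption: "half overlap" read as a 50% stride (256 px)
WHITE = 255          # assumption: empty pixels are pure white in 8-bit RGB


def tile_origins(length: int, patch: int = PATCH, stride: int = STRIDE) -> list[int]:
    """Top-left offsets of full windows along one axis (no border padding)."""
    if length < patch:
        return []
    return list(range(0, length - patch + 1, stride))


def has_white_pixels(patch: np.ndarray, white: int = WHITE) -> bool:
    """True if any pixel equals `white` in every channel."""
    return bool(np.any(np.all(patch == white, axis=-1)))


def crop_and_filter(image: np.ndarray, label: np.ndarray):
    """Yield (image_patch, label_patch) pairs, skipping patches with white pixels.

    Hypothetical helper: the image patch and its label are cropped with the
    same window, so a discarded image patch also drops its label.
    """
    h, w = image.shape[:2]
    for y in tile_origins(h):
        for x in tile_origins(w):
            img_patch = image[y:y + PATCH, x:x + PATCH]
            if has_white_pixels(img_patch):
                continue  # remove this patch and its corresponding label
            yield img_patch, label[y:y + PATCH, x:x + PATCH]


# Toy usage: a dark 1024x1024 image with a small white corner.
image = np.zeros((1024, 1024, 3), dtype=np.uint8)
image[:10, :10] = WHITE  # simulate an empty (white) region
label = np.zeros((1024, 1024), dtype=np.uint8)
kept = list(crop_and_filter(image, label))
print(len(kept))  # 8
```

In the toy example, offsets 0, 256, and 512 along each axis give 3×3 = 9 candidate windows. Only the top-left window overlaps the simulated white corner, so 8 pairs are kept. An exact-255 test would also discard patches that contain genuinely white objects such as bright roofs. The source does not state what threshold or color test was actually used.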

Provider:
IEEE DataPort
Created:
2025-05-09