CMAB-The World's First National-Scale Multi-Attribute Building Dataset

Name: CMAB-The World's First National-Scale Multi-Attribute Building Dataset
Creator: figshare
Published: 2025-06-01 05:46:41
License: No description available

DataCite Commons; updated 2025-06-01; indexed 2025-01-06

Download link:

https://figshare.com/articles/dataset/CMAB-The_World_s_First_National-Scale_Multi-Attribute_Building_Dataset/27992417/1

Description:

Rapidly acquiring three-dimensional (3D) building data, including geometric attributes such as rooftop, height, and orientation, as well as indicative attributes such as function, quality, and age, is essential for accurate urban analysis, simulation, and policy updates. Current building datasets suffer from incomplete coverage of building attributes. This paper introduces a geospatial artificial intelligence (GeoAI) framework for large-scale building modeling and presents the first national-scale Multi-Attribute Building dataset (CMAB), covering 3,667 spatial cities, 32 million buildings, and 23.6 billion m² of rooftops (extracted with OCRNet at an F1-score of 89.93%), totaling 368 billion m³ of building stock. We trained bootstrap-aggregated XGBoost models with city administrative classifications, incorporating features such as morphology, location, and function. Using multi-source data, including billions of high-resolution Google Earth images and 60 million street view images (SVIs), we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated through model benchmarks, existing similar products, and manual SVI validation, with most results above 80%. Our dataset and results are crucial for the global Sustainable Development Goals (SDGs) and urban planning.

Reference format: Zhang, Y., Zhao, H., Long, Y. "CMAB-The World's First National-Scale Multi-Attribute Building Dataset", 10.6084/m9.figshare.27992417.

Data records: A building dataset with a total rooftop area of 23.6 billion square meters across 3,667 natural cities in China, including the rooftop, height, structure, function, age, style, colour, and quality attributes of each building, as well as the code files used to calculate these data. The models used are OCRNet, XGBoost, fine-tuned CLIP, and YOLOv8.

Data storage: Because the dataset is large, we provide two versions. The first is the version before post-processing: the original building data, as sliced for rooftop extraction, including the calculated features and predicted building attributes (CMAB_origin.rar; see https://doi.org/10.17632/b4t2wxhn2y/2). The second is the version after post-processing, in which each building has a complete rooftop that is not divided by slices, together with the building attributes we derived (CMAB_postprocess.rar; see 10.6084/m9.figshare.27992417). Each province contains the shp files of all spatial cities belonging to it. The number in the middle of the file name is the spatial city number, and the spatial city numbers across all provinces range from 0 to 3666. In each spatial city's shp file, each record is one building instance.

Contact: zhangyec23@mails.tsinghua.edu.cn; WeChat: 18013979786
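
The file layout described under Data storage (one folder per province, one shp file per spatial city, with the city number in the middle of the file name) can be indexed with a short script. The following is a minimal sketch only: the exact file naming scheme is not given in the description, so the underscore-separated pattern and the example name `Jiangsu_1024_buildings.shp` are hypothetical assumptions, as are the function names. Only the ID range 0 to 3666 comes from the description.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

# From the dataset description: spatial city numbers range from 0 to 3666.
MAX_CITY_ID = 3666


def spatial_city_id(path: str) -> Optional[int]:
    """Return the spatial city number embedded in a .shp file name, or None.

    ASSUMPTION: the name is underscore-separated and the city number is a
    token that is neither the first nor the last one (hypothetical pattern).
    """
    p = Path(path)
    if p.suffix.lower() != ".shp":
        return None
    tokens = p.stem.split("_")
    for token in tokens[1:-1]:  # only the "middle" tokens
        if re.fullmatch(r"[0-9]+", token) and int(token) <= MAX_CITY_ID:
            return int(token)
    return None


def index_by_province(paths: Iterable[str]) -> Dict[str, Dict[int, str]]:
    """Group shp paths as {province folder name: {city number: path}}."""
    index: Dict[str, Dict[int, str]] = {}
    for path in paths:
        city = spatial_city_id(path)
        if city is not None:
            province = Path(path).parent.name
            index.setdefault(province, {})[city] = path
    return index
```

Reading the building records themselves requires a shapefile reader, which is not shown here; each record in a file returned by this index would correspond to one building instance.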

Provider: figshare
Created: 2024-12-26
