CMAB-The World's First National-Scale Multi-Attribute Building Dataset

Creator: figshare
Published: 2025-04-20 01:20:50
License: No description available

DataCite Commons; updated 2025-04-20; indexed 2025-01-06

Download link:

https://figshare.com/articles/dataset/CMAB-The_World_s_First_National-Scale_Multi-Attribute_Building_Dataset/27992417

Description:

Rapidly acquiring three-dimensional (3D) building data is essential for accurate urban analysis, simulation, and policy updates. Such data include geometric attributes like rooftop, height, and orientation, as well as indicative attributes like function, quality, and age. Existing building datasets cover these multiple attributes only incompletely. This paper presents CMAB, the first national-scale Multi-Attribute Building dataset, built with artificial intelligence. It covers 3,667 spatial cities, 31 million buildings, and 23.6 billion m² of rooftops, totaling 363 billion m³ of building stock. OCRNet-based rooftop extraction achieves an F1-score of 89.93%. We trained bootstrap-aggregated XGBoost models that take city administrative classification into account and incorporate morphology, location, and function features. We drew on multi-source data, including billions of remote sensing images and 60 million street view images (SVIs). From these, we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated through model benchmarks, comparison with existing similar products, and manual SVI checks, and is mostly above 80%. Our dataset and results are crucial for the global Sustainable Development Goals (SDGs) and for urban planning.

Data records: A building dataset covering 3,667 natural cities in China with a total rooftop area of 23.6 billion m². Each building includes rooftop, height, structure, function, age, style, and quality attributes, and the code files used to compute these attributes are provided. The models used are OCRNet, XGBoost, fine-tuned CLIP, and YOLOv8.

Supplementary note: The structure, style, and quality attributes are affected by the temporal and spatial distribution of street view images in China. For building color recognition, we found that existing CLIP-series models cannot accurately determine the composition and proportions of building colors. These will therefore be calculated precisely and added later using semantic segmentation and image processing. For technical questions, please contact zhangyec23@mails.tsinghua.edu.cn or ylong@tsinghua.edu.cn.

Citation: Zhang, Y., Zhao, H. & Long, Y. CMAB: A Multi-Attribute Building Dataset of China. Sci Data 12, 430 (2025). https://doi.org/10.1038/s41597-025-04730-5
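
The description reports two headline quantities: an F1-score for rooftop extraction and a total building stock in cubic meters. The sketch below is a minimal illustration of both, not the dataset's own code. It rests on several assumptions. First, F1 is computed from true-positive, false-positive, and false-negative counts; the description does not say whether these are pixel-level or object-level. Second, a building's volume is its rooftop (footprint) area multiplied by its height. The function names and sample numbers are hypothetical.

```python
"""Illustrative sketch of the F1-score and building-stock arithmetic.

Assumptions (not taken from the CMAB code files): F1 uses TP/FP/FN counts,
and building volume = rooftop (footprint) area * height.
"""


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def building_stock(buildings: list[tuple[float, float]]) -> float:
    """Sum of rooftop_area_m2 * height_m over all buildings, in m^3."""
    return sum(area * height for area, height in buildings)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formula.
    print(f1_score(tp=90, fp=10, fn=10))  # 0.9
    # Two hypothetical buildings: 100 m^2 x 10 m and 50 m^2 x 30 m.
    print(building_stock([(100.0, 10.0), (50.0, 30.0)]))  # 2500.0
    # Under the volume assumption, total stock / total rooftop area is the
    # area-weighted mean height implied by the published totals.
    print(round(363e9 / 23.6e9, 1))  # 15.4
```

Under the same volume assumption, the published totals imply an area-weighted mean building height of roughly 15.4 m.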
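
The description mentions bootstrap-aggregated (bagged) XGBoost models. Bagging means fitting many base learners, each on a resample of the training rows drawn with replacement, and averaging their predictions. The sketch below shows that technique only. A one-feature least-squares line stands in for XGBoost so the example runs with the Python standard library. The feature, the data, and all function names are hypothetical. The morphology, location, and function features and the per-city-classification grouping described above are not reproduced here.

```python
"""Minimal bootstrap-aggregation (bagging) sketch for a regression target.

Assumption: a least-squares line replaces the XGBoost base learner used by
the CMAB authors; the single feature x and the data are hypothetical.
"""
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # A resample whose x values are all identical has no slope information.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def bagged_fit(xs, ys, n_models=50, seed=0):
    """Fit n_models base learners, each on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Aggregate by averaging the base learners' predictions at x."""
    return fmean(a + b * x for a, b in models)


if __name__ == "__main__":
    # Hypothetical data: a made-up feature x with target 3x + 2.
    xs = [float(i) for i in range(20)]
    ys = [3.0 * x + 2.0 for x in xs]
    models = bagged_fit(xs, ys)
    print(round(bagged_predict(models, 10.0), 3))  # 32.0
```

Averaging over bootstrap resamples mainly reduces the variance of the base learner. The benefit is therefore larger with a flexible learner such as a tree ensemble than with the stable linear fit used here.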

Provider: figshare

Created: 2024-12-09
