CMAB-The World's First National-Scale Multi-Attribute Building Dataset

Source: DataCite Commons. Updated 2025-06-01. Indexed 2025-05-07.
Download link:
https://figshare.com/articles/dataset/CMAB-The_World_s_First_National-Scale_Multi-Attribute_Building_Dataset/27992417/4
Description:
Rapidly acquiring three-dimensional (3D) building data, including geometric attributes such as rooftop, height, and orientation, as well as indicative attributes such as function, quality, and age, is essential for accurate urban analysis, simulation, and policy updates. Current building datasets suffer from incomplete coverage of building multi-attributes. This paper presents CMAB, the first national-scale multi-attribute building dataset, produced with artificial intelligence. It covers 3,667 spatial cities, 31 million buildings, and 23.6 billion m² of rooftops, extracted with OCRNet at an F1-score of 89.93%, and totals 363 billion m³ of building stock. We trained bootstrap-aggregated XGBoost models with city administrative classifications, incorporating morphology, location, and function features. Using multi-source data, including billions of remote sensing images and 60 million street view images (SVIs), we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated through model benchmarks, comparison with existing similar products, and manual SVI validation, and is mostly above 80%. Our dataset and results are crucial for the global Sustainable Development Goals (SDGs) and urban planning.

Data records: A building dataset with a total rooftop area of 23.6 billion m² across 3,667 natural cities in China, including the attributes of building rooftop, height, structure, function, age, style, colour, and quality, as well as the code files used to calculate these data. The machine learning models used are OCRNet, XGBoost, fine-tuned CLIP, and YOLOv8.

Reference format: Zhang, Y., Zhao, H. & Long, Y. CMAB: A Multi-Attribute Building Dataset of China. Sci Data 12, 430 (2025). https://doi.org/10.1038/s41597-025-04730-5
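
As a rough sanity check on the headline figures in the description above, the short Python sketch below derives the averages they imply. It is not part of the dataset's code files. The function name `implied_averages` is hypothetical, and the mean-height figure rests on the assumption that building volume is approximately rooftop area multiplied by height. Because the published totals are rounded, the results are only indicative.

```python
# Headline totals as stated in the dataset description (rounded values).
N_CITIES = 3667               # cities covered
N_BUILDINGS = 31e6            # 31 million buildings
ROOFTOP_AREA_M2 = 23.6e9      # 23.6 billion m² of rooftops
BUILDING_VOLUME_M3 = 363e9    # 363 billion m³ of building stock


def implied_averages(n_cities, n_buildings, rooftop_area_m2, volume_m3):
    """Return back-of-envelope averages implied by the headline totals.

    Assumption (not from the source): volume ~= rooftop area * height,
    so volume / rooftop area approximates an area-weighted mean height.
    """
    return {
        "mean_rooftop_area_m2": rooftop_area_m2 / n_buildings,
        "mean_height_m": volume_m3 / rooftop_area_m2,
        "mean_buildings_per_city": n_buildings / n_cities,
    }


if __name__ == "__main__":
    averages = implied_averages(
        N_CITIES, N_BUILDINGS, ROOFTOP_AREA_M2, BUILDING_VOLUME_M3
    )
    for name, value in averages.items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the totals imply roughly 760 m² of rooftop per building and an area-weighted mean height of about 15 m, which is a plausible order of magnitude for a stock that mixes low-rise and high-rise buildings.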

Provider:
figshare
Created:
2025-03-21
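Background overview
CMAB is the world's first national-scale multi-attribute building dataset. It covers 31 million buildings in 3,667 cities in China and includes eight attribute types, such as rooftop, height, and function, generated with AI techniques including OCRNet and XGBoost. The total rooftop area is 23.6 billion m², and validation accuracy is generally above 80%. The dataset provides important support for urban analysis and the Sustainable Development Goals.
The summary above was compiled and generated by 遇见数据集.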