BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting
Source: DataCite Commons. Updated: 2024-01-12. Indexed: 2024-07-13.
Download link:
https://www.osti.gov/servlets/purl/1986147/
Description:
The BuildingsBench datasets consist of:
- Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
- 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.
Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models.
Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see link to this database in the links further below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation."
Similar to the EULP, Buildings-900K is a collection of Parquet files and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).
BuildingsBench also provides an evaluation benchmark that is a collection of various open source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB, and they are listed below:
1. ElectricityLoadDiagrams20112014
2. Building Data Genome Project-2
3. Individual household electric power consumption (Sceaux)
4. Borealis
5. SMART
6. IDEAL
7. Low Carbon London
A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.
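To make the day-ahead STLF task described above concrete, here is a minimal sketch of how one load series could be cut into (history, next-day) pairs and scored against a naive baseline. It is an illustration only, not the BuildingsBench code or API. The hourly resolution, the 168-hour context length, the persistence baseline, the normalized RMSE metric, and all function names are assumptions introduced for this example.

```python
from typing import List, Sequence, Tuple

CONTEXT_HOURS = 168  # assumption: one week of hourly history as model input
HORIZON_HOURS = 24   # day-ahead horizon, assuming hourly readings


def make_windows(
    load: Sequence[float],
    context: int = CONTEXT_HOURS,
    horizon: int = HORIZON_HOURS,
    stride: int = 24,
) -> List[Tuple[List[float], List[float]]]:
    """Slice one load series into (context, target) pairs."""
    pairs = []
    last_start = len(load) - context - horizon
    for start in range(0, last_start + 1, stride):
        split = start + context
        pairs.append((list(load[start:split]), list(load[split:split + horizon])))
    return pairs


def persistence_forecast(context: Sequence[float], horizon: int = HORIZON_HOURS) -> List[float]:
    """Naive baseline: repeat the most recent `horizon` observed values."""
    return list(context[-horizon:])


def nrmse(target: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error divided by the mean of the target."""
    mse = sum((t - f) ** 2 for t, f in zip(target, forecast)) / len(target)
    return (mse ** 0.5) / (sum(target) / len(target))


# Toy input: 10 days of a perfectly repeating daily profile (made-up data).
load = [1.0 + (h % 24) / 24 for h in range(240)]
windows = make_windows(load)
ctx, tgt = windows[0]
score = nrmse(tgt, persistence_forecast(ctx))
print(len(windows), score)
```

On real building data the persistence baseline would not be exact; the toy series repeats every 24 hours only so that the windowing logic is easy to check by hand.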
Provider:
DOE Open Energy Data Initiative (OEDI); National Renewable Energy Laboratory
Created:
2023-06-24
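Collected Summary
Background and Challenges
Background Overview
BuildingsBench is a large-scale short-term load forecasting dataset covering 900K buildings. It comprises synthetic data (Buildings-900K) and 7 real-building benchmark datasets, and is intended mainly for machine learning pretraining and finetuning research. It fills the gap left by the lack of large-scale, diverse time series data in the STLF field.
The above content was collected and summarized by 遇见数据集.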



