Model17
Source: DataCite Commons | Updated: 2020-11-06 | Indexed: 2025-04-16
Download link:
https://ieee-dataport.org/analysis/model17
Description:
Data preparation: We started by aggregating the energy consumption data, originally recorded at half-hourly intervals, to a daily level. Missing values were observed in the consumption and temperature metadata; we imputed them using linear interpolation at the daily level.

Next came feature engineering, which was essential for getting the best accuracy out of the model. We worked with three main groups of features:
a) Average weather data, plus three lag variables of the average weather data (1-day, 2-day, and 3-day lags)
b) Time-series features created from the date:
   1. Month (1, 2, 3, ..., 12)
   2. Weekday (Mon, Tue, ..., Sun)
   3. WeekofYear (1, 2, ..., 52)
c) Cyclic encoding of the periodic month and weekday features:
   1. Month is mapped to Month_x = sin(2*pi*month/12) and Month_y = cos(2*pi*month/12)
   2. Weekday is mapped to weekday_x = sin(2*pi*weekday/6) and weekday_y = cos(2*pi*weekday/6)

After multiple iterations, we settled on the XGBoost model because it performed best among the models we tried. We trained a model for each individual meter_ID and forecast energy consumption at a daily level. The daily forecasts were then rolled up to a monthly level to produce our final submission.

Update: We identified the customers for which the model was not performing well and used a moving average to forecast for those customers instead.
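The data preparation and feature engineering steps described above can be sketched roughly as follows. This is an illustration, not the authors' code, and it uses only the Python standard library to stay self-contained. All function names are hypothetical. Several details are assumptions the description does not specify: daily aggregation is taken to be a sum, gaps at the start or end of a series are filled with the nearest observed value, weekdays are numbered Monday = 0 through Sunday = 6, and the week of year is the ISO week number (which can reach 53). The weekday divisor defaults to 6 to match the formula as written; note that with a 0 to 6 numbering this maps Monday and Sunday to the same point, and a period of 7 would keep all seven days distinct.

```python
import math
from collections import defaultdict
from datetime import date, datetime


def aggregate_daily(readings):
    """Sum half-hourly (timestamp, kWh) readings into one total per day (assumed: sum)."""
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        totals[timestamp.date()] += kwh
    return dict(totals)


def interpolate_linear(values):
    """Fill None entries in a daily series by linear interpolation between neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")
    filled = list(values)
    # Assumption: leading and trailing gaps copy the nearest observed value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: straight line between the two surrounding observations.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def lag_features(values, lags=(1, 2, 3)):
    """For each day, return the values observed 1, 2 and 3 days earlier (None if unavailable)."""
    return [
        {f"lag{k}": (values[i - k] if i >= k else None) for k in lags}
        for i in range(len(values))
    ]


def cyclic_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def date_features(day, weekday_period=6):
    """Calendar features plus cyclic encodings; weekday_period=6 follows the text as written."""
    weekday = day.weekday()  # assumption: Monday = 0 ... Sunday = 6
    month_x, month_y = cyclic_encode(day.month, 12)
    weekday_x, weekday_y = cyclic_encode(weekday, weekday_period)
    return {
        "month": day.month,
        "weekday": weekday,
        "weekofyear": day.isocalendar()[1],  # assumption: ISO week number
        "month_x": month_x,
        "month_y": month_y,
        "weekday_x": weekday_x,
        "weekday_y": weekday_y,
    }
```

In a setup like this, the average weather series would first pass through interpolate_linear and then through lag_features, and each forecast day would get its calendar columns from date_features.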
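The moving-average fallback and the monthly roll-up described above might look something like the sketch below. It is illustrative only and uses just the Python standard library. The description does not say how poorly performing customers were identified, which window the moving average used, or whether the forecast was flat or recursive. The sketch therefore assumes a holdout comparison of mean absolute error, a 7-day window, and a flat forecast. The XGBoost model itself is not shown; its holdout forecast is passed in as a plain list, and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def moving_average_forecast(history, horizon, window=7):
    """Flat forecast: repeat the mean of the last `window` daily observations."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_forecaster(train, holdout, model_holdout_forecast, window=7):
    """Choose per meter: fall back to the moving average when it beats the model on the holdout.

    Assumption: 'not performing well' is judged by holdout MAE; the source does not say.
    """
    baseline = moving_average_forecast(train, len(holdout), window)
    baseline_error = mean_absolute_error(holdout, baseline)
    model_error = mean_absolute_error(holdout, model_holdout_forecast)
    return "moving_average" if baseline_error < model_error else "model"


def rollup_monthly(start_day, daily_forecast):
    """Sum consecutive daily forecasts, starting at start_day, into (year, month) totals."""
    monthly = defaultdict(float)
    for offset, value in enumerate(daily_forecast):
        day = start_day + timedelta(days=offset)
        monthly[(day.year, day.month)] += value
    return dict(monthly)
```

Applied per meter_ID, pick_forecaster would decide which daily forecast to keep, and rollup_monthly would then turn that daily series into the monthly totals used for submission.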
Provider:
IEEE DataPort
Created:
2020-11-06



