Team18_5A_2020_SoCSE_KLETech

IEEE | Updated 2020-11-15 | Indexed 2026-04-17

Download link:

https://ieee-dataport.org/analysis/team185a2020socsekletech-0

Resource description:

Prediction of Energy Consumption Using Smart Meter Data.

Preprocessing: Of the five datasets provided, we used four for preprocessing: Consumption, Weather_min, Weather_max and Weather_avg. Exploratory data analysis showed approximately 2.9 crore (29 million) missing values in our primary dataset, Consumption, which needed special handling. The three weather datasets (Weather_min, Weather_max, Weather_avg) each had a few missing values as well. We did not use the Additional Info dataset in our analysis because it covered only 2143 of the 3248 meters. In addition, among its three major attributes (dwelling type, number of bedrooms and number of occupants), we observed that only the semi-detached house dwelling type, 3 bedrooms and 2 occupants were significant; the rest were insignificant.

Missing Value Imputation: Consumption Dataset: We filled the consumption dataset using the new weather dataset. For each missing value, we took the current temperature of the meter, compared it with the other meters, and filled in the consumption value of a meter with similar temperature readings. This imputed most of the missing values, leaving only a few. The remaining missing values were imputed using linear interpolation, rolling mean and forward fill. We then resampled the consumption dataset from half-hourly to daily data so that we could proceed with the analysis. Examining trend and seasonality for a few randomly chosen meters, we observed seasonality in all of them.

Augmented Dickey Fuller Test: We ran the Augmented Dickey-Fuller (ADF) test to check whether the data associated with each meter is stationary.
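As a minimal sketch of the fallback imputation and daily resampling steps described above, the following dependency-free Python fills interior gaps by linear interpolation, forward-fills any trailing gap, and then aggregates half-hourly readings into daily values. The function names are hypothetical, the rolling-mean step is omitted for brevity, and summing 48 readings per day is an assumption, since the description does not say whether daily values are sums or means.

```python
from typing import List, Optional


def interpolate_then_ffill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by linear interpolation, then forward-fill the rest.

    Leading gaps (before the first known reading) are left as None,
    because neither step has a value to work from there.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Linear interpolation between each pair of neighbouring known readings.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Forward fill: carry the last known reading into any trailing gap.
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    return filled


def half_hourly_to_daily(values: List[float], per_day: int = 48) -> List[float]:
    """Sum each complete block of `per_day` readings into one daily total.

    Assumption: daily consumption is the sum of the half-hourly readings.
    """
    n_days = len(values) // per_day
    return [sum(values[d * per_day:(d + 1) * per_day]) for d in range(n_days)]


readings = interpolate_then_ffill([1.0, None, 3.0, None, None])
daily = half_hourly_to_daily([0.5] * 96)
```

In practice a time-series library would usually handle both steps; the plain-Python version is only meant to make the order of operations explicit.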
The ADF test showed that 1980 of the 3248 meters were non-stationary and the rest were stationary. Because the data must be stationary before it is fitted to the model, we decided to smooth the non-stationary series using Holt-Winters simple exponential smoothing. Re-running the stationarity check showed that the data for every meter was now stationary. A few values went missing during smoothing; these were imputed using interpolation.

Test-Train Split: We split the consumption data into train and test sets. The train set contains data for the 3248 meters from January 2017 to November 2017, and the test set contains data for the same 3248 meters for December 2017.

Model Building: The data for each meter fits a different model, because electricity usage patterns differ between households. Errors cannot be estimated for genuinely future forecasts, since the actual values are not yet known, so we fit a SARIMAX model on the train data, make predictions for the test data, and estimate the error for each meter (household) separately. The train data is first fed into the Auto-ARIMA model (which also accounts for the seasonality of the data), which returns the p, d, q parameters needed to fit the model, where p is the trend autoregression order, d is the trend difference order and q is the trend moving average order. The train data is then fitted to the SARIMAX model with the order and seasonal order parameters, and the fitted model makes predictions for the test data. We estimate the error as the root mean square error between the predictions (forecasted values for December 2017) and the expected data (test data).
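The per-meter error estimate described above can be sketched as follows. This is an illustrative, dependency-free example rather than the team's code: the function names, the dictionary layout keyed by meter ID and the sample values are all assumptions.

```python
import math
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_per_meter(
    test: Dict[str, Sequence[float]],
    forecasts: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compute one RMSE per meter; both dicts are keyed by meter ID."""
    return {meter: rmse(test[meter], forecasts[meter]) for meter in test}


# Hypothetical daily values for two meters (not taken from the dataset).
test_data = {"meter_a": [2.0, 4.0], "meter_b": [1.0, 1.0, 1.0]}
predictions = {"meter_a": [0.0, 2.0], "meter_b": [1.0, 1.0, 1.0]}
errors = rmse_per_meter(test_data, predictions)
```

Keeping one error value per meter, rather than a single pooled score, matches the observation that each household follows a different usage pattern.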
Across the 3248 meters, most of the errors lie between 1 and 2.5, and a few lie above this range. We then used the same model to forecast the upcoming 365 days. Lastly, we downsampled the forecast from 365 daily values to 12 monthly values, as expected.
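The final downsampling step can be illustrated with a short sketch. It assumes the 365-day forecast starts on 1 January 2018 (the day after the December 2017 test period) and that monthly values are sums of daily values; neither detail is stated in the description, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def daily_to_monthly(daily: List[float], start: date) -> Dict[Tuple[int, int], float]:
    """Aggregate consecutive daily values into (year, month) totals.

    Assumption: a monthly value is the sum of that month's daily values.
    """
    monthly: Dict[Tuple[int, int], float] = {}
    for offset, value in enumerate(daily):
        day = start + timedelta(days=offset)
        key = (day.year, day.month)
        monthly[key] = monthly.get(key, 0.0) + value
    return monthly


# Hypothetical flat forecast of 1.0 per day, starting on the assumed date.
forecast = [1.0] * 365
monthly_forecast = daily_to_monthly(forecast, date(2018, 1, 1))
```

Grouping by calendar month rather than by fixed 30-day blocks keeps each monthly value aligned with the actual number of days in that month.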

Created:

2020-11-15
