
Team15_5B_2020_SoCSE_KLETech

Source: IEEE, 2020-11-15 (updated 2026-04-17)
Download link:
https://ieee-dataport.org/analysis/team155b2020socsekletech
Resource overview:
Datasets provided:
1. Consumption.csv: half-hourly consumption (kWh) of each meter over one year. Null values: 51.6%.
2. Weather-avg/min/max.csv: daily average, minimum and maximum temperature for each meter over the same year.
3. AddInfo.csv: additional information for some households; more than 95% of its values are null. Only the attributes dwelling_type and no. of bedrooms hold a useful amount of data, with 24% and 13% null values respectively.

Data Preprocessing: First, we assigned a number to each meter ID and dropped the original meter ID column from the dataframe, so that none of the columns remained of object dtype.

Treating null values: The dataset contained a very large proportion of null values. Predicting in the presence of null values is inappropriate, so the nulls had to be treated first. On inspection, we found that for a large number of meter IDs the data from January to September was missing, which makes predicting values for those months a challenge.

We dropped columns with more than 95% null values and then transposed the dataset, so that the time stamps appear as row labels and the meters as columns. We then converted the time stamps to datetime format and smoothed the data by treating outliers using the interquartile range (IQR). Next, we computed the mean monthly consumption of each meter and interpolated the weather data to fill its null values.

At this point we had an average monthly consumption for each meter, but months with no data at all were still untreated. We therefore compared meters missing data for some months with meters that had values for those months. For example, consider meter 1 and meter 2, where meter 1 has no data from January to September and meter 2 has data for every month. We compared the October, November and December data of the two meters.
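The preprocessing steps described above (meter ID encoding, IQR-based outlier treatment, monthly averaging and interpolation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. The authors worked with a dataframe, but standard-library lists are used here so the example is self-contained. All function names are hypothetical. "Treating outliers using the interquartile range" is assumed to mean clipping readings to the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR. Interpolation is assumed to be linear between known neighbours. Missing values are represented as None.

```python
import statistics

def encode_meter_ids(meter_ids):
    """Map each distinct meter ID string to an integer (hypothetical helper)."""
    return {mid: i for i, mid in enumerate(sorted(set(meter_ids)))}

def clip_outliers_iqr(values, k=1.5):
    """Clip readings to [Q1 - k*IQR, Q3 + k*IQR]; None (missing) entries pass through.

    Assumption: outliers are clipped (winsorized) rather than removed.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [None if v is None else min(max(v, low), high) for v in values]

def monthly_means(readings):
    """Average (month, kWh) pairs per month, ignoring missing readings."""
    by_month = {}
    for month, kwh in readings:
        if kwh is not None:
            by_month.setdefault(month, []).append(kwh)
    return {m: statistics.fmean(v) for m, v in sorted(by_month.items())}

def interpolate_linear(values):
    """Fill interior None gaps on a straight line between the nearest known neighbours.

    Assumption: linear interpolation. Leading and trailing gaps have only one
    neighbour and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            filled[i] = filled[a] + (i - a) / (b - a) * (filled[b] - filled[a])
    return filled
```

For example, `clip_outliers_iqr([1, 2, 3, None, 4, 100])` computes Q1 = 2 and Q3 = 4, so the upper fence is 7 and the spike of 100 is pulled down to 7 while the gap stays None. Note that months with no readings at all simply do not appear in the output of `monthly_means`, which is why those months still need the cross-meter filling step.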
If the two were reasonably similar, we filled meter 1's January to September values with meter 2's data for those months, on the assumption that the two meters have similar consumption patterns. This matching was done purely by trial and error.

Modelling: We split the data into training and test sets, with about 80% of the data used for training and 20% for testing. We then checked whether the data was stationary using the Dickey-Fuller test. Because the trends showed that the data was not stationary, we applied differencing to make it stationary. Finally, we applied ARIMA to the dataset to predict the monthly consumption of each meter.
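The gap-filling rule and the modelling steps can be sketched in plain Python as below. This is an illustration under loudly stated assumptions, not the authors' implementation. Similarity is measured as the mean absolute difference over the overlapping months and compared against a hypothetical tolerance `tol`, since the text only says the matching was trial and error. The 80/20 split is assumed to be chronological. The stationarity check is the basic (non-augmented) Dickey-Fuller regression with a constant, compared with the asymptotic 5% critical value of about -2.86. The ARIMA model is fixed at order (1,1,0) and fitted by conditional least squares; the authors' actual ARIMA order and fitting method are not stated.

```python
import math

def profile_distance(a, b):
    """Mean absolute difference between two equal-length lists of monthly values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def fill_from_similar(target, donor, overlap, tol):
    """Fill target's missing months from donor if both meters are similar on `overlap`.

    target/donor map month -> kWh (None = missing); overlap months must be observed
    in both. `tol` is a hypothetical similarity threshold.
    """
    dist = profile_distance([target[m] for m in overlap], [donor[m] for m in overlap])
    if dist >= tol:
        return dict(target)  # not similar enough: leave the gaps untouched
    return {m: donor[m] if v is None else v for m, v in target.items()}

def train_test_split(series, train_frac=0.8):
    """Chronological split (assumption): the first 80% of time steps train, the rest test."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def difference(series):
    """First difference y_t - y_{t-1}; one element shorter than the input."""
    return [b - a for a, b in zip(series, series[1:])]

def ols(x, y):
    """Simple linear regression y = a + b*x; returns (a, b, standard error of b). Needs >= 3 points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(ssr / (n - 2) / sxx)

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant

def dickey_fuller_t(series):
    """t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t (no augmentation lags).

    A value below DF_CRITICAL_5PCT rejects a unit root, i.e. the series looks stationary.
    """
    _, gamma, se = ols(series[:-1], difference(series))
    if se == 0.0:
        raise ValueError("degenerate series: the regression fits exactly")
    return gamma / se

def fit_arima_110(series):
    """Fit ARIMA(1,1,0) with a constant by conditional least squares (>= 5 observations).

    Model on the differences d = diff(y): d_t = c + phi * d_{t-1} + e_t.
    """
    d = difference(series)
    c, phi, _ = ols(d[:-1], d[1:])
    return c, phi

def forecast_arima_110(series, c, phi, steps):
    """Iterate the AR(1) recursion on the differences and add them back onto the last level."""
    level, last_diff = series[-1], series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level += last_diff
        forecasts.append(level)
    return forecasts
```

In this sketch, a meter's monthly series would be split with `train_test_split`. If `dickey_fuller_t` on the training part is not below `DF_CRITICAL_5PCT`, the series is treated as non-stationary, and the single differencing built into ARIMA(1,1,0) handles it. `forecast_arima_110` then predicts the held-out months for comparison with the test set. A library implementation such as statsmodels' `adfuller` and `ARIMA` would add augmentation lags, order selection and maximum-likelihood fitting. The hand-rolled version only makes each step visible.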
Created:
2020-11-15