
RTU-HVAC Real-Time Operating Data from Unit in Field

Mendeley Data. Updated 2024-03-27. Indexed 2024-06-26.
Download link:
https://data.mendeley.com/datasets/9h6gpbhj5k
Description:
This dataset was collected and published to support the development of novel machine learning methods that can classify faults in data that are (1) collected in real time, (2) unlabeled, and (3) potentially imbalanced. The hypothesis was that HVAC faults could be classified under these conditions with an accuracy above 90%. All the developed methods can classify the seven fault categories considered (UCC1, OCC1, UCC2, OCC2, CA, EA, NF); however, only five classes are identified and analyzed because the datasets contain no instances of the UCC1 and OCC2 faults.
The baseline supervised ML method (Method 1, SVM) achieved a high average accuracy (93.5%), but its accuracy on the minority class (NF) was low (80.6%) because of the data imbalance.
Method 2 combines SVM with a novel unsupervised ML technique that uses k-NN labeling. This method is very promising: it achieves a high average accuracy (94.9%) even with only a few labeled data points, and it can predict multiple faults in the same data point. It also shows encouraging results on imbalanced datasets without requiring additional techniques that generate new data points to balance the classes.
Method 3 combines SVM, clustering, and unsupervised learning with k-NN labeling. It is limited to scenarios in which only one fault at a time is present in the dataset, but it is a powerful approach when labeled data points are scarce. Its highest average accuracy was achieved using 50 nearest neighbors (50 k-NN). All OCC1 and UCC2 testing data points were predicted correctly, while a few CA, EA, and NF data points were misclassified. Although the imbalanced-dataset challenge can be handled with other techniques, the main drawback of this method is that it cannot handle multiple faults in the same data point.
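The description does not spell out how the k-NN labeling step works, so the following is only a minimal sketch of the general idea, assuming that each unlabeled point takes the majority label of its k nearest labeled points. The function name `knn_label`, the two-feature readings, and the choice k = 3 are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter


def knn_label(labeled, unlabeled, k=3):
    """Give each unlabeled point the majority label of its k nearest labeled points.

    labeled   -- list of (feature_tuple, label) pairs
    unlabeled -- list of feature tuples
    """
    pseudo = []
    for x in unlabeled:
        # Rank the labeled points by Euclidean distance to x and keep the k closest.
        nearest = sorted(labeled, key=lambda item: math.dist(item[0], x))[:k]
        # Majority vote among the neighbors' labels.
        votes = Counter(label for _, label in nearest)
        pseudo.append(votes.most_common(1)[0][0])
    return pseudo


# Hypothetical two-feature readings; only the class names come from the description.
labeled = [
    ((0.0, 0.0), "NF"), ((0.1, 0.2), "NF"), ((0.2, 0.1), "NF"),
    ((5.0, 5.0), "CA"), ((5.1, 4.9), "CA"), ((4.8, 5.2), "CA"),
]
unlabeled = [(0.05, 0.1), (5.05, 5.0)]

pseudo_labels = knn_label(labeled, unlabeled, k=3)
print(pseudo_labels)
```

The pseudo-labels produced this way could then be added to the training set of a supervised classifier such as an SVM, which is one plausible reading of how a few labeled points are stretched across a mostly unlabeled dataset.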
Finally, an ensemble method was developed to select between Methods 1 and 2 for each fault type. Rather than looking at the overall accuracy of each method, it looks at the accuracy of each individual classifier (one classifier for each fault or class). This is useful when a choice must be made between different methods (SVM, or a combination of SVM and unsupervised ML with k-NN labeling) for each classifier, in order to achieve better predictions and a higher overall average accuracy.
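The per-classifier selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `select_per_class` is hypothetical, and the accuracy values are invented purely for the example and are not results from the dataset.

```python
def select_per_class(acc_by_method):
    """For each class, return the name of the method with the highest accuracy.

    acc_by_method -- dict mapping method name -> {class name -> accuracy}
    """
    # Take the class list from the first method; all methods are assumed
    # to report an accuracy for the same set of classes.
    classes = list(next(iter(acc_by_method.values())))
    return {
        c: max(acc_by_method, key=lambda method: acc_by_method[method][c])
        for c in classes
    }


# Invented per-class validation accuracies, for illustration only.
acc = {
    "method_1": {"OCC1": 0.97, "UCC2": 0.90, "CA": 0.91, "EA": 0.95, "NF": 0.70},
    "method_2": {"OCC1": 0.96, "UCC2": 0.92, "CA": 0.94, "EA": 0.93, "NF": 0.88},
}

chosen = select_per_class(acc)
print(chosen)
```

At prediction time, each fault's classifier would then come from whichever method was chosen for that fault, so a method that is weaker on average can still contribute the classes on which it is stronger.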
Created:
2024-01-23