Cleaned and validated Irish EPC database
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://data.mendeley.com/datasets/5vnnbf8hd6
Description:
The raw EPC database was filtered using Python and R; scripting made it possible to handle the large volume of data efficiently. The scripts combine data manipulation steps, applied in series, that filter and change the data according to user-specified criteria. In this case, the entire EPC database, comprising 1 million entries as of quarter 1 of 2022, was used as the input. The data flow captures the relative frequency of each filter and the resulting number of EPC entries removed from the dataset as erroneous or as outliers.
Approximately 30% of EPC entries are flagged and labeled as outliers, which gives an overview of the data quality of the EPC database. As discussed earlier in section 3.7, the issue is not confined to Ireland but is widespread across Member States (MSs). The most striking finding was that many features that play a pivotal part in the overall performance of a dwelling, and could raise or lower its energy rating, are plagued by poor data. One example is Living Area Percentage, which assigns the areas within a dwelling assumed to be heated to 21 °C and 18 °C. Getting it right is crucial for estimating the representative theoretical energy performance of dwellings. Ceiling height data is also of poor quality, making it unrepresentative of dwellings' geometry, which again results in inaccurate overall energy performance.
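The filter-in-series workflow described above can be illustrated with a minimal Python sketch. It is not the dataset authors' script. The field names (`living_area_pct`, `ceiling_height_m`), the thresholds, and the sample records are hypothetical, chosen only to show how each filter's flagged count and relative frequency might be recorded as filters are applied one after another.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
# Each filter pairs a label with a predicate that returns True for an outlier.
Filter = Tuple[str, Callable[[Record], bool]]

# Hypothetical criteria; the real thresholds are not given in the description.
FILTERS: List[Filter] = [
    ("living_area_pct_out_of_range",
     lambda r: not 0 < r["living_area_pct"] <= 100),
    ("ceiling_height_implausible",
     lambda r: not 1.8 <= r["ceiling_height_m"] <= 4.0),
]


def apply_filters(
    records: List[Record], filters: List[Filter]
) -> Tuple[List[Record], Dict[str, int]]:
    """Apply filters in series; return kept records and per-filter flagged counts."""
    remaining = list(records)
    flagged: Dict[str, int] = {}
    for name, is_outlier in filters:
        kept = [r for r in remaining if not is_outlier(r)]
        # Count only records removed at this step, not ones removed earlier.
        flagged[name] = len(remaining) - len(kept)
        remaining = kept
    return remaining, flagged


def relative_frequency(flagged: Dict[str, int], total: int) -> Dict[str, float]:
    """Share of the original input removed by each filter."""
    if total == 0:
        return {}
    return {name: count / total for name, count in flagged.items()}


# Made-up sample records, for illustration only.
sample: List[Record] = [
    {"living_area_pct": 25.0, "ceiling_height_m": 2.4},
    {"living_area_pct": 0.0, "ceiling_height_m": 2.4},
    {"living_area_pct": 30.0, "ceiling_height_m": 9.0},
    {"living_area_pct": 150.0, "ceiling_height_m": 0.5},
]

clean, flagged = apply_filters(sample, FILTERS)
shares = relative_frequency(flagged, len(sample))
```

Because the filters run in series, a record that fails several criteria is counted only against the first filter that removes it, so the per-filter counts sum to the total number of entries cleaned.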
Created:
2023-02-14



