
A study on interestingness measures in association rule mining

Source: DataCite Commons. Updated 2022-09-07. Indexed 2025-04-16.
Download link:
http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2021.532
Resource description:
Many interestingness measures have been proposed for mining meaningful association rules between two events in the form A→B, but their characteristics and semantic similarity relations have not been comprehensively investigated. This paper presents a scenario-based approach that characterizes sixty-one commonly used measures and reveals their relationships in three steps. The first step generates a set of 969 three-probability scenarios, S = {s | s = (p(A), p(B), p(A,B)) ∧ p(A), p(B), p(A,B) ∈ [0,1] ∧ p(A,B) ≤ min(p(A), p(B))}, covering all possible situations in the range 0.0 to 1.0 with a step of 0.05 and excluding infinity and not-a-number cases. In the second step, 937,992 pairs of scenarios are enumerated, and for each scenario pair s_1 and s_2, the values of a measure M on the two scenarios, M(s_1) and M(s_2), are compared. The outcome is greater-than (M(s_1) > M(s_2)), smaller-than (M(s_1) < M(s_2)), or equal-to (M(s_1) = M(s_2)), and these outcomes characterize the measure. The final step is based on three types of relations: (1) behavior-based, (2) correlation-based, and (3) association-based similarity relations. The behavior of the measures is described using nine common algebraic/statistical properties and four special-condition properties of the measures, i.e., zero, min-max, infinity, and not-a-number. Similarities among the measures can be examined by grouping measures according to their properties. With three correlation functions, i.e., correlation coefficient, joint entropy, and mutual information, a correlation analysis was performed to discover relations among the interestingness measures in the form of dendrograms and clusters with thresholding. Finally, the details of the relations among these interestingness measures are explored with association rule mining. Besides support, confidence, and lift, five types of rules, i.e., same-direction rule (S-rule), opposite-direction rule (O-rule), equal-both rule (E-rule), equal-left rule (EL-rule), and equal-right rule (ER-rule), were proposed for a five-gradient comparison of any two measures, outlining their similarities and dissimilarities in five directions.
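The first two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`scenarios`, `compare`, `characterize`) are hypothetical, and confidence and lift are used only as two familiar example measures with their standard definitions. The sketch enumerates the raw 0.05 grid without reproducing the paper's exclusion of infinity and not-a-number cases, so it yields more than the 969 scenarios reported. Note also that 937,992 = 969 × 968, which is consistent with counting ordered pairs of distinct scenarios, so the sketch uses ordered pairs on that assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

STEP = Fraction(1, 20)  # 0.05 grid, kept exact to avoid floating-point drift


def scenarios():
    """Yield (p(A), p(B), p(A,B)) on the grid with p(A,B) <= min(p(A), p(B)).

    Assumption: no further filtering is applied here, unlike the paper.
    """
    grid = [i * STEP for i in range(21)]
    for pa in grid:
        for pb in grid:
            for pab in grid:
                if pab <= min(pa, pb):
                    yield (pa, pb, pab)


def confidence(s):
    """Standard confidence p(A,B) / p(A); None where it is undefined."""
    pa, _, pab = s
    return pab / pa if pa != 0 else None


def lift(s):
    """Standard lift p(A,B) / (p(A) p(B)); None where it is undefined."""
    pa, pb, pab = s
    return pab / (pa * pb) if pa != 0 and pb != 0 else None


def compare(measure, s1, s2):
    """Return '>', '<' or '=' for M(s1) versus M(s2); None if undefined."""
    m1, m2 = measure(s1), measure(s2)
    if m1 is None or m2 is None:
        return None
    return ">" if m1 > m2 else "<" if m1 < m2 else "="


def characterize(measure, sample):
    """Tally comparison outcomes over all ordered pairs of distinct scenarios."""
    tally = Counter()
    for s1, s2 in permutations(sample, 2):
        outcome = compare(measure, s1, s2)
        if outcome is not None:
            tally[outcome] += 1
    return tally


# The reported pair count matches ordered pairs of 969 scenarios.
assert 969 * 968 == 937_992
```

Calling `characterize(lift, sample)` on a list of scenarios returns the counts of greater-than, smaller-than, and equal-to outcomes for that measure. Two measures could then be compared by checking, pair by pair, whether their outcomes agree; how that maps onto the S-, O-, E-, EL-, and ER-rules is defined in the thesis, not in this sketch.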

Provider:
Thammasat University
Created:
2022-09-07