Fuzzy forms of the Rand, adjusted Rand and Jaccard indices for fuzzy partitions of gene expression and other data
Source: Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/fuzzy-forms-rand-expression-data/1948664
Description:
Clustering is one of the most basic processes performed to simplify data and express knowledge in a scientific endeavor. Clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. Since the output of clustering is a partition of the input data, the quality of that partition must be determined. This paper presents fuzzy extensions of some commonly used clustering measures that are already defined for crisp clustering: the Rand index (RI), the adjusted Rand index (ARI) and the Jaccard index (JI). Fuzzy clustering, and therefore fuzzy cluster indices, is beneficial because it provides more realistic cluster memberships for the clustered objects rather than 0 or 1 values. If a crisp partition is still desired, the fuzzy partition can be turned into a crisp partition in an obvious manner. The usefulness of fuzzy clustering in that case is that it handles noise better. The new indices proposed in this paper, called FRI, FARI and FJI, give the same values as the original indices in the special case of crisp clustering. Their effectiveness is demonstrated through fuzzy clustering of artificial data and real data, including gene expression data. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
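For readers unfamiliar with the crisp measures named in the description, the following is a minimal Python sketch of the standard pair-counting definitions of RI, JI and ARI. It does not implement the fuzzy indices (FRI, FARI, FJI), whose definitions are not given in this record. The function names are hypothetical, and the `harden` helper assumes that the "obvious manner" of obtaining a crisp partition is to assign each object to its highest-membership cluster; the record itself does not spell this out.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Classify every pair of objects by how two crisp partitions treat it.

    Returns (a, b, c, d):
      a = pairs placed together in both partitions
      b = pairs together in A but separated in B
      c = pairs separated in A but together in B
      d = pairs separated in both partitions
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1
        elif same_a:
            b += 1
        elif same_b:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(labels_a, labels_b)
    return (a + d) / (a + b + c + d)


def jaccard_index(labels_a, labels_b):
    """Like the Rand index, but ignores pairs separated in both partitions.

    Undefined (division by zero) when no pair is together in either partition.
    """
    a, b, c, _ = pair_counts(labels_a, labels_b)
    return a / (a + b + c)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance (Hubert-Arabie form, via pair counts).

    Undefined for degenerate inputs where the denominator is zero.
    """
    a, b, c, d = pair_counts(labels_a, labels_b)
    expected = (a + b) * (a + c) / (a + b + c + d)
    maximum = ((a + b) + (a + c)) / 2
    return (a - expected) / (maximum - expected)


def harden(memberships):
    """Assumed hardening rule: pick the cluster with the largest membership.

    `memberships` is a list of rows, one per object, one value per cluster.
    Ties go to the lowest cluster index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

For example, `harden([[0.8, 0.2], [0.4, 0.6]])` yields the crisp labels `[0, 1]`, which can then be passed to any of the three index functions alongside a reference labelling.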
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ;
Chetty, Madhu ;
Ahmad, Shandar ;
Ngom, Alioune ;
Teng, Shyh Wei ;
Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ;
Coverage:
Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Provider:
Monash University



