2d synthetic dataset labels from Clustering: how much bias do we need?.

Name: 2d synthetic dataset labels from Clustering: how much bias do we need?.
Creator: The Royal Society
Published: 2020-10-15 08:41:50
License: No description available

DataCite Commons. Updated 2020-10-15; indexed 2024-07-25.

Download link:

https://rs.figshare.com/articles/dataset/2d_synthetic_dataset_labels_from_Clustering_how_much_bias_do_we_need_/4806574

Description:

Scientific investigations in medicine and beyond increasingly require observations to be described by more features than can be simultaneously visualized. Simply reducing the dimensionality by projections destroys essential relationships in the data. Similarly, traditional clustering algorithms introduce data bias that prevents detection of natural structures expected from generic nonlinear processes. We examine how these problems can best be addressed, focusing in particular on two recent clustering approaches, Phenograph and Hebbian learning clustering, applied to synthetic and natural data examples. Our results reveal that, even for very basic questions, minimizing clustering bias is essential, but that results can benefit further from biased post-processing.

Provider:

The Royal Society

Created:

2017-03-31