8d synthetic dataset from Clustering: how much bias do we need?

The Royal Society Figshare | Updated: 2020-10-15 | Indexed: 2026-04-17

Download link:

https://rs.figshare.com/articles/dataset/8d_synthetic_dataset_from_Clustering_how_much_bias_do_we_need_/4806568/2

Description:

Scientific investigations in medicine and beyond increasingly require observations to be described by more features than can be simultaneously visualized. Simply reducing the dimensionality by projections destroys essential relationships in the data. Similarly, traditional clustering algorithms introduce data bias that prevents detection of natural structures expected from generic nonlinear processes. We examine how these problems can best be addressed, focusing in particular on two recent clustering approaches, Phenograph and Hebbian learning clustering, applied to synthetic and natural data examples. Our results reveal that, already for very basic questions, minimizing clustering bias is essential, but that results can benefit further from biased post-processing.
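
The claim that projection destroys essential relationships can be illustrated with a toy example. The sketch below is not the authors' 8d dataset and uses neither Phenograph nor Hebbian learning clustering; it is a hypothetical construction that assumes two Gaussian clusters in 8 dimensions which differ only in coordinates 3 to 8. It compares how often a point's nearest neighbour shares its cluster label in the full space versus after an axis-aligned projection onto the first two coordinates. All function names (`make_clusters`, `project`, `nn_label_agreement`) and parameters (`shift`, `n_per_cluster`, `seed`) are illustrative assumptions.

```python
import math
import random


def make_clusters(n_per_cluster=100, dim=8, shift=5.0, seed=0):
    """Hypothetical toy data: two Gaussian clusters differing only in coordinates 2..dim-1."""
    rng = random.Random(seed)
    points, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_cluster):
            p = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            if label == 1:
                # Shift only the "hidden" coordinates; the first two keep the same distribution.
                for j in range(2, dim):
                    p[j] += shift
            points.append(p)
            labels.append(label)
    return points, labels


def project(points, axes=(0, 1)):
    """Axis-aligned linear projection onto the chosen coordinates."""
    return [[p[a] for a in axes] for p in points]


def nn_label_agreement(points, labels):
    """Fraction of points whose Euclidean nearest neighbour carries the same label."""
    agree = 0
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        agree += labels[i] == labels[best_j]
    return agree / len(points)


if __name__ == "__main__":
    pts, labs = make_clusters()
    print(f"8D nearest-neighbour agreement: {nn_label_agreement(pts, labs):.2f}")
    print(f"2D projection agreement:        {nn_label_agreement(project(pts), labs):.2f}")
```

In the full space the clusters are well separated, so nearest neighbours almost always share a label; in the 2D projection the two clusters have identical distributions and agreement drops to roughly chance level. Local neighbourhood structure of this kind is what graph-based clustering relies on, which is why such methods are typically run in the full feature space rather than on a projection.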

Created: 2020-10-15