Simplicity of K-means versus deepness of Deep Learning. A Case of Unsupervised Feature Learning with Limited Data

Name: Simplicity of K-means versus deepness of Deep Learning. A Case of Unsupervised Feature Learning with Limited Data
Creator: Purdue University Research Repository
Published: 2025-12-18 20:11:21
License: No description available

DataCite Commons. Updated: 2025-12-18. Indexed: 2025-04-16

Download link:

https://purr.purdue.edu/publications/1988/1

Resource description:

We use a biodetection application as a case study to demonstrate that K-means-based unsupervised feature learning can be a simple yet effective alternative to deep learning techniques for small data sets with limited intra-class and inter-class diversity. We investigate how data augmentation, as well as feature extraction with multiple patch sizes and at different image scales, affects classifier performance. Our data set includes 1833 images from four classes of bacteria, with each bacterial culture captured at three different wavelengths and all data collected over a three-day period. The limited number and diversity of images, potential random effects across days, and the multimodal nature of the class distributions make this a challenging setting for representation learning. When we use images collected on the first day for training, the second day for validation, and the third day for testing, K-means-based representation learning achieves 97% classification accuracy on the test data. This compares very favorably with the 56% accuracy achieved by deep learning and the 74% accuracy achieved by handcrafted features. Our results suggest that data augmentation or dropping connections between units offers little help for deep learning algorithms, whereas K-means-based representation learning gains a significant boost from augmenting the data and from concatenating features obtained at multiple patch sizes or image scales.
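The description above names the technique but not its mechanics, so the following is a minimal sketch of one common way to do K-means-based feature learning with features concatenated across patch sizes. It is not the authors' pipeline. Everything in it is an assumption made for illustration: the function names, the patch sizes (4 and 6), the stride, the number of centroids, the soft "triangle" encoding, the whole-image average pooling, and the random arrays standing in for images.

```python
import numpy as np


def extract_patches(image, patch_size, stride):
    """Return every patch_size x patch_size patch as a flattened row."""
    h, w = image.shape
    rows = [
        image[r:r + patch_size, c:c + patch_size].ravel()
        for r in range(0, h - patch_size + 1, stride)
        for c in range(0, w - patch_size + 1, stride)
    ]
    return np.asarray(rows, dtype=float)


def normalize(patches, eps=1e-8):
    """Per-patch brightness and contrast normalization."""
    patches = patches - patches.mean(axis=1, keepdims=True)
    return patches / (patches.std(axis=1, keepdims=True) + eps)


def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def encode(image, centroids, patch_size, stride):
    """Soft 'triangle' encoding (assumed), average-pooled over the image."""
    patches = normalize(extract_patches(image, patch_size, stride))
    diffs = patches[:, None, :] - centroids[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # A centroid is active only if it is closer than the patch's mean distance.
    activations = np.maximum(0.0, dists.mean(axis=1, keepdims=True) - dists)
    return activations.mean(axis=0)


def multi_size_features(image, dictionaries, stride=2):
    """Concatenate encodings obtained with several patch sizes."""
    return np.concatenate(
        [encode(image, cents, size, stride) for size, cents in dictionaries.items()]
    )


# Toy usage: random arrays stand in for the (hypothetical) training images.
rng = np.random.default_rng(0)
train_images = [rng.random((16, 16)) for _ in range(5)]

dictionaries = {}
for size in (4, 6):  # illustrative patch sizes, not taken from the data set
    patches = np.vstack(
        [normalize(extract_patches(img, size, 2)) for img in train_images]
    )
    dictionaries[size] = kmeans(patches, k=8)

features = multi_size_features(train_images[0], dictionaries)
```

In this sketch each patch size gets its own centroid dictionary, learned only from training images, and the resulting feature vectors are concatenated before being passed to any classifier. Under the day-based protocol described above, the dictionaries would be fit on first-day images only and then reused unchanged to encode the second-day and third-day images.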

Provided by:

Purdue University Research Repository

Created:

2015-09-30
