Simplicity of Kmeans versus Deepness of Deep...



We study a biodetection application as a case study to demonstrate that K-means-based unsupervised feature learning can be a simple yet effective alternative to deep learning techniques for small data sets with limited intra- and inter-class diversity. We investigate how data augmentation, and feature extraction with multiple patch sizes and at different image scales, affect classifier performance. Our data set includes 1833 images from four classes of bacteria. Each bacterial culture was captured at three different wavelengths, and all data were collected over a three-day period. The limited number and diversity of the images, potential random effects across days, and the multimodal nature of the class distributions make this a challenging setting for representation learning. When images collected on the first day are used for training, those from the second day for validation, and those from the third day for testing, K-means-based representation learning achieves 97% classification accuracy on the test data. This compares very favorably with the 56% accuracy achieved by deep learning and the 74% accuracy achieved by handcrafted features. Our results suggest that data augmentation and dropping connections between units offer little help to the deep learning algorithms. In contrast, K-means-based representation learning gains a significant boost from data augmentation and from concatenating features obtained at multiple patch sizes or image scales.
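To make "K-means-based representation learning" with concatenated patch sizes concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline. The following are all illustrative assumptions: the function names, the per-patch normalization, the "triangle" encoding commonly paired with K-means feature learning, global average pooling, the omission of whitening, and the example values (k, patch sizes, strides). The sketch learns one K-means dictionary per patch size from unlabeled patches, encodes each image patch against the centroids, pools the codes, and concatenates the pooled vectors across patch sizes.

```python
import math
import random
from typing import Dict, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities
Vector = List[float]


def extract_patches(img: Image, size: int, stride: int = 1) -> List[Vector]:
    """Flatten every size x size window of the image into a vector."""
    h, w = len(img), len(img[0])
    patches = []
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            patches.append([img[r + i][c + j] for i in range(size) for j in range(size)])
    return patches


def normalize(patch: Vector, eps: float = 1e-8) -> Vector:
    """Per-patch contrast normalization: zero mean, unit variance (assumed choice)."""
    mean = sum(patch) / len(patch)
    var = sum((x - mean) ** 2 for x in patch) / len(patch)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in patch]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's algorithm, initialized with k randomly chosen data points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(patch: Vector, centroids: List[Vector]) -> Vector:
    """'Triangle' encoding: f_k = max(0, mean(z) - z_k), z_k = distance to centroid k."""
    z = [math.sqrt(sq_dist(patch, c)) for c in centroids]
    mu = sum(z) / len(z)
    return [max(0.0, mu - zk) for zk in z]


def image_features(img: Image, centroids: List[Vector], size: int) -> Vector:
    """Encode every (normalized) patch and average-pool over the whole image."""
    codes = [encode(normalize(p), centroids) for p in extract_patches(img, size)]
    return [sum(col) / len(codes) for col in zip(*codes)]


def multi_size_features(img: Image, dictionaries: Dict[int, List[Vector]]) -> Vector:
    """Concatenate pooled features from dictionaries learned at several patch sizes."""
    feats: Vector = []
    for size in sorted(dictionaries):
        feats.extend(image_features(img, dictionaries[size], size))
    return feats


if __name__ == "__main__":
    # Hypothetical toy data: random 12x12 "images" stand in for real micrographs.
    rng = random.Random(1)
    train = [[[rng.random() for _ in range(12)] for _ in range(12)] for _ in range(5)]
    dictionaries: Dict[int, List[Vector]] = {}
    for size in (4, 6):
        pool = [normalize(p) for img in train for p in extract_patches(img, size, stride=2)]
        dictionaries[size] = kmeans(pool, k=8, iters=10)
    vec = multi_size_features(train[0], dictionaries)
    print(len(vec))  # 16 = 8 centroids x 2 patch sizes
```

The concatenated vector would then be fed to an ordinary supervised classifier. Under the day-based protocol described above, the dictionaries and classifier would be fit on first-day images, tuned on second-day images, and evaluated once on third-day images. Data augmentation would simply enlarge the pool of patches and training vectors.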

The Purdue University Research Repository (PURR) is a university core research facility provided by the Purdue University Libraries, the Office of the Executive Vice President for Research and Partnerships, and Information Technology at Purdue (ITaP).