Fast and Robust Archetypal Analysis for Representation Learning

Yuansi Chen, Julien Mairal and Zaid Harchaoui

We revisit a pioneering unsupervised learning technique called archetypal analysis [5], which is related to successful data analysis methods such as sparse coding [18] and non-negative matrix factorization [19]. Since it was proposed, archetypal analysis has not gained much popularity, even though it produces more interpretable models than alternative methods. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited.

Our goal is to bring archetypal analysis back into favour. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. We then demonstrate the usefulness of archetypal analysis for computer vision tasks such as codebook learning, signal classification, and visualization of large image collections.

Formulation

Robust archetypal analysis can be formulated as follows:
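For reference, a sketch of the standard formulation, in notation assumed here rather than taken from this page: X = [x_1, ..., x_n] in R^{m x n} holds the data, Δ_k = {w in R^k : w >= 0, sum_j w_j = 1} is the probability simplex, the p archetypes Z = XB are convex combinations of data points (columns β_j of B in Δ_n), and each point x_i is approximated by Zα_i with α_i in Δ_p. Classical archetypal analysis [5] minimizes the squared reconstruction error (first line); a robust variant replaces the residual norm by a Huber loss h with threshold ε > 0 (second line).

```latex
\min_{\substack{\alpha_1,\dots,\alpha_n \in \Delta_p \\ \beta_1,\dots,\beta_p \in \Delta_n}}
  \; \lVert X - XBA \rVert_F^2,
\qquad A = [\alpha_1,\dots,\alpha_n], \quad B = [\beta_1,\dots,\beta_p],

\min_{\substack{\alpha_1,\dots,\alpha_n \in \Delta_p \\ \beta_1,\dots,\beta_p \in \Delta_n}}
  \; \sum_{i=1}^{n} h\bigl(\lVert x_i - XB\alpha_i \rVert_2\bigr),
\qquad
h(u) =
\begin{cases}
  \dfrac{u^2}{2\varepsilon} + \dfrac{\varepsilon}{2}, & |u| \le \varepsilon,\\[4pt]
  |u|, & \text{otherwise,}
\end{cases}

h(u) = \tfrac{1}{2} \min_{\eta \ge \varepsilon} \Bigl( \frac{u^2}{\eta} + \eta \Bigr).
```

The last identity, attained at η = max(|u|, ε), is what makes reweighted least-squares schemes a natural fit for the robust problem: for fixed η, each residual simply receives the weight 1/η.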
Optimization
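The fast solver described on this page relies on an active-set strategy. As a simpler, hedged illustration of the alternating structure only, the sketch below swaps in plain projected-gradient steps on the two simplex-constrained blocks A and B; it is not the authors' algorithm. The function names, the initialization of archetypes at random data points, the iteration count, and the reweighting used for the robust option (weights 1 / max(||r_i||, eps), following the variational form of the Huber loss) are all assumptions made for illustration.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a 1-D vector onto the simplex {w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]  # last index where the shifted entry stays positive
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def project_columns(M):
    """Project every column of M onto the simplex."""
    return np.apply_along_axis(project_simplex, 0, M)


def archetypal_analysis(X, p, n_iter=200, robust=False, eps=1e-3, seed=0):
    """Alternating projected-gradient sketch of (robust) archetypal analysis.

    X is m x n with one sample per column. Returns Z = X B (m x p), A (p x n), B (n x p),
    where every column of A and of B lies on the simplex.
    """
    n = X.shape[1]
    rng = np.random.default_rng(seed)
    B = np.zeros((n, p))
    B[rng.choice(n, size=p, replace=False), np.arange(p)] = 1.0  # archetypes start at data points
    A = np.full((p, n), 1.0 / p)
    for _ in range(n_iter):
        Z = X @ B
        # A-step: columns of A decouple; one projected-gradient step with step 1 / ||Z||_2^2.
        # Robust weights w_i would cancel here if column i used step 1 / (w_i ||Z||_2^2).
        A = project_columns(A - Z.T @ (Z @ A - X) / (np.linalg.norm(Z, 2) ** 2 + 1e-12))
        # Sample weights: ones for least squares, 1 / max(||r_i||, eps) for the Huber loss.
        R = X - X @ B @ A
        w = 1.0 / np.maximum(np.linalg.norm(R, axis=0), eps) if robust else np.ones(n)
        # B-step on 0.5 * sum_i w_i ||x_i - X B a_i||^2; gradient is X^T (X B A - X) diag(w) A^T.
        G = X.T @ (-R) @ (A * w).T
        L = np.linalg.norm(X, 2) ** 2 * np.linalg.norm(A * np.sqrt(w), 2) ** 2 + 1e-12
        B = project_columns(B - G / L)
    return X @ B, A, B
```

For example, `Z, A, B = archetypal_analysis(X, p=4)` on a 5 x 40 matrix X returns four archetypes as the columns of Z. Projected gradient keeps the sketch short and each step provably non-increasing for the least-squares loss, but it makes no attempt to exploit the sparsity of the coefficients, which is what an active-set method is designed to do.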

Experiments on archetypal visualization

We present visualization results for several Flickr image queries using robust archetypal analysis. Three queries, yielding datasets of different sizes, are used: paris366, london2 and berlin5.

The dataset "paris366", obtained with the query "paris", consists of 36600 images uploaded to Flickr during 2012. The dataset "london2", obtained with the query "london", consists of 72000 images uploaded during the two years 2011-2012. The dataset "berlin5", obtained with the query "berlin", consists of 160000 images uploaded during the five years 2008-2012. In all three cases, the images are sorted by Flickr's "relevance" ranking.

For each image, we compute dense SIFT descriptors and aggregate them into a global Fisher vector of dimension 2063. We learn p = 128 or 256 archetypes. The first part below shows the learned archetypes; the second part presents several remarkable decompositions of the original images onto the archetypes.

There are four types of remarkable decompositions: images decomposed with 3 major archetypes, images decomposed with 2 major archetypes, images with a trivial decomposition, and badly decomposed images. For each type, we show 50 images ordered by reconstruction error (ascending for the first three types and descending for the last one).
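The selection just described can be sketched as a filter on the archetype weights followed by a sort on reconstruction error. This is a hypothetical reconstruction: the "> 24%" threshold and the five-archetype cutoff come from the decomposition table, but reading "three" and "two" as counts of archetypes with weight above 0.24, the tolerance that decides when a weight is non-negligible, and the names select_examples and kind are all assumptions.

```python
import numpy as np


def select_examples(A, errors, kind, k=50, thresh=0.24, tol=1e-6):
    """Return indices of up to k images of one decomposition type, ordered by error.

    A is the p x n matrix of archetype weights (each column sums to 1) and errors
    holds the n reconstruction errors. The counting rules are illustrative
    assumptions built from the thresholds in the decomposition table.
    """
    major = (A > thresh).sum(axis=0)  # archetypes carrying more than 24% of an image
    active = (A > tol).sum(axis=0)  # archetypes with a non-negligible weight
    rules = {
        "three": major >= 3,
        "two": major == 2,
        "trivial": active == 1,
        "non_sparse": active > 5,
    }
    idx = np.flatnonzero(rules[kind])
    order = np.argsort(errors[idx], kind="stable")  # lowest reconstruction error first
    if kind == "non_sparse":
        order = order[::-1]  # highest reconstruction error first for the last type
    return idx[order][:k]
```

Changing thresh or the rules dictionary is enough to test other readings of the table's criteria.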

Decompositions for london2 and for other numbers of archetypes are not included in this supplementary material due to disk space limitations.

1. Archetypes

Dataset          Paris366                 London2                  Berlin5
Global feature   Fisher vector (m=2063)   Fisher vector (m=2063)   Fisher vector (m=2063)
Robust version   SIFT                     SIFT                     SIFT
Date             01/01/2012-12/31/2012    01/01/2011-12/31/2012    01/01/2008-12/31/2012
Size             36600                    72000                    160000
Tags             Paris                    London                   Berlin

2. Archetypal Decomposition

Dataset                    Paris366                 Berlin5
Optimization               Robust                   Robust
Global feature             Fisher vector (m=2063)   Fisher vector (m=2063)
p                          256                      256
Three decompositions       robust SIFT              robust SIFT
  (more than 3 archetypes > 24%; ordered by lowest reconstruction error)
Two decompositions         robust SIFT              robust SIFT
  (more than 2 archetypes > 24%; ordered by lowest reconstruction error)
Trivial decomposition      robust SIFT              robust SIFT
  (only one archetype; ordered by lowest reconstruction error)
Non-sparse decomposition   robust SIFT              robust SIFT
  (more than 5 archetypes; ordered by highest reconstruction error)

Installation

The implementation of archetypal analysis is integrated into the SPAMS toolbox. This toolbox is developed by Julien Mairal (INRIA), with the collaboration of Francis Bach (INRIA), Jean Ponce (Ecole Normale Superieure), Guillermo Sapiro (University of Minnesota), Rodolphe Jenatton (INRIA) and Guillaume Obozinski (INRIA). It is coded in C++ with a Matlab interface. Interfaces for R and Python were more recently developed by Jean-Paul Chieze (INRIA), and the archetypal analysis code was written by Yuansi Chen (UC Berkeley) during an internship at INRIA.

To install, download version 2.5 (or later) of the SPAMS toolbox and follow the instructions on the download page and in the "INSTALL-package" file of the package.

Tutorial

Examples of archetypal analysis can be found in the test files of the downloaded package. Here is a short Python example using archetypal analysis to learn a dictionary of archetypal image patches.
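As a minimal sketch in that spirit (not the example shipped with the package), the code below cuts a grayscale image into overlapping 8x8 patches, centers and normalizes them, and hands them to SPAMS. The patch size, stride, preprocessing, and helper names are illustrative choices; the call spams.archetypalAnalysis(X, p=..., returnAB=True, robust=..., epsilon=...) is assumed from the SPAMS Python interface and should be checked against the documentation of the installed version.

```python
import numpy as np


def extract_patches(image, size=8, step=4):
    """Return a (size*size) x N matrix whose columns are vectorized image patches."""
    h, w = image.shape
    cols = [
        image[i:i + size, j:j + size].reshape(-1)
        for i in range(0, h - size + 1, step)
        for j in range(0, w - size + 1, step)
    ]
    return np.stack(cols, axis=1)


def normalize_patches(P):
    """Remove each patch's mean and scale it to unit l2 norm (an assumed preprocessing)."""
    P = P - P.mean(axis=0, keepdims=True)
    return P / np.maximum(np.linalg.norm(P, axis=0, keepdims=True), 1e-8)


def learn_archetypes(X, p=64, robust=True, eps=1e-3):
    """Learn p archetypal patches with SPAMS (call signature assumed; check your version's docs)."""
    import spams  # imported lazily so the patch utilities work without SPAMS installed

    X = np.asfortranarray(X, dtype=np.float64)  # SPAMS expects Fortran-ordered float64 arrays
    Z, A, B = spams.archetypalAnalysis(X, p=p, returnAB=True, robust=robust, epsilon=eps)
    return Z, A, B  # Z: archetypes, one per column; A, B: coefficient matrices
```

Given a 2-D array img, `Z, A, B = learn_archetypes(normalize_patches(extract_patches(img)), p=64)` returns the archetypal patches as the columns of Z, and `Z[:, j].reshape(8, 8)` displays archetype j as a patch.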

To Cite

    @inproceedings{chen:hal-00995911,
        hal_id = {hal-00995911},
        url = {http://hal.inria.fr/hal-00995911},
        title = {{Fast and Robust Archetypal Analysis for Representation Learning}},
        author = {Chen, Yuansi and Mairal, Julien and Harchaoui, Zaid},
        booktitle = {{CVPR 2014 - IEEE Conference on Computer Vision \& Pattern Recognition}},
        year = {2014},
        month = May,
        pdf = {http://hal.inria.fr/hal-00995911/PDF/main.pdf},
    }