
Fisher kernels for image descriptors: a theoretical overview and experimental results. (English) Zbl 1289.62062

Summary: Visual words have recently proved to be a key tool in image classification. The best-performing Pascal VOC and ImageCLEF systems use Gaussian mixtures or \(k\)-means clustering to define visual words based on the content-based features of points of interest. In most cases, Gaussian mixture modeling (GMM) with a Fisher-information-based distance over the mixtures yields the most accurate classification results.
In this paper we give an overview of the theoretical foundations of the Fisher kernel method. We indicate that it yields a natural metric over images characterized by low-level content descriptors generated from a Gaussian mixture. We justify the theoretical observations by reproducing standard measurements over the Pascal VOC 2007 data. Our accuracy is comparable to that of the most recent best-performing image classification systems.
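For orientation, the Fisher kernel mentioned in the summary can be written in its standard textbook form as below. The notation is an assumption introduced here for illustration, not taken from the paper: \(X = \{x_1, \dots, x_T\}\) and \(Y\) are the sets of low-level descriptors of two images, \(p(\cdot \mid \theta)\) is the generative model with parameters \(\theta\), \(U_X\) is the Fisher score, and \(F_\theta\) is the Fisher information matrix.
\[
U_X = \nabla_\theta \log p(X \mid \theta), \qquad
F_\theta = \mathbb{E}_{X \sim p(\cdot \mid \theta)}\left[ U_X U_X^{\top} \right], \qquad
K(X, Y) = U_X^{\top} F_\theta^{-1} U_Y .
\]
As a sketch of the Gaussian mixture case, assume independent descriptors and a mixture with weights \(w_k\), means \(\mu_k\) and covariances \(\Sigma_k\). The component of the score belonging to the mean \(\mu_k\) is then a posterior-weighted sum of deviations from that mean:
\[
\gamma_t(k) = \frac{w_k \, \mathcal{N}(x_t; \mu_k, \Sigma_k)}{\sum_j w_j \, \mathcal{N}(x_t; \mu_j, \Sigma_j)}, \qquad
\frac{\partial}{\partial \mu_k} \log p(X \mid \theta) = \sum_{t=1}^{T} \gamma_t(k) \, \Sigma_k^{-1} (x_t - \mu_k).
\]
The distance induced by \(K\) is what makes the construction a metric over images; how the paper normalizes or approximates \(F_\theta\) is not stated in the summary.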


62H35 Image analysis in multivariate analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T45 Machine vision and scene understanding
68U10 Computing methodologies for image processing
68P20 Information storage and retrieval of data