Fisher kernels for image descriptors: a theoretical overview and experimental results. (English) Zbl 1289.62062
Summary: Visual words have recently proved to be a key tool in image classification. The best-performing Pascal VOC and ImageCLEF systems use Gaussian mixtures or \(k\)-means clustering to define visual words based on the content-based features of points of interest. In most cases, Gaussian mixture modeling (GMM) with a Fisher-information-based distance over the mixtures yields the most accurate classification results.
In this paper we give an overview of the theoretical foundations of the Fisher kernel method. We indicate that it yields a natural metric over images characterized by low-level content descriptors generated from a Gaussian mixture. We justify the theoretical observations by reproducing standard measurements over the Pascal VOC 2007 data. Our accuracy is comparable to that of the most recent best-performing image classification systems.
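For orientation, the Fisher kernel referred to above can be written in its standard textbook form as follows. The notation (\(X\), \(\theta\), \(g_X\), \(F_\theta\), \(\gamma_k\)) is an assumption introduced here for illustration and is not taken from the paper; the diagonal-covariance mixture is likewise an assumed simplification.

\[
g_X = \nabla_\theta \log p(X \mid \theta), \qquad
F_\theta = \mathbb{E}_{X \sim p(\cdot \mid \theta)}\left[ g_X\, g_X^{\top} \right], \qquad
K(X, Y) = g_X^{\top} F_\theta^{-1} g_Y .
\]

If an image is described by independent descriptors \(X = \{x_1, \dots, x_T\}\) drawn from a mixture \(p(x \mid \theta) = \sum_k w_k\, \mathcal{N}(x; \mu_k, \operatorname{diag}(\sigma_k^2))\), the gradient with respect to the \(k\)-th mean is

\[
\frac{\partial \log p(X \mid \theta)}{\partial \mu_k}
= \sum_{t=1}^{T} \gamma_k(x_t)\, \frac{x_t - \mu_k}{\sigma_k^2},
\qquad
\gamma_k(x_t) = \frac{w_k\, \mathcal{N}(x_t; \mu_k, \operatorname{diag}(\sigma_k^2))}{\sum_j w_j\, \mathcal{N}(x_t; \mu_j, \operatorname{diag}(\sigma_j^2))},
\]

where the division by \(\sigma_k^2\) is componentwise. Under these assumptions, the distance induced between two images is \(d(X, Y)^2 = (g_X - g_Y)^{\top} F_\theta^{-1} (g_X - g_Y)\), which is one way to read the "natural metric" mentioned in the summary.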
MSC:
62H35 | Image analysis in multivariate analysis |
62H30 | Classification and discrimination; cluster analysis (statistical aspects) |
68T45 | Machine vision and scene understanding |
68U10 | Computing methodologies for image processing |
68P20 | Information storage and retrieval of data |