Abstract
Mid-level semantic attributes have achieved some success in image retrieval and re-ranking. However, because of the semantic gap between low-level features and intermediate semantic concepts, considerable information is lost when low-level features are converted into semantic concepts. To tackle this problem, we bridge the semantic gap by exploiting the complementarity of different mid-level features. In this paper, we propose a framework that improves image re-ranking by fusing multiple mid-level features. The framework contains three mid-level features (DCNN-ImageNet attributes, Fisher vector, and sparse coding spatial pyramid matching) and a semi-supervised multigraph-based model that combines them. In addition, the framework can easily be extended to use an arbitrary number of features for image re-ranking. Experiments on the a-Pascal dataset show that our approach of fusing different features is able to boost the performance of image re-ranking efficiently.
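To make the idea of multigraph fusion concrete, the following is a minimal sketch of a generic semi-supervised, graph-based re-ranking scheme. It is not the formulation used in the paper, which the abstract does not specify. It assumes one Gaussian-kernel affinity graph per feature type, a fixed weighted sum of the normalized graphs (uniform weights by default), and iterative score propagation from an initial ranking. The function names (affinity, normalize, fuse_and_rerank) and the parameters sigma, alpha, and iters are hypothetical.

```python
import math


def affinity(features, sigma=1.0):
    """Gaussian-kernel affinity matrix with a zero diagonal (assumed kernel)."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def normalize(w):
    """Symmetric normalization S = D^(-1/2) W D^(-1/2)."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [
        [
            w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def fuse_and_rerank(feature_sets, initial_scores, weights=None, alpha=0.8, iters=50):
    """Fuse one graph per feature type and propagate the initial scores.

    feature_sets: list of feature matrices, one per feature type, each holding
                  one row per image (rows aligned across feature types).
    initial_scores: initial relevance score per image (e.g. 1.0 for images
                    known to be relevant, 0.0 otherwise).
    weights: per-graph fusion weights; uniform if omitted (an assumption).
    """
    n = len(initial_scores)
    k = len(feature_sets)
    if weights is None:
        weights = [1.0 / k] * k
    graphs = [normalize(affinity(f)) for f in feature_sets]
    # Weighted sum of the normalized graphs gives the fused graph.
    fused = [
        [sum(wg * g[i][j] for wg, g in zip(weights, graphs)) for j in range(n)]
        for i in range(n)
    ]
    scores = list(initial_scores)
    for _ in range(iters):
        # Each image takes a share of its neighbours' scores and keeps a
        # share of its own initial score.
        scores = [
            alpha * sum(fused[i][j] * scores[j] for j in range(n))
            + (1.0 - alpha) * initial_scores[i]
            for i in range(n)
        ]
    ranking = sorted(range(n), key=lambda i: -scores[i])
    return scores, ranking
```

Because the fused graph is just a weighted sum, adding a further feature type in this sketch only means appending another matrix to feature_sets, which mirrors the extensibility claimed in the abstract. How the paper itself sets or learns the graph weights is not stated here.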
Acknowledgments
This work was partially supported by Shenzhen Applied Technology Engineering Laboratory for Internet Multimedia Application under Grants Shenzhen Development and Reform Commission No. 2012720; Public Service Platform of Mobile Internet Application Security Industry under Grants Shenzhen Development and Reform Commission No. 2012720; Research on Key Technology in Developing Mobile Internet Intelligent Terminal Application Middleware under Grants No. JC201104210032A; Research on Key Technology of Vision Based Intelligent Interaction under Grants No. JC201005260112A; National Natural Science Foundation of China No. 61402181; Science and Technology Programme of Guangzhou Municipal Government No. 2014J4100006.
Cite this article
Qi, S., Wang, F., Wang, X. et al. Multiple level visual semantic fusion method for image re-ranking. Multimedia Systems 23, 155–167 (2017). https://doi.org/10.1007/s00530-014-0448-z
DOI: https://doi.org/10.1007/s00530-014-0448-z