Multiple level visual semantic fusion method for image re-ranking

Abstract

Mid-level semantic attributes have achieved some success in image retrieval and re-ranking. However, because of the semantic gap between low-level features and intermediate semantic concepts, considerable information is lost when low-level features are converted to semantic concepts. To tackle this problem, we try to bridge the semantic gap by exploiting the complementarity of different mid-level features. In this paper, we propose a framework that improves image re-ranking by fusing multiple mid-level features. The framework comprises three mid-level features (DCNN-ImageNet attributes, Fisher vectors, and sparse coding spatial pyramid matching) and a semi-supervised multigraph-based model that combines them. In addition, the framework can be easily extended to use an arbitrary number of features for image re-ranking. Experiments conducted on the a-Pascal dataset show that our approach of fusing different features boosts image re-ranking performance efficiently.
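
The abstract does not state the formulation of the semi-supervised multigraph-based model, so the following is only a minimal sketch of one generic way such fusion can work, not the authors' method. It assumes each mid-level feature yields one Gaussian affinity graph over the images, combines the normalized graphs with non-negative weights that sum to one, and propagates a few relevance labels over the fused graph (manifold-ranking style). The function names `affinity` and `multigraph_rerank` and the parameters `weights`, `alpha`, and `sigma` are hypothetical and chosen for illustration.

```python
import numpy as np


def affinity(X, sigma=1.0):
    """Normalized Gaussian affinity S = D^-1/2 W D^-1/2 for one feature matrix X (n x d)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]


def multigraph_rerank(features, y, weights=None, alpha=0.9, sigma=1.0):
    """Fuse one graph per feature and propagate the initial relevance vector y.

    features: list of (n x d_k) arrays, one per mid-level feature (assumed input).
    y: length-n vector, e.g. 1 for images marked relevant and 0 otherwise.
    weights: non-negative graph weights summing to 1 (uniform if None).
    Returns the relevance scores f and the image indices sorted by descending score.
    """
    n = y.shape[0]
    if weights is None:
        weights = np.full(len(features), 1.0 / len(features))
    # Weighted sum of the per-feature normalized graphs.
    S = sum(w * affinity(X, sigma) for w, X in zip(weights, features))
    # Closed-form solution of f = alpha * S f + y, valid for 0 <= alpha < 1.
    f = np.linalg.solve(np.eye(n) - alpha * S, y)
    return f, np.argsort(-f)
```

Under these assumptions, adding another feature only means appending one more matrix to `features`, which mirrors the extensibility the abstract claims. How the paper actually sets or learns the graph weights is not described here, so the uniform default is a placeholder.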

Acknowledgments

This work was partially supported by the Shenzhen Applied Technology Engineering Laboratory for Internet Multimedia Application (Shenzhen Development and Reform Commission Grant No. 2012720); the Public Service Platform of Mobile Internet Application Security Industry (Shenzhen Development and Reform Commission Grant No. 2012720); Research on Key Technology in Developing Mobile Internet Intelligent Terminal Application Middleware (Grant No. JC201104210032A); Research on Key Technology of Vision Based Intelligent Interaction (Grant No. JC201005260112A); the National Natural Science Foundation of China (No. 61402181); and the Science and Technology Programme of Guangzhou Municipal Government (No. 2014J4100006).

Author information

Authors and Affiliations

Computer Application Research Center, ShenZhen Graduate School, Harbin Institute of Technology, Shenzhen, China
Shuhan Qi, Xuan Wang & Jian Guan
Shenzhen Applied Technology Engineering Laboratory for Internet Multimedia Application, Shenzhen, China
Shuhan Qi, Xuan Wang & Jian Guan
Public Service Platform of Mobile Internet Application Security Industry, Shenzhen, China
Shuhan Qi, Xuan Wang & Jian Guan
School of Computing, National University of Singapore, Singapore, Singapore
Fanglin Wang
Dalian University of Technology, Dalian, China
Yue Guan
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Jia Wei

Authors

Shuhan Qi
Fanglin Wang
Xuan Wang
Yue Guan
Jia Wei
Jian Guan

Corresponding author

Correspondence to Xuan Wang.

About this article

Cite this article

Qi, S., Wang, F., Wang, X. et al. Multiple level visual semantic fusion method for image re-ranking. Multimedia Systems 23, 155–167 (2017). https://doi.org/10.1007/s00530-014-0448-z

Published: 03 January 2015
Issue Date: February 2017
DOI: https://doi.org/10.1007/s00530-014-0448-z
