Abstract
Large-scale data management and retrieval in complex domains such as images, videos, or biometric data remains one of the most important and challenging information processing tasks. Even after two decades of intensive research, many questions remain to be answered before working tools become available for everyday use. In this work, we focus on the practical applicability of different multi-modal retrieval techniques. Multi-modal searching, which combines several complementary views on complex data objects, follows the human thinking process and represents a very promising retrieval paradigm. However, the rapid development of modality fusion techniques in several diverse directions and a lack of comparisons between individual approaches have resulted in a confusing situation in which the applicability of individual solutions is unclear. Aiming to improve the research community’s comprehension of this topic, we analyze and systematically categorize existing multi-modal search techniques, identify their strengths, and describe selected representatives. In the second part of the paper, we focus on the specific problem of large-scale multi-modal image retrieval on the web. We analyze the requirements of such a task, implement several applicable fusion methods, and experimentally evaluate their performance in terms of both efficiency and effectiveness. The extensive experiments provide a unique comparison of diverse approaches to modality fusion under equal settings on two large real-world datasets.
Acknowledgments
This work was supported by the Czech national research project GA16-18889S. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085.
Copyright information
© 2017 Springer-Verlag GmbH Germany
About this chapter
Cite this chapter
Budikova, P., Batko, M., Zezula, P. (2017). Fusion Strategies for Large-Scale Multi-modal Image Retrieval. In: Hameurlain, A., Küng, J., Wagner, R., Akbarinia, R., Pacitti, E. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIII. Lecture Notes in Computer Science, vol 10430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55696-2_5
DOI: https://doi.org/10.1007/978-3-662-55696-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-55695-5
Online ISBN: 978-3-662-55696-2
eBook Packages: Computer Science (R0)