
Semi-automatic dynamic auxiliary-tag-aided image annotation. (English) Zbl 1187.68518

Summary: Image annotation is the foundation for many real-world applications. In the age of Web 2.0, image search and browsing rely largely on the tags of images. In this paper, we formulate image annotation as a multi-label learning problem and develop a semi-automatic image annotation system. The presented system chooses appropriate words from a vocabulary as tags for a given image and refines the tags with the help of the user’s feedback. The refinement amounts to a novel multi-label learning framework, named Semi-Automatic Dynamic Auxiliary-Tag-Aided (SADATA), in which the classification result for one particular tag (the target tag) can be boosted by the classification results of a subset of the other tags (auxiliary tags). The auxiliary tags, which have strong correlations with the target tag, are determined in terms of the normalized mutual information. We select only those tags whose correlations exceed a threshold as the auxiliary tags, so the auxiliary set is sparse. How much an auxiliary tag can contribute depends on the image, so we also build a probabilistic model, conditioned on the auxiliary tag and the input image, to adjust the weight of the auxiliary tag dynamically. For a given image, the user’s feedback on the tags corrects the outputs of the auxiliary classifiers, and SADATA recommends more appropriate tags in the next round. SADATA is evaluated on a large collection of Corel images. The experimental results validate the effectiveness of our dynamic auxiliary-tag-aided method. Furthermore, the performance also benefits from user feedback, so the annotation procedure can be significantly sped up.
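The sparse auxiliary-tag selection described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the binary label matrix, the tag indices, and the threshold value are all assumed for the example, and the normalized mutual information is computed directly from tag co-occurrence counts over the training images.

```python
import numpy as np

def normalized_mutual_information(x, y):
    """NMI between two binary tag indicator vectors (one entry per image)."""
    eps = 1e-12
    # joint distribution over the four (x, y) outcomes
    p_xy = np.array([[np.mean((x == a) & (y == b)) for b in (0, 1)]
                     for a in (0, 1)])
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    # mutual information, skipping zero-probability cells
    mi = sum(p_xy[a, b] * np.log(p_xy[a, b] / (p_x[a] * p_y[b] + eps))
             for a in (0, 1) for b in (0, 1) if p_xy[a, b] > 0)
    h_x = -sum(p * np.log(p) for p in p_x if p > 0)
    h_y = -sum(p * np.log(p) for p in p_y if p > 0)
    return mi / (np.sqrt(h_x * h_y) + eps)

def select_auxiliary_tags(labels, target, threshold=0.1):
    """Indices of tags whose NMI with the target tag exceeds the threshold.

    `labels` is an (images x tags) binary matrix; `threshold` is illustrative.
    """
    return [j for j in range(labels.shape[1])
            if j != target
            and normalized_mutual_information(labels[:, j],
                                              labels[:, target]) > threshold]
```

Because only tags above the threshold survive, the resulting auxiliary set is sparse, matching the sparsity property emphasized in the summary.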

MSC:

68T10 Pattern recognition, speech recognition
68U10 Computing methodologies for image processing
Full Text: DOI

References:

[1] Flickner, M.; Sawhney, H.; Niblack, W.; Ashley, J.; Huang, Q.; Dom, B.; Gorkani, M.; Hafner, J.; Lee, D.; Petkovic, D.; Steele, D.; Yanker, P., Query by image and video content: the QBIC system, Computer, 28, 9, 23-32 (1995)
[2] Gevers, T.; Smeulders, A., Pictoseek: combining color and shape invariant features for image retrieval, IEEE Transactions on Image Processing, 9, 1, 102-119 (2000)
[3] Gupta, A.; Jain, R., Visual information retrieval, Communications of the ACM, 40, 5, 70-79 (1997)
[4] W. Ma, B. Manjunath, Netra: a toolbox for navigating large image databases, in: International Conference on Image Processing, vol. 1, 1997, p. 568.
[5] Smith, J. R.; Chang, S.-F., Visualseek: a fully automated content-based image query system, (MULTIMEDIA ’96: Proceedings of the Fourth ACM International Conference on Multimedia (1996), ACM: ACM New York, USA), 87-98
[6] Smeulders, A. W.; Worring, M.; Santini, S.; Gupta, A.; Jain, R., Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 12, 1349-1380 (2000)
[7] Li, J.; Wang, J. Z., Real-time computerized annotation of pictures, IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 6, 985-1002 (2008)
[8] Barnard, K.; Duygulu, P.; Forsyth, D.; de Freitas, N.; Blei, D. M.; Jordan, M. I., Matching words and pictures, Journal of Machine Learning Research, 3, 1107-1135 (2003) · Zbl 1061.68174
[9] Tieu, K.; Viola, P., Boosting image retrieval, International Journal of Computer Vision, 56, 1-2, 17-36 (2004)
[10] S.-F. Cheng, W. Chen, H. Sundaram, Semantic visual templates: linking visual features to semantics, in: Proceedings of the International Conference on Image Processing, ICIP 98, vol. 3, 4-7 October 1998, pp. 531-535.
[11] Tong, S.; Chang, E., Support vector machine active learning for image retrieval, (MULTIMEDIA ’01: Proceedings of the Ninth ACM International Conference on Multimedia (2001), ACM: ACM New York, USA), 107-118
[12] Zhang, C.; Chen, T., An active learning framework for content-based information retrieval, IEEE Transactions on Multimedia, 4, 2, 260-268 (2002)
[13] Monay, F.; Gatica-Perez, D., On image auto-annotation with latent space models, (MULTIMEDIA ’03: Proceedings of the Eleventh ACM International Conference on Multimedia (2003), ACM: ACM New York, USA), 275-278
[14] A. Singhal, J. Luo, W. Zhu, Probabilistic spatial context models for scene content understanding, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 18-20 June 2003, pp. I-235-I-241.
[15] He, X.; Ma, W.-Y.; Zhang, H.-J., Learning an image manifold for retrieval, (MULTIMEDIA ’04: Proceedings of the 12th Annual ACM International Conference on Multimedia (2004), ACM: ACM New York, USA), 17-23
[16] Rui, Y.; Huang, T.; Ortega, M.; Mehrotra, S., Relevance feedback: a power tool for interactive content-based image retrieval, IEEE Transactions on Circuits and Systems for Video Technology, 8, 5, 644-655 (1998)
[17] Y. Rui, T. Huang, Optimizing learning in image retrieval, in: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2000, pp. 236-243.
[18] Kushki, A.; Androutsos, P.; Plataniotis, K.; Venetsanopoulos, A., Query feedback for interactive image retrieval, IEEE Transactions on Circuits and Systems for Video Technology, 14, 5, 644-655 (2004)
[19] Guan, J.; Qiu, G., Learning user intention in relevance feedback using optimization, (MIR ’07: Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval (2007), ACM: ACM New York, USA), 41-50
[20] Liu, J.; Li, Z.; Li, M.; Lu, H.; Ma, S., Human behaviour consistent relevance feedback model for image retrieval, (MULTIMEDIA ’07: Proceedings of the 15th International Conference on Multimedia (2007), ACM: ACM New York, USA), 269-272
[21] Wenyin, L.; Dumais, S.; Sun, Y.; Zhang, H.; Czerwinski, M.; Field, B., Semi-automatic image annotation, (Proceedings of Conference on HCI (INTERACT) (2001), IOS Press), 326-333
[22] A. Dorado, E. Izquierdo, Semi-automatic image annotation using frequent keyword mining, in: Proceedings of the Seventh International Conference on Information Visualization, vol. IV, 16-18 July 2003, pp. 532-535.
[23] C. Yang, M. Dong, F. Fotouhi, \( \operatorname{I}^2 \operatorname{A} \)
[24] C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA, 1998. · Zbl 0913.68054
[25] Schapire, R. E.; Singer, Y., BoosTexter: a boosting-based system for text categorization, Machine Learning, 39, 2/3, 135-168 (2000) · Zbl 0951.68561
[26] Rousu, J.; Saunders, C.; Szedmak, S.; Shawe-Taylor, J., Kernel-based learning of hierarchical multilabel classification models, Journal of Machine Learning Research, 7, 1601-1626 (2006) · Zbl 1222.68291
[27] Elisseeff, A.; Weston, J., A kernel method for multi-labelled classification, (Neural Information Processing Systems (2001)), 681-687
[28] F. Kang, R. Jin, R. Sukthankar, Correlated label propagation with application to multi-label learning, in: CVPR ’06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2006, pp. 1719-1726.
[29] Zhang, M.-L.; Zhou, Z.-H., ML-KNN: a lazy learning approach to multi-label learning, Pattern Recognition, 40, 7, 2038-2048 (2007) · Zbl 1111.68629
[30] Zhu, S.; Ji, X.; Xu, W.; Gong, Y., Multi-labelled classification using maximum entropy method, (SIGIR ’05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2005), ACM: ACM New York, USA), 274-281
[31] Qi, G.-J.; Hua, X.-S.; Rui, Y.; Tang, J.; Mei, T.; Zhang, H.-J., Correlative multi-label video annotation, (MULTIMEDIA ’07: Proceedings of the 15th International Conference on Multimedia (2007), ACM: ACM New York, USA), 17-26
[32] Zhang, M.-L.; Zhou, Z.-H., Multi-label learning by instance differentiation, (Proceedings of the 22nd Conference on Artificial Intelligence, Vancouver, Canada (2007)), 669-674
[33] Boutell, M. R.; Luo, J.; Shen, X.; Brown, C. M., Learning multi-label scene classification, Pattern Recognition, 37, 9, 1757-1771 (2004)
[34] Gonçalves, T.; Quaresma, P., A preliminary approach to the multilabel classification problem of Portuguese juridical documents, (EPIA (2003)), 435-444
[35] Li, T.; Ogihara, M., Detecting emotion in music, (ISMIR (2003))
[36] Vapnik, V., Statistical Learning Theory (1998), Wiley: Wiley New York · Zbl 0935.62007
[37] Diplaris, S.; Tsoumakas, G.; Mitkas, P. A.; Vlahavas, I. P., Protein classification with multiple algorithms, (Panhellenic Conference on Informatics (2005)), 448-456
[38] Naphade, M. R.; Kozintsev, I. V.; Huang, T. S., A factor graph framework for semantic video indexing, IEEE Transactions on Circuits and Systems for Video Technology, 12, 1, 40-52 (2002)
[39] Y. Wu, B.L. Tseng, J.R. Smith, Ontology-based multi-classification learning for video concept detection, in: IEEE International Conference on Multimedia and Expo 2004, vol. 2, 2004, pp. 1003-1006.
[40] Jacobs, R. A.; Jordan, M. I.; Nowlan, S. J.; Hinton, G. E., Adaptive mixtures of local experts, Neural Computation, 3, 1, 79-87 (1991)
[41] Bishop, C.; Svensén, M., Bayesian hierarchical mixtures of experts, (Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (2003), Morgan Kaufmann: Morgan Kaufmann San Francisco, CA)
[42] J. Karmeshu (Ed.), Entropy Measures, Maximum Entropy Principle and Emerging Applications, Springer, Berlin, 2003. · Zbl 1083.94500
[43] Schwarz, G., Estimating the dimension of a model, The Annals of Statistics, 6, 2, 461-464 (1978) · Zbl 0379.62005
[44] Rissanen, J. J., Modeling by shortest data description, Automatica, 14, 465-471 (1978) · Zbl 0418.93079
[45] Gader, P. D.; Mohamed, M. A.; Keller, J. M., Fusion of handwritten word classifiers, Pattern Recognition Letters, 17, 6, 577-584 (1996)
[46] Ho, T. K.; Hull, J.; Srihari, S., Decision combination in multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 1, 66-75 (1994)
[47] Qiu, G., Indexing chromatic and achromatic patterns for content-based colour image retrieval, Pattern Recognition, 35, 8, 1675-1686 (2002) · Zbl 1017.68104
[48] De Comité, F.; Gilleron, R.; Tommasi, M., Learning multi-label alternating decision trees from texts and data, Machine Learning and Data Mining in Pattern Recognition, 251-274 (2003) · Zbl 1029.68568
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.