
Transferrable feature and projection learning with class hierarchy for zero-shot learning. (English) Zbl 1483.68341

Summary: Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to unseen ones so that the latter can be recognised without any training samples. This is made possible by learning a projection function between a feature space and a semantic space (e.g. attribute space). When the seen and unseen classes are regarded as two domains, a large domain gap often exists between them, which poses a key challenge for ZSL. In this work, we propose a novel inductive ZSL model that leverages superclasses as the bridge between seen and unseen classes to narrow the domain gap. Specifically, we first build a class hierarchy of multiple superclass layers and a single class layer, where the superclasses are automatically generated by data-driven clustering over the semantic representations of all seen and unseen class names. We then exploit the superclasses from the class hierarchy to tackle the domain gap challenge in two aspects: deep feature learning and projection function learning. First, to narrow the domain gap in the feature space, we define a recurrent neural network over the superclasses and plug it into a convolutional neural network to enforce the superclass hierarchy. Second, to learn a transferrable projection function for ZSL, a novel projection function learning method is proposed that exploits the superclasses to align the two domains. Importantly, our transferrable feature and projection learning methods can be easily extended to a closely related task, few-shot learning (FSL). Extensive experiments show that the proposed model outperforms state-of-the-art alternatives on both ZSL and FSL tasks.
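The class-hierarchy construction described above (data-driven clustering over the semantic representations of all seen and unseen class names into multiple superclass layers) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the word-vector semantics, the choice of k-means, and the two layer sizes are assumptions made purely for the example.

import numpy as np
from sklearn.cluster import KMeans

def build_class_hierarchy(class_embeddings, layer_sizes=(10, 3)):
    """Cluster class-name embeddings into successively coarser superclass layers.

    class_embeddings: (num_classes, dim) semantic vectors (e.g. word vectors)
                      for all seen and unseen class names.
    layer_sizes:      number of superclasses per layer, from fine to coarse.
    Returns one label array per superclass layer, mapping class -> superclass.
    """
    layers = []
    points = class_embeddings
    for k in layer_sizes:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
        if not layers:
            labels = km.labels_                # class -> superclass (finest layer)
        else:
            labels = km.labels_[layers[-1]]    # compose with the previous layer
        layers.append(labels)
        points = km.cluster_centers_           # next layer clusters the centroids
    return layers

# Toy usage: 50 classes with 300-d word-vector semantics, two superclass layers.
rng = np.random.default_rng(0)
class_embeddings = rng.normal(size=(50, 300))
for i, labels in enumerate(build_class_hierarchy(class_embeddings)):
    print(f"superclass layer {i}: sizes {np.bincount(labels)}")

Each superclass layer assigns every class (seen or unseen) to a shared superclass, which is what allows the superclasses to act as a bridge between the two domains in the feature and projection learning stages.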

MSC:

68T07 Artificial neural networks and deep learning
68T45 Machine vision and scene understanding

References:

[1] Akata, Z., Reed, S., Walter, D., Lee, H., & Schiele, B. (2015). Evaluation of output embeddings for fine-grained image classification. In Proceedings of CVPR, pp. 2927-2936.
[2] Akata, Z.; Perronnin, F.; Harchaoui, Z.; Schmid, C., Label-embedding for image classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 7, 1425-1438 (2016) · doi:10.1109/TPAMI.2015.2487986
[3] Bartels, RH; Stewart, GW, Solution of the matrix equation AX + XB = C [F4], Communications of the ACM, 15, 820-826 (1972) · Zbl 1372.65121 · doi:10.1145/361573.361582
[4] Bo, L., Ren, X., & Fox, D. (2011). Hierarchical matching pursuit for image classification: Architecture and fast algorithms. In NIPS, pp. 2115-2123.
[5] Bucher, M., Herbin, S., & Jurie, F. (2017). Generating visual representations for zero-shot classification. In ICCV workshops: Transferring and adapting source knowledge in computer vision, pp. 2666-2673.
[6] Changpinyo, S., Chao, W. L., Gong, B., & Sha, F. (2016). Synthesized classifiers for zero-shot learning. In Proceedings of CVPR, pp. 5327-5336.
[7] Chao, W. L., Changpinyo, S., Gong, B., & Sha, F. (2016). An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In Proceedings of ECCV, pp. 52-68.
[8] Chen, L., Zhang, H., Xiao, J., Liu, W., & Chang, S. F. (2018). Zero-shot visual recognition using semantics-preserving adversarial embedding network. In Proceedings of CVPR, pp. 1043-1052.
[9] Deng, J., Ding, N., Jia, Y., Frome, A., Murphy, K., Bengio, S., et al. (2014). Large-scale object classification using label relation graphs. In Proceedings of ECCV, pp. 48-64.
[10] Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., et al. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of ICML, pp. 647-655.
[11] Fei-Fei, L.; Fergus, R.; Perona, P., One-shot learning of object categories, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 4, 594-611 (2006) · doi:10.1109/TPAMI.2006.79
[12] Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of ICML, pp. 1126-1135.
[13] Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Ranzato, M. A., et al. (2013). DeViSE: A deep visual-semantic embedding model. In NIPS, pp. 2121-2129.
[14] Fu, Y., & Sigal, L. (2016). Semi-supervised vocabulary-informed learning. In Proceedings of CVPR, pp. 5337-5346.
[15] Fu, Y.; Hospedales, TM; Xiang, T.; Gong, S., Transductive multi-view zero-shot learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 11, 2332-2345 (2015) · doi:10.1109/TPAMI.2015.2408354
[16] Fu, Z., Xiang, T., Kodirov, E., & Gong, S. (2015b). Zero-shot object recognition by semantic manifold distance. In Proceedings of CVPR, pp. 2635-2644.
[17] Fu, Z.; Xiang, T.; Kodirov, E.; Gong, S., Zero-shot learning on semantic class prototype graph, IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 8, 2009-2022 (2018) · doi:10.1109/TPAMI.2017.2737007
[18] Graves, A., Mohamed, A., & Hinton, G. E. (2013). Speech recognition with deep recurrent neural networks. In Proceedings of ICASSP, pp. 6645-6649.
[19] Guo, Y., Ding, G., Jin, X., & Wang, J. (2016). Transductive zero-shot recognition via shared model space learning. In Proceedings of AAAI, pp. 3494-3500.
[20] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of CVPR, pp. 770-778.
[21] Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of CVPR, pp. 2261-2269.
[22] Hwang, S. J., & Sigal, L. (2014). A unified semantic embedding: Relating taxonomies and attributes. In NIPS, pp. 271-279.
[23] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of ACM Multimedia, pp. 675-678.
[24] Kankuekul, P., Kawewong, A., Tangruamsub, S., & Hasegawa, O. (2012). Online incremental attribute-based zero-shot learning. In Proceedings of CVPR, pp. 3657-3664.
[25] Kodirov, E., Xiang, T., Fu, Z., & Gong, S. (2015). Unsupervised domain adaptation for zero-shot learning. In Proceedings of ICCV, pp. 2452-2460.
[26] Kodirov, E., Xiang, T., & Gong, S. (2017). Semantic autoencoder for zero-shot learning. In Proceedings of CVPR, pp. 3174-3183.
[27] Lake, B. M., Salakhutdinov, R. R., & Tenenbaum, J. (2013). One-shot learning by inverting a compositional causal process. In NIPS, pp. 2526-2534.
[28] Lampert, CH; Nickisch, H.; Harmeling, S., Attribute-based classification for zero-shot visual object categorization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 3, 453-465 (2014) · doi:10.1109/TPAMI.2013.140
[29] LeCun, Y.; Boser, B.; Denker, JS; Henderson, D.; Howard, RE; Hubbard, W.; Jackel, LD, Backpropagation applied to handwritten zip code recognition, Neural Computation, 1, 4, 541-551 (1989) · doi:10.1162/neco.1989.1.4.541
[30] Lei Ba, J., Swersky, K., Fidler, S., & Salakhutdinov, R. (2015). Predicting deep zero-shot convolutional neural networks using textual descriptions. In Proceedings of ICCV, pp. 4247-4255.
[31] Li, A.; Lu, Z.; Wang, L.; Xiang, T.; Wen, JR, Zero-shot scene classification for high spatial resolution remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, 55, 7, 4157-4167 (2017) · doi:10.1109/TGRS.2017.2689071
[32] Li, X., Guo, Y., & Schuurmans, D. (2015). Semi-supervised zero-shot classification with label representation learning. In Proceedings of ICCV, pp. 4211-4219.
[33] Liu, F., Xiang, T., Hospedales, T. M., Yang, W., & Sun, C. (2017). Semantic regularisation for recurrent image annotation. In Proceedings of CVPR, pp. 4160-4168.
[34] Long, T.; Xu, X.; Shen, F.; Liu, L.; Xie, N.; Yang, Y., Zero-shot learning via discriminative representation extraction, Pattern Recognition Letters, 109, 27-34 (2018) · doi:10.1016/j.patrec.2017.09.030
[35] Lu, Y. (2015). Unsupervised learning on neural network outputs: With application in zero-shot learning. arXiv:1506.00990.
[36] Miller, GA, WordNet: An online lexical database, Communications of the ACM, 38, 11, 39-44 (1995) · doi:10.1145/219717.219748
[37] Mishra, A., Reddy, M. S. K., Mittal, A., & Murthy, H. A. (2017). A generative model for zero shot learning using conditional variational autoencoders. arXiv:1709.00663.
[38] Mishra, N., Rohaninejad, M., Chen, X., & Abbeel, P. (2018). A simple neural attentive meta-learner. In Proceedings of ICLR.
[39] Norouzi, M., Mikolov, T., Bengio, S., Singer, Y., Shlens, J., Frome, A., et al. (2014). Zero-shot learning by convex combination of semantic embeddings. In Proceedings of ICLR.
[40] Patterson, G.; Xu, C.; Su, H.; Hays, J., The SUN attribute database: Beyond categories for deeper scene understanding, International Journal of Computer Vision, 108, 1, 59-81 (2014) · doi:10.1007/s11263-013-0695-z
[41] Qiao, S., Liu, C., Shen, W., & Yuille, A. L. (2018). Few-shot image recognition by predicting parameters from activations. In Proceedings of CVPR, pp. 7229-7238.
[42] Radovanović, M.; Nanopoulos, A.; Ivanović, M., Hubs in space: Popular nearest neighbors in high-dimensional data, Journal of Machine Learning Research, 11, 9, 2487-2531 (2010) · Zbl 1242.62056
[43] Ravi, S., & Larochelle, H. (2016). Optimization as a model for few-shot learning. In Proceedings of ICLR.
[44] Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of CVPR, pp. 7263-7271.
[45] Reed, S., Akata, Z., Lee, H., & Schiele, B. (2016). Learning deep representations of fine-grained visual descriptions. In Proceedings of CVPR, pp. 49-58.
[46] Rohrbach, M., Ebert, S., & Schiele, B. (2013). Transfer learning in a transductive setting. In NIPS, pp. 46-54.
[47] Romera-Paredes, B., & Torr, P. H. S. (2015). An embarrassingly simple approach to zero-shot learning. In Proceedings of ICML, pp. 2152-2161.
[48] Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S., ImageNet large scale visual recognition challenge, International Journal of Computer Vision, 115, 3, 211-252 (2015) · doi:10.1007/s11263-015-0816-y
[49] Shigeto, Y., Suzuki, I., Hara, K., Shimbo, M., & Matsumoto, Y. (2015). Ridge regression, hubness, and zero-shot learning. In Proceedings of European conference on machine learning and principles and practice of knowledge discovery in databases, pp. 135-151.
[50] Shojaee, S. M., & Baghshah, M. S. (2016). Semi-supervised zero-shot learning by a clustering-based approach. arXiv:1605.09016.
[51] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. In Proceedings of ICLR.
[52] Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In NIPS, pp. 4080-4090.
[53] Socher, R., Ganjoo, M., Manning, C. D., & Ng, A. (2013). Zero-shot learning through cross-modal transfer. In NIPS, pp. 935-943.
[54] Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P. H., & Hospedales, T. M. (2018). Learning to compare: Relation network for few-shot learning. In Proceedings of CVPR, pp. 1199-1208.
[55] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of CVPR, pp. 1-9.
[56] Triantafillou, E., Zemel, R., & Urtasun, R. (2017). Few-shot learning through an information retrieval lens. In NIPS, pp. 2255-2265.
[57] Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., & Wierstra, D. (2016). Matching networks for one shot learning. In NIPS, pp. 3630-3638.
[58] Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001. California Institute of Technology.
[59] Wang, F.; Zhang, C., Label propagation through linear neighborhoods, IEEE Transactions on Knowledge and Data Engineering, 20, 1, 55-67 (2008) · doi:10.1109/TKDE.2007.190672
[60] Wang, Q.; Chen, K., Zero-shot visual recognition via bidirectional latent embedding, International Journal of Computer Vision, 124, 3, 356-383 (2017) · Zbl 1458.68256 · doi:10.1007/s11263-017-1027-5
[61] Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., & Schiele, B. (2016). Latent embeddings for zero-shot classification. In Proceedings of CVPR, pp. 69-77.
[62] Xian, Y., Schiele, B., & Akata, Z. (2017). Zero-shot learning—The good, the bad and the ugly. In Proceedings of CVPR, pp. 4582-4591.
[63] Xian, Y., Lorenz, T., Schiele, B., & Akata, Z. (2018). Feature generating networks for zero-shot learning. In Proceedings of CVPR, pp. 5542-5551.
[64] Xu, X., Hospedales, T., & Gong, S. (2015). Semantic embedding space for zero-shot action recognition. In Proceedings of IEEE International Conference on Image Processing, pp. 63-67.
[65] Xu, X.; Hospedales, T.; Gong, S., Transductive zero-shot action recognition by word-vector embedding, International Journal of Computer Vision, 123, 3, 309-333 (2017) · Zbl 1455.68226 · doi:10.1007/s11263-016-0983-5
[66] Ye, M., & Guo, Y. (2017). Zero-shot classification with discriminative semantic representation learning. In Proceedings of CVPR, pp. 7140-7148.
[67] Yu, Y., Ji, Z., Li, X., Guo, J., Zhang, Z., Ling, H., et al. (2017). Transductive zero-shot learning with a self-training dictionary approach. arXiv:1703.08893.
[68] Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In Proceedings of British machine vision conference, pp. 87.1-87.12.
[69] Zhang, L., Xiang, T., & Gong, S. (2017). Learning a deep embedding model for zero-shot learning. In Proceedings of CVPR, pp. 2021-2030.
[70] Zhang, Z., & Saligrama, V. (2015). Zero-shot learning via semantic similarity embedding. In Proceedings of ICCV, pp. 4166-4174.
[71] Zhang, Z., & Saligrama, V. (2016a). Zero-shot learning via joint latent similarity embedding. In Proceedings of CVPR, pp. 6034-6042.
[72] Zhang, Z., & Saligrama, V. (2016b). Zero-shot recognition via structured prediction. In Proceedings of ECCV, pp. 533-548.
[73] Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., et al. (2015). Conditional random fields as recurrent neural networks. In Proceedings of CVPR, pp. 1529-1537.
[74] Zhu, Y., Elhoseiny, M., Liu, B., & Elgammal, A. M. (2018). A generative adversarial approach for zero-shot learning from noisy texts. In Proceedings of CVPR, pp. 1004-1013.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented/enhanced by data from zbMATH Open. The list attempts to reflect the references of the original paper as accurately as possible, without claiming completeness or perfect matching.