
A theoretical framework for deep transfer learning. (English) Zbl 1380.68333

Summary: We generalize the notion of PAC learning to include transfer learning. In our framework, the link between the source and the target tasks arises from having the sample distributions of all classes drawn from the same distribution of distributions, and from restricting all source concepts and the target concept to belong to the same hypothesis subclass. We consider two models: an adversary model and a randomized model. In the adversary model, we show that for binary classification, conventional PAC learning is equivalent to the new notion of PAC-transfer and to a transfer generalization of the VC dimension. For regression, we show that PAC-transferability may exist even in the absence of PAC learnability. In both the adversary and the randomized models, we provide PAC-Bayesian and VC-style generalization bounds for transfer learning. In the randomized model, we provide bounds derived specifically for deep learning. We discuss in detail the tradeoffs between the parameters involved in these bounds, and demonstrate both cases in which transfer does not reduce the sample size (‘trivial transfer’) and cases in which it does (‘non-trivial transfer’).
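
To fix ideas, the setting described in the summary can be sketched as follows; the notation (the environment \(\mathcal{E}\), the family of subclasses \(\mathbb{H}\), and the target sample size \(m_T\)) is illustrative and not taken from the paper itself.

\[
\begin{aligned}
&\text{Tasks: the source distributions } D_1,\dots,D_k \text{ and the target distribution } D_T \text{ are drawn from}\\
&\qquad\text{a common environment } \mathcal{E}, \text{ a distribution over distributions.}\\
&\text{Bias: the source concepts } c_1,\dots,c_k \text{ and the target concept } c_T \text{ all belong to one subclass}\\
&\qquad \mathcal{H}' \in \mathbb{H}, \text{ where } \mathbb{H} \text{ is a fixed family of subclasses of the overall hypothesis class } \mathcal{H}.\\
&\text{PAC-transfer (informally): given samples from the source tasks and } m_T \text{ target examples,}\\
&\qquad\text{the learner outputs } \hat{h} \text{ with } \Pr\bigl[\operatorname{err}_{D_T}(\hat{h}) \le \epsilon\bigr] \ge 1-\delta.\\
&\text{Transfer is non-trivial when } m_T \text{ is smaller than the sample size needed to PAC-learn } \mathcal{H}\\
&\qquad\text{from target data alone, and trivial otherwise.}
\end{aligned}
\]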

MSC:

68T05 Learning and adaptive systems in artificial intelligence
