
Neural networks and deep learning. A textbook. (English) Zbl 1402.68001

Cham: Springer (ISBN 978-3-319-94462-3/hbk; 978-3-319-94463-0/ebook). xxiii, 497 p. (2018).
The book presents itself as a series of 10 chapters, spanning from a general introduction to neural networks to an overview of advanced research topics. The style of the book is accessible to a wide audience, from students to established researchers, and from various backgrounds. The notions are introduced thoroughly, and the structuring into subsections makes it easy for each reader to determine the appropriate level of detail. The theoretical aspects are supported by numerous references and exercises that conclude every chapter; the overview of software resources is valuable and represents a reliable starting point for further study.
The first chapter is focused on introducing and describing the concept of neural networks. It commences with an overview of basic architectures, i.e., a detailed description of the single computational layer (the perceptron); multilayer neural networks are also introduced. The presentation of training a network through the back-propagation algorithm is followed by an overview of practical issues such as over-fitting, the vanishing and exploding gradient problems and difficulties in convergence. Next, a summary of common architectures, revisited in later chapters, is included (e.g. radial basis function networks, restricted Boltzmann machines, recurrent neural networks and convolutional neural networks). A brief account of advanced topics such as reinforcement learning and generative adversarial networks concludes the chapter.
In the second chapter the author discusses the properties of shallow neural networks. Using binary classification models as a starting point, the description of the perceptron is revisited side by side with alternative approaches including least-squares regression, logistic regression and support vector machines; multi-class models are also presented. Part of this chapter focuses on the details of autoencoders. As examples, word2vec (used for learning word embeddings from text data, either predicting target words from given contexts or predicting contexts from input words) and a simple architecture for graph embeddings are presented.
In the third chapter the training of deep neural networks is presented, using the back-propagation algorithm as a starting point. The different angles (e.g. the dynamic programming implementation), parameters (pre- and post-activation variables) and approaches (e.g. mini-batch stochastic gradient descent) are discussed in detail. Different tactics for initialization and setup are also included, followed by strategies for gradient descent (e.g. momentum-based learning, gradient clipping, Polyak averaging). The chapter concludes with practical tricks for acceleration and compression, including the use of GPUs.
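The momentum-based learning mentioned among the gradient descent strategies can be sketched in a few lines. The quadratic objective and the hyper-parameter values below are invented for illustration and are not taken from the book:

```python
import numpy as np

def momentum_descent(grad, w0, lr=0.1, beta=0.9, steps=200):
    """Gradient descent with momentum on an objective given by its gradient."""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)  # velocity: decaying sum of past gradients
        w = w + v                    # the parameter step follows the velocity
    return w

# Toy objective f(w) = ||w||^2 / 2, whose gradient is simply w.
w_final = momentum_descent(lambda w: w, w0=[4.0, -2.0])
```

The velocity term smooths successive gradients, which is why momentum damps oscillations along steep directions while accelerating progress along flat ones.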
In the fourth chapter the author discusses the ability of deep learners to generalize. Different approaches used to increase the accuracy of models, such as tuning and evaluation, penalty-based regularization, ensemble methods, early stopping, unsupervised pre-training, curriculum learning and regularization in an unsupervised manner, are all presented in detail.
The fifth chapter introduces radial basis function (RBF) networks. The training of the hidden and the output layers is presented in detail, and the orthogonal least-squares algorithm is also discussed. Variations such as classification with the perceptron criterion or with hinge loss are included. The chapter concludes with a side-by-side comparison with kernel methods, i.e., kernel regression and kernel SVMs are presented as special cases of RBF networks.
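As an illustrative sketch of the RBF training just described (the data, the Gaussian width and the placement of centres on the training points are invented for the demo, not the book's choices), the linear output layer admits a direct least-squares fit once the hidden-layer features are computed:

```python
import numpy as np

def rbf_features(X, centers, s=0.1):
    # Gaussian hidden units: phi_j(x) = exp(-||x - c_j||^2 / (2 s^2)).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * s ** 2))

def fit_output_layer(X, y, centers, s=0.1):
    # The output layer is linear, so its weights have a closed-form
    # least-squares solution given the hidden-layer activations.
    Phi = rbf_features(X, centers, s)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

# Toy 1-D regression; centres are placed on the training inputs themselves.
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
w = fit_output_layer(X, y, centers=X)
pred = rbf_features(X, X) @ w
```

With one centre per training point the hidden-layer matrix is square and the fit interpolates the data exactly, which is also the setting in which the kernel-method correspondence mentioned above becomes apparent.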
The sixth chapter focuses on restricted Boltzmann machines (RBMs); these are introduced first from a historical perspective and then through their connection with Hopfield networks. For RBMs, the author presents the contrastive divergence algorithm and applications including dimensionality reduction and data reconstruction, collaborative filtering and the application to multimodal data. The chapter ends with a discussion of the effects of stacking RBMs in either a supervised or an unsupervised setting and introduces deep Boltzmann machines and deep belief networks.
In the seventh chapter the author presents recurrent neural networks (RNNs). After an overview of the architecture of these networks, using language modelling as an example, the training challenges are illustrated, together with remedies such as layer normalization. Preceding a description of applications of these networks, echo-state networks, long short-term memory (LSTM) networks and gated recurrent units (GRUs) are also described.
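A minimal sketch of the contrastive divergence update (CD-1) mentioned above, assuming a binary RBM; the toy data, network sizes and hyper-parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One CD-1 update on a batch v0; W, b, c are modified in place."""
    # Positive phase: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Contrastive divergence: <v h>_data minus <v h>_reconstruction.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return ((v0 - pv1) ** 2).mean()  # reconstruction error before the update

# Tiny binary dataset: two repeated prototype patterns.
V = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 10, dtype=float)
W = 0.01 * rng.standard_normal((4, 3))
b, c = np.zeros(4), np.zeros(3)
errs = [cd1_step(V, W, b, c) for _ in range(200)]
```

The single Gibbs step is the "CD-1" approximation: it replaces the intractable model expectation with a cheap reconstruction-based estimate, which is the key idea the chapter develops.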
The eighth chapter presents convolutional neural networks. Following an overview of the structure of these networks, the author focuses on the particularities of the training step: back-propagation through the convolution, with or without inverted/transposed filters. Several case studies of frequently used convolutional architectures (e.g. AlexNet and ZFNet) conclude the chapter.
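The role of the inverted/transposed filter in back-propagation through a convolution can be sketched in a minimal single-channel example (illustrative code, not the book's): the gradient with respect to the input is itself a "full" convolution with the 180-degree-rotated filter.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D cross-correlation of image x with filter k."""
    H, W = x.shape
    m, n = k.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + m, j:j + n] * k).sum()
    return out

def conv_input_grad(dout, k):
    # Pad the upstream gradient, then convolve with the flipped
    # (inverted) filter to obtain dL/dx: a 'full' convolution.
    m, n = k.shape
    padded = np.pad(dout, ((m - 1, m - 1), (n - 1, n - 1)))
    return conv2d_valid(padded, k[::-1, ::-1])

x = np.arange(16.0).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
y = conv2d_valid(x, k)                    # forward pass, shape (3, 3)
dx = conv_input_grad(np.ones_like(y), k)  # gradient w.r.t. x, shape (4, 4)
```

Checking shapes makes the point: the backward pass maps a (3, 3) gradient back to the (4, 4) input, exactly what a transposed convolution does.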
The ninth chapter revolves around deep reinforcement learning; it starts with an example based on stateless algorithms – the multi-armed bandit problem. Next, the author presents the basic framework, focusing on the role of deep learning and a straw-man algorithm. The use of bootstrapping for value-function learning, policy gradient methods and Monte Carlo tree search are also presented. The case studies include a discussion of AlphaGo, the building of conversational systems and the algorithms behind self-driving cars.
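The stateless multi-armed bandit example can be sketched with a simple epsilon-greedy strategy (the arm reward means, the noise model and the value of epsilon below are invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(true_means, steps=5000, eps=0.1):
    """Epsilon-greedy play of a k-armed bandit with Gaussian rewards."""
    k = len(true_means)
    counts = np.zeros(k)
    values = np.zeros(k)  # running mean reward estimate per arm
    for _ in range(steps):
        # Explore with probability eps, otherwise exploit the best estimate.
        a = rng.integers(k) if rng.random() < eps else int(values.argmax())
        r = rng.normal(true_means[a], 1.0)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
    return counts, values

counts, values = run_bandit([0.1, 0.5, 0.9])
```

After enough plays the best arm dominates the pull counts; the exploration/exploitation trade-off this illustrates is the seed from which the chapter's full reinforcement-learning framework grows.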
The book concludes (in the tenth chapter) with an overview of advanced topics in deep learning, including attention mechanisms, neural networks with external memory, generative adversarial networks (GANs) and competitive learning (vector quantization and Kohonen self-organizing maps). A brief description of the limitations of neural networks is also presented.
The book recommends itself as a stepping-stone into the research-intensive area of deep learning and a worthy continuation of the previous textbooks written by the author [Data mining. The textbook. Cham: Springer (2015; Zbl 1311.68001); Outlier analysis. 2nd edition. Cham: Springer (2017; Zbl 1353.68004); Machine learning for text. Cham: Springer (2018; Zbl 1395.68001)]. Thanks to its systematic and thorough approach, complemented by the variety of resources (bibliographic and software references, exercises) neatly presented after each chapter, it is suitable for audiences of varied expertise and backgrounds.

MSC:

68-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science
68T05 Learning and adaptive systems in artificial intelligence
68T07 Artificial neural networks and deep learning
82C32 Neural nets applied to problems in time-dependent statistical mechanics
92B20 Neural networks for/in biological studies, artificial life and related topics
Full Text: DOI