Authors: Marc Peter Deisenroth, Aldo Faisal and Cheng Soon Ong
Publisher: Cambridge University Press
Audience: Developers interested in machine learning
Reviewer: Mike James
Lots of people need to learn the math behind machine learning, so a book on the subject is a good idea.
Machine learning is all about mathematics. You can manage to intuit your way to some sort of understanding, but without math you cannot be sure that your intuition holds when you push it beyond the norms. The key math that you need to know is linear algebra, calculus (in particular, vector calculus), and probability and statistics. Even if you know quite a lot about each of these subjects, the way that machine learning makes use of them might leave you feeling puzzled – vector calculus in particular.
There is plenty of room for a book that explains the math involved in machine learning, but there are a number of possible ways of doing the job. What most people need is something that explains the math in a way that enhances their intuition and understanding of the application to machine learning. What most book writers want to do is reproduce the standard mathematics with perhaps some comments on applicability and this is what this book does.
It starts off with a 40-page chapter on linear algebra and this is dense mathematics. It covers the basics of linear spaces – basis, rank, mappings and so on. Then it moves on to a 20-page chapter on analytic geometry, covering topics often listed under linear or matrix algebra, including norms, inner products, orthonormal bases, rotations and so on. This is followed by a 20-page chapter on matrix decompositions – mostly eigenvectors, but including singular value decomposition.
Up to this point you can find most of the subject matter in almost any book on undergraduate linear algebra and it is presented in much the same way. There are some examples that apply the ideas to machine learning, but they are not motivating. You are told about the math in fairly abstract ways and then shown an example of where it proved useful. Sometimes the way things are explained doesn’t really fit with the most common application in machine learning. For example, in the section on eigendecomposition the intuition is based on what this means for a matrix viewed as a transformation, whereas in statistics and machine learning a better view is the relationship to a quadratic form. This gives you the idea that the eigenvectors are the principal axes of an ellipsoid, most often the ellipsoids of constant probability in a multidimensional Gaussian. This is the reason they are important – they are directions of maximal variation in the multidimensional space. This book fails to make this clear.
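To make the quadratic-form view concrete, here is a minimal sketch in plain Python. It is not taken from the book; the covariance values and the function names principal_axes and variance_along are made up for illustration. It uses the closed-form eigendecomposition of a symmetric 2x2 matrix and checks that the eigenvector with the largest eigenvalue is the direction in which the variance u^T Sigma u is greatest.

```python
import math


def principal_axes(a, b, d):
    """Eigenpairs of the symmetric 2x2 matrix [[a, b], [b, d]].

    Returns [(eigenvalue, unit_eigenvector), ...], largest eigenvalue first.
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    # Angle of the major axis of the ellipse x^T Sigma^{-1} x = const.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))
    return [(mean + radius, major), (mean - radius, minor)]


def variance_along(a, b, d, u):
    """Quadratic form u^T Sigma u: variance of data projected onto unit vector u."""
    ux, uy = u
    return a * ux * ux + 2.0 * b * ux * uy + d * uy * uy


# Hypothetical covariance matrix Sigma = [[3, 1], [1, 3]] of a 2-D Gaussian.
a, b, d = 3.0, 1.0, 3.0
(lam_major, v_major), (lam_minor, v_minor) = principal_axes(a, b, d)

# Brute-force scan over directions (one per degree) for comparison.
scanned_max = max(
    variance_along(a, b, d, (math.cos(t), math.sin(t)))
    for t in (k * math.pi / 180.0 for k in range(180))
)

print("eigenvalues:", lam_major, lam_minor)
print("major axis direction:", v_major)
print("largest variance found by scanning:", scanned_max)
```

For this matrix the eigenvalues come out at approximately 4 and 2, and the major axis points along the diagonal (1, 1). The contour x^T Sigma^{-1} x = 1 is an ellipse whose semi-axes have lengths equal to the square roots of the eigenvalues, which is the geometric picture the review has in mind.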
Finally we get to the chapter on vector calculus, which is what you need in order to understand the backpropagation algorithm in particular. It introduces the idea of the gradient and partial derivatives of vector fields in a fairly standard way and then goes on to explain some of the more unusual forms such as gradients of matrices. If you understand calculus, then in many ways the most useful part of the presentation is the list of useful identities – after all, who works these out from first principles each time? The section on automatic differentiation is also very useful.
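As an illustration of how an identity and automatic differentiation relate, the following sketch checks the standard identity that the gradient of x^T A x is (A + A^T) x. It is an assumption-laden toy, not the book's presentation: it uses forward-mode differentiation with dual numbers, which is only one flavor of automatic differentiation (backpropagation is the reverse-mode counterpart), and the class Dual and the functions quadratic_form and gradient are hypothetical names invented here.

```python
class Dual:
    """Number value + deriv * eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative zero).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def quadratic_form(x, A):
    """f(x) = x^T A x for a list x and a list-of-lists A."""
    total = 0.0
    for i in range(len(x)):
        for j in range(len(x)):
            total = total + x[i] * A[i][j] * x[j]
    return total


def gradient(f, x):
    """Gradient of f at x, using one forward pass per input coordinate."""
    grad = []
    for k in range(len(x)):
        seeded = [Dual(xi, 1.0 if i == k else 0.0) for i, xi in enumerate(x)]
        grad.append(f(seeded).deriv)
    return grad


# Hypothetical test values; A is deliberately not symmetric.
A = [[2.0, 1.0], [0.0, 3.0]]
x = [1.0, 2.0]

auto = gradient(lambda v: quadratic_form(v, A), x)
identity = [sum((A[i][j] + A[j][i]) * x[j] for j in range(len(x)))
            for i in range(len(x))]

print("automatic differentiation:", auto)      # [6.0, 13.0]
print("identity (A + A^T) x:     ", identity)  # [6.0, 13.0]
```

Forward mode needs one pass per input, which is why reverse mode is preferred when a function has many inputs and a single scalar output, as a neural network loss does.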
From here the book moves on to probability and optimization methods. Again, these are perfectly standard presentations of the mathematics.
Part II of the book applies many of the mathematical ideas introduced in the first part to machine learning. First we look at the general ideas of modelling. This covers parameter estimation and topics such as cross validation. Linear regression is next, followed by Principal Components. After this we have density estimation using a Gaussian mixture model and finally support vector machines. There isn’t a chapter on neural networks and all their variations and this is a big omission. It leaves you thinking that the authors have a strangely truncated view of machine learning. There are so many mathematical ideas in neural networks that really should be included, along with less well-known ideas. For example, it would be good if it mentioned ideas such as Hopfield networks in terms of eigenvalues and Boltzmann machines as learning distributions. Reinforcement learning is also conspicuous by its absence, along with the Bellman equations, Markov chains and so on.
This is not a book for the mathematically intimidated. Indeed the book is very intimidating in terms of its layout and long presentations of mathematics without any apologies. You certainly need to be reasonably good at, and not frightened of, math to read it and this raises the question of why you don’t just go and get a good book on linear algebra and another on probability and statistics.
This is a math book that groups together some, but far from all, of the mathematical ideas you will encounter in machine learning. In particular, if you are looking for a book that explains the math you need to understand for neural networks or reinforcement learning, this isn’t it. Having said this, the mathematical content is as good as that of any book on the same subject and there are lots of asides that explain how the math is used in machine learning, but machine learning isn’t used to motivate the math. Personally I’d prefer to read a book on machine learning, neural networks or reinforcement learning and find books on any math I failed to understand to fill in the gaps.