Slava Akhmechet – Linear algebra for programmers, part 1

The most important thing about reading this blog post is to not get scared off by the formulas. The post may look like all the stuff you normally skim over, so you may be tempted to skim over this one. Don’t! None of this is hard. Just read the post top to bottom, and I promise you every individual step and the whole thing put together will make sense.
High school math
In high school your math teacher may have started a treatment of linear algebra by making you solve a system of linear equations, at which point you very sensibly zoned out because you knew you’d go on to program computers and never have to solve a system of linear equations again (don’t worry, I won’t be talking much about them here).
\[
\begin{eqnarray}
0x_1 + 1x_2 = 0 \\
-1x_1 - 0x_2 = 0
\end{eqnarray}
\tag{1}
\]
You also may have learned that for some reason you can take the coefficients and put them in a 2D array like this: \(A=\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\). You’ve now defined a matrix \(A\), and you can re-express the system of linear equations above as follows:
\[
\newcommand\qvec[1]{\begin{bmatrix}#1\end{bmatrix}}
A\qvec{x_1 \\ x_2}=0
\tag{2}
\]
If you’re really hellbent on cleaning things up, you can express the vector \(\qvec{x_1, x_2}\) as \(x=\qvec{x_1 \\ x_2}\), which now gives you a very clean equation:
\[
Ax=0
\tag{3}
\]
Equations 1–3 are just different ways to say the exact same thing. In different situations you may prefer one notation over another, but there is no material difference between them. They’re all equivalent.
Matrix-vector multiplication
I’ll talk about what matrix-vector multiplication means in a moment. For now let’s look at how the operation is defined. The precise definition of matrix-vector multiplication flows out of the notation above, so you never again have to look it up on Wikipedia. If you need to multiply a matrix by a vector, say \(\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \qvec{1 \\ 2}\), just recall that this is equivalent to the left side of the system of equations above. Before, we took the coefficients in the linear equation system and factored them out into a 2D array. Now we reverse the procedure: take our 2D array and factor it back in as coefficients:
\[
\qvec{
0*1 + 1*2 \\
-1*1 - 0*2
}=\qvec{2 \\ -1}
\]
If you forget how matrix-vector multiplication works, just remember that its definition flows out of the notation. Convert the matrix-vector multiplication notation back into the linear equation system notation, and you get the matrix-vector multiplication formula.
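Here is a minimal sketch of that rule in TypeScript (the helper name matVec is mine, not from the post): each element of the output is one row’s coefficients factored back in against the input vector.

// Multiply a matrix (an array of rows) by a vector:
// result[i] = A[i][0]*x[0] + A[i][1]*x[1] + ...
function matVec(A: number[][], x: number[]): number[] {
  return A.map(row => row.reduce((sum, a, j) => sum + a * x[j], 0));
}

console.log(matVec([[0, 1], [-1, 0]], [1, 2])); // [2, -1]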
You may remember from high school that there is an operation defined on vectors called the dot product. The dot product is the sum of pairwise products of the elements of two vectors. E.g. \(\qvec{0, 1}\cdot\qvec{1, 2}=0*1 + 1*2=2\). The dot product of two vectors is an operation that represents the degree to which the two vectors point in the same direction.
That’s simple enough. But here is something curious. Another way to think about matrix-vector multiplication is to treat each row of the matrix as its own vector, and compute the dot products of those row vectors with the vector we’re multiplying by (in this case \(\qvec{1, 2}\)). How on earth does that work?! What does vector similarity have to do with linear equations, or with matrix-vector multiplication? I can’t answer this question quite yet. But we’ll eventually build up to an answer in future posts.
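To convince yourself the two views agree, here is the same computation restated as one dot product per row (again a sketch; dot and matVecByRows are my names):

// Dot product: sum of pairwise products of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Matrix-vector multiplication restated as a dot product per row.
function matVecByRows(A: number[][], x: number[]): number[] {
  return A.map(row => dot(row, x));
}

console.log(dot([0, 1], [1, 2]));                     // 2
console.log(matVecByRows([[0, 1], [-1, 0]], [1, 2])); // [2, -1], as before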
Matrices as functions
Now let’s look at what matrix-vector multiplication means. This blew my mind when I first learned about it. You can think of a matrix as a function, and you can think of multiplying a matrix by a vector as applying that function to the vector. So when you see \(Ax\), autocomplete it in your head to “calling some function \(A\) with argument \(x\)”.
This is actually not so unusual: you can think of many structures as functions. For example, you can think of the number \(3\) as a function. When you multiply it by things, it makes them three times bigger. Thinking about matrices this way happens to be very convenient.
The fact that \(Ax=0\) denotes both the linear system in equation 1 and a call to a function \(A\) with argument \(x\) (getting the zero vector in return) leads to a curious insight about the relationship between high school math and programming.
In high school you’re given equations and asked to find their roots. We already established that a system of equations is equivalent to matrix-vector multiplication, which can be thought of as function application. And so, in high school you’re given a function \(A\) along with its output, and asked to find the inputs that match that output. Programming is typically the exact opposite. In programming you’re given the shape of the inputs and outputs, and your job is to construct a function \(A\) that converts inputs to desired outputs. The computer then executes the functions you construct, often at scale.
So it turns out that solving systems of linear equations in high school math classes and writing React code at your day job are two sides of the same coin! It’s a complicated idea, so it takes a while to wrap your head around its implications (and its limitations). Illuminating this relationship in much more depth is another thing I hope to accomplish with this series of posts.
What does \(\begin{bmatrix}0 & 1 \\ -1 & 0 \end{bmatrix}\) do?
Going back to our lowly 2×2 matrix \(A=\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\), let’s consider what this matrix does. Taking an equally lowly vector \(x=\qvec{0, 1}\) and multiplying \(A\) by \(x\), we get \(Ax=\qvec{1, 0}\). Let’s do this a few more times:
\[
\begin{aligned}
&A\qvec{1 \\ 0}=\qvec{0 \\ -1} \\
&A\qvec{0 \\ -1}=\qvec{-1 \\ 0} \\
&A\qvec{-1 \\ 0}=\qvec{0 \\ 1}
\end{aligned}
\]
After the fourth application we’re back where we started. Plotting these four vectors, each arrow is the previous one rotated a quarter turn clockwise around the origin.
So the matrix \(A\) rotates its input vector clockwise by 90 degrees! How does that work? Suppose we wanted to write a normal Python function to do a clockwise 90 degree rotation:
def rotate(vec):
    # Clockwise 90 degree rotation: (x, y) -> (y, -x)
    x, y = vec
    return [y, -x]

[rotate([0, 1]),
 rotate([1, 0]),
 rotate([0, -1]),
 rotate([-1, 0])]
# => [[1, 0], [0, -1], [-1, 0], [0, 1]]
These are the same results that we got by matrix multiplication. It helps to tinker with matrix-vector multiplication by hand to internalize how it works. Can you write a matrix that rotates its input counterclockwise? How about one that rotates by 45 degrees? A matrix that returns its input as is?
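If you want to check your answers, one standard tool (not spelled out in this post) is the general clockwise rotation matrix, of which \(A\) is the \(\theta = 90°\) special case:
\[
R_\theta=\begin{bmatrix}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta\end{bmatrix}
\]
Setting \(\theta = -90°\) gives \(\begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}\), a counterclockwise rotation; \(\theta = 45°\) gives a 45 degree clockwise rotation; and \(\theta = 0\) gives the identity matrix \(\begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}\), which returns its input as is.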
Matrix-matrix multiplication
In high school your teacher also gave you a bunch of rules for how to multiply matrices by each other, and then, if you’re lucky, maybe explained what matrix-matrix multiplication means. I always thought this was a silly way of teaching it. Where did the rules come from? Did Moses bring them down from Mount Sinai on stone tablets?
Instead, let’s do it the sensible way and put the horse before the cart. Knowing what we know, if we were tasked with defining matrix multiplication, what would we want it to do? What could be a reasonable definition?
We’ve seen in the previous section that \(A\) rotates its input vector clockwise by 90 degrees. Of course, when we apply \(A\), take the resulting vector, and apply \(A\) again, that vector gets rotated by another 90 degrees. So applying \(A\) twice rotates the original input clockwise by 180 degrees. Mathematically, we can express it like this: \(A(Ax)\). We perform the matrix-vector multiplication \(Ax\), which produces a vector; then we take that vector and perform matrix-vector multiplication on it again.
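In code form (a sketch reusing the matVec helper from earlier), applying the rotation twice looks like this:

// Rotating [1, 2] by 90 degrees twice is a 180 degree rotation.
const A = [[0, 1], [-1, 0]];
console.log(matVec(A, matVec(A, [1, 2]))); // [-1, -2]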
Normally in math we want the multiplication operation to be associative. It would be strange if the result of \(2*(2*3)\) were different from \((2*2)*3\). So it’s sensible to apply this principle to matrices and make their multiplication associative. In other words, we want the following equation to be true:
\[
A(Ax)=(AA)x
\]
Since \(A(Ax)\) rotated \(x\) clockwise by 180 degrees, ideally \(AA\) would be a matrix that does exactly the same thing. To put it more generally, if matrix \(A\) performs some operation, and another matrix \(B\) performs a different operation, we want the matrix \(M=AB\) to perform both of those operations one after another (first \(B\), then \(A\)). A different way of putting it is that if we think of matrices as functions, we want matrix multiplication to act in the same way as function composition.
This is exactly what matrix multiplication does. The derivation is a little bit messy, so instead of doing it inline I added it as an appendix at the end of this post. If you forget why matrix-matrix multiplication is defined the way it is, you can always come back to this explanation.
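As a quick sketch of this in code (matMul is my naming; it reuses matVec and A from the snippets above), multiplying the matrices first and then applying the product gives the same result as applying the matrices one after another:

// Matrix-matrix multiplication: entry (i, j) of A*B is the dot product
// of row i of A with column j of B.
function matMul(A: number[][], B: number[][]): number[][] {
  return A.map(row =>
    B[0].map((_, j) => row.reduce((sum, a, k) => sum + a * B[k][j], 0))
  );
}

const AA = matMul(A, A); // [[-1, 0], [0, -1]], a 180 degree rotation
console.log(matVec(AA, [1, 2]));           // [-1, -2]
console.log(matVec(A, matVec(A, [1, 2]))); // [-1, -2], the same thing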
Type systems
Let’s look at another example of matrix-vector multiplication, this time with a non-square matrix:
\[\begin{bmatrix}
0 & 1 \\
-1 & 0 \\
0 & 0
\end{bmatrix}
\qvec{1 \\ 2}
=\qvec{2 \\ -1 \\ 0}\]
Let’s call the matrix on the left \(M\). In the equation above we get our result by performing the dot product of \(\qvec{1, 2}\) with every row of \(M\). In other words, we treat each row of \(M\) as a vector, perform a pairwise multiplication of its elements with \(\qvec{1, 2}\), and sum them.
Now here is something interesting. Since you can’t take the dot product of two vectors with different dimensions, for matrix-vector multiplication to work the number of elements in \(\qvec{1, 2}\) must be equal to the number of columns in \(M\). Switching to thinking of \(M\) as a function, we’ve now learned something about the type of its input: \(M\)’s arguments must be two-dimensional vectors.
The opposite is true for \(M\)’s rows. Because we take the dot product of \(\qvec{1, 2}\) with every row of \(M\), and \(M\) has three rows, the output vector must necessarily have three elements. And so, the number of rows in \(M\) tells us about the type of its output.
Here is a simple way of expressing this in TypeScript:
// Columns determine the input type; rows determine the output type
type mC = [number, number];
type mR = [number, number, number];
type M = (input: mC) => mR;
Now let’s consider how types for matrix multiplication work. Suppose we have a different matrix \(N\), and we’d like to multiply \(N\) by \(M\). What must the type of the matrix \(N\) be? Well, if we look at a term \(NMx\), \(M\)’s output becomes \(N\)’s input. Therefore, \(N\)’s input and \(M\)’s output must be of the same type. In other words, \(N\)’s number of columns must be the same as \(M\)’s number of rows:
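In the same TypeScript notation, that constraint might look like this (the two-row shape of \(N\) is an arbitrary choice for illustration):

// N consumes M's output, so N's input type must be mR (three columns).
type nR = [number, number];  // N could have any number of rows; two here
type N = (input: mR) => nR;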
When you think about which matrices can and cannot be multiplied by each other, it’s actually quite simple: the output of one matrix (its rows) must have the same dimension as the input of the other (its columns). If you get confused about this, perhaps writing it out in TypeScript notation will help!
What’s next?
This post should give you a basic intuition for what matrices are and how they work. However, we’ve just scratched the surface. Here are some interesting questions for you to consider:
- You saw we can treat rows of a matrix as vectors. What is the intuitive meaning of these row vectors?
- This (correctly) suggests we can also treat matrix columns as vectors. What is the intuitive meaning of column vectors?
- If we think of a matrix \(A\) as a function, can we look at it and tell which inputs result in the output of zero? In other words, how do we solve \(Ax=0\)?
- We know matrix multiplication can be thought of as composing two functions into a single function. Can we do the opposite and break up a matrix into multiple smaller functions? How? And what would these smaller functions do?
- What the heck are determinants, eigenvectors, and eigenvalues?
To start answering these questions, in the next part I plan to cover linear combinations, vector spaces, linear maps, and linear independence. Stay tuned!
Appendix: matrix-matrix multiplication derivation
Coming back to matrix-matrix multiplication, we’ve already established that we want \((AB)x=A(Bx)\). Let’s look at how to derive the multiplication definition from this equation. Rather than do it in full generality, I’ll do it for a 2×2 example, but exactly the same principle applies to matrices of other sizes.
Let \(A=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\), let \(B=\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}\), and let \(x=\qvec{x_1 \\ x_2}\). Now let’s look at \(A(Bx)\). Doing the matrix-vector multiplication \(Bx\) we get:
\[
Bx=\qvec{x_1b_{11} + x_2b_{12} \\ x_1b_{21}+x_2b_{22}}
\]
That’s a vector, and we already know how to multiply a matrix by a vector, so we do it again (and get a slightly messy result):
\[
\begin{aligned}
A(Bx)
&=\qvec{
a_{11}(x_1b_{11} + x_2b_{12}) + a_{12}(x_1b_{21} + x_2b_{22}) \\
a_{21}(x_1b_{11} + x_2b_{12}) + a_{22}(x_1b_{21} + x_2b_{22})} \\
&=\qvec{
x_1a_{11}b_{11} + x_2a_{11}b_{12} + x_1a_{12}b_{21} + x_2a_{12}b_{22} \\
x_1a_{21}b_{11} + x_2a_{21}b_{12} + x_1a_{22}b_{21} + x_2a_{22}b_{22}} \\
&=\qvec{
x_1(a_{11}b_{11} + a_{12}b_{21}) + x_2(a_{11}b_{12} + a_{12}b_{22}) \\
x_1(a_{21}b_{11} + a_{22}b_{21}) + x_2(a_{21}b_{12} + a_{22}b_{22})}
\end{aligned}
\]
This is starting to look like the left side of a linear system of equations again, and we know we can rewrite it by factoring out the coefficients into a matrix:
\[
A(Bx)=\begin{bmatrix}
a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\
a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}
\end{bmatrix}\qvec{x_1 \\ x_2}
\]
This is our matrix \(AB\)! Its first column is \(A\qvec{b_{11} \\ b_{21}}\), and its second column is \(A\qvec{b_{12} \\ b_{22}}\). In other words, each column of \(AB\) is the matrix-vector product of \(A\) with the corresponding column of \(B\). Which is precisely the definition of matrix-matrix multiplication.
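As a final sanity check, here is a small sketch (reusing the matVec and matMul helpers from the earlier snippets) that verifies both the associativity we demanded and the column-by-column structure on an arbitrary example:

// (PQ)x and P(Qx) should produce the same vector.
const P = [[1, 2], [3, 4]];
const Q = [[5, 6], [7, 8]];

console.log(matMul(P, Q));                  // [[19, 22], [43, 50]]
console.log(matVec(matMul(P, Q), [9, 10])); // [391, 887]
console.log(matVec(P, matVec(Q, [9, 10]))); // [391, 887], as expected

// Each column of PQ is P applied to the corresponding column of Q.
console.log(matVec(P, [5, 7]));             // [19, 43], the first column of PQ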