# Deep Learning Book Series · 2.2 Multiplying Matrices and Vectors

This content is part of a series following Chapter 2 on linear algebra from the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code for the mathematical theory, and reflects my understanding of these concepts. You can check the syllabus in the introduction post.

# Introduction

We will see some very important concepts in this chapter. The dot product appears in the equations behind most data science algorithms, so it is worth the effort to understand it. Then we will see some properties of this operation. Finally, we will get some intuition on the link between matrices and systems of linear equations.

# 2.2 Multiplying Matrices and Vectors

The standard way to multiply matrices is not to multiply each element of one by the corresponding element of the other (this is the element-wise product) but to calculate the sum of the products between rows and columns. The matrix product, also called **dot product**, is calculated as follows:

*The dot product between a matrix and a vector*
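To make the difference with the element-wise product concrete, here is a small sketch using two arbitrary $2 \times 2$ arrays (`P` and `Q` are illustrative names, not taken from the text). With Numpy arrays, `*` multiplies element by element, while `dot()` computes the matrix product:

```python
import numpy as np

# Two arbitrary 2 x 2 arrays (illustrative values)
P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])

# Element-wise product: each entry of P times the entry of Q at the same position
elementwise = P * Q
print(elementwise.tolist())  # [[5, 12], [21, 32]]

# Matrix product: sums of products between rows of P and columns of Q
matrix_product = P.dot(Q)
print(matrix_product.tolist())  # [[19, 22], [43, 50]]
```

The two results differ, which is why it matters which product an equation refers to.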

The number of columns of the first matrix must be equal to the number of rows of the second matrix. Thus, if the shape of the first matrix is ($m \times n$), the second matrix needs to be of shape ($n \times x$). The resulting matrix will have the shape ($m \times x$).
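The shape rule can be checked directly in Numpy. The following is a minimal sketch with made-up arrays filled with ones (the names `left`, `right`, and `bad` are only for illustration): the product is defined when the inner dimensions match, and `np.dot()` raises a `ValueError` otherwise.

```python
import numpy as np

left = np.ones((3, 2))    # shape (m x n) with m=3, n=2
right = np.ones((2, 4))   # shape (n x x) with n=2, x=4

product = np.dot(left, right)
print(product.shape)      # (3, 4), i.e. (m x x)

bad = np.ones((3, 4))     # 3 rows, but `left` has only 2 columns
try:
    np.dot(left, bad)
    mismatch_detected = False
except ValueError:
    # Numpy refuses to multiply when the inner dimensions differ
    mismatch_detected = True
print(mismatch_detected)  # True
```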

### Example 1.

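As a starter we will see the multiplication of a matrix and a vector.

$$
\bs{A} \times \bs{b} = \bs{C}
$$

with:

$$
\bs{A}=
\begin{bmatrix}
    1 & 2\\
    3 & 4\\
    5 & 6
\end{bmatrix}
$$

and:

$$
\bs{b}=
\begin{bmatrix}
    2\\
    4
\end{bmatrix}
$$

We saw that the formula is the following:

$$
\begin{bmatrix}
    A_{1,1} & A_{1,2}\\
    A_{2,1} & A_{2,2}\\
    A_{3,1} & A_{3,2}
\end{bmatrix}
\times
\begin{bmatrix}
    b_{1}\\
    b_{2}
\end{bmatrix}
=
\begin{bmatrix}
    A_{1,1}b_{1} + A_{1,2}b_{2}\\
    A_{2,1}b_{1} + A_{2,2}b_{2}\\
    A_{3,1}b_{1} + A_{3,2}b_{2}
\end{bmatrix}
$$

So we will have:

$$
\begin{bmatrix}
    1 & 2\\
    3 & 4\\
    5 & 6
\end{bmatrix}
\times
\begin{bmatrix}
    2\\
    4
\end{bmatrix}
=
\begin{bmatrix}
    1 \times 2 + 2 \times 4\\
    3 \times 2 + 4 \times 4\\
    5 \times 2 + 6 \times 4
\end{bmatrix}
=
\begin{bmatrix}
    10\\
    22\\
    34
\end{bmatrix}
$$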

It is a good habit to check the dimensions of the matrices to see what is going on. We can see in this example that the shape of $\bs{A}$ is ($3 \times 2$) and the shape of $\bs{b}$ is ($2 \times 1$). So the dimensions of $\bs{C}$ are ($3 \times 1$).

### With Numpy

The Numpy function `dot()` can be used to compute the matrix product (or dot product). Let’s try to reproduce the last example:

```
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
A
```

array([[1, 2], [3, 4], [5, 6]])

```
B = np.array([[2], [4]])
B
```

array([[2], [4]])

```
C = np.dot(A, B)
C
```

array([[10], [22], [34]])

Equivalently, you can use the `dot()` method of Numpy arrays:

```
C = A.dot(B)
C
```

array([[10], [22], [34]])
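For 2-D arrays such as these, Python's `@` operator and `np.matmul()` give the same result as `dot()`. Here is a short sketch that re-creates the arrays of Example 1 so that it runs on its own (the names `C_dot`, `C_at`, and `C_matmul` are only for illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[2], [4]])

C_dot = np.dot(A, B)        # function form
C_at = A @ B                # operator form
C_matmul = np.matmul(A, B)  # explicit matrix-multiplication function

# All three agree for 2-D inputs
same = np.array_equal(C_dot, C_at) and np.array_equal(C_dot, C_matmul)
print(same)  # True
```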

### Example 2.

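Multiplication of two matrices.

$$
\bs{A} \times \bs{B} = \bs{C}
$$

with:

$$
\bs{A}=
\begin{bmatrix}
    1 & 2 & 3\\
    4 & 5 & 6\\
    7 & 8 & 9\\
    10 & 11 & 12
\end{bmatrix}
$$

and:

$$
\bs{B}=
\begin{bmatrix}
    2 & 7\\
    1 & 2\\
    3 & 6
\end{bmatrix}
$$

So we have:

$$
\begin{bmatrix}
    1 & 2 & 3\\
    4 & 5 & 6\\
    7 & 8 & 9\\
    10 & 11 & 12
\end{bmatrix}
\times
\begin{bmatrix}
    2 & 7\\
    1 & 2\\
    3 & 6
\end{bmatrix}
=
\begin{bmatrix}
    2 \times 1 + 1 \times 2 + 3 \times 3 & 7 \times 1 + 2 \times 2 + 6 \times 3\\
    2 \times 4 + 1 \times 5 + 3 \times 6 & 7 \times 4 + 2 \times 5 + 6 \times 6\\
    2 \times 7 + 1 \times 8 + 3 \times 9 & 7 \times 7 + 2 \times 8 + 6 \times 9\\
    2 \times 10 + 1 \times 11 + 3 \times 12 & 7 \times 10 + 2 \times 11 + 6 \times 12
\end{bmatrix}
=
\begin{bmatrix}
    13 & 29\\
    31 & 74\\
    49 & 119\\
    67 & 164
\end{bmatrix}
$$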

Let’s check the result with Numpy:

```
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
A
```

array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])

```
B = np.array([[2, 7], [1, 2], [3, 6]])
B
```

array([[2, 7], [1, 2], [3, 6]])

```
C = A.dot(B)
C
```

array([[ 13, 29], [ 31, 74], [ 49, 119], [ 67, 164]])

It works!

# Formalization of the dot product

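The dot product can be formalized through the following equation:

$$
C_{i,j} = \sum_{k} A_{i,k} B_{k,j}
$$

You can find more examples about the dot product here.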
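The element-wise definition of the product, $C_{i,j} = \sum_{k} A_{i,k} B_{k,j}$, can be sketched with explicit loops. The function name `matmul_loops` is made up for this illustration; in practice `dot()` is much faster, and the loops are only there to show what the sum does. The arrays are those of Example 2.

```python
import numpy as np

def matmul_loops(A, B):
    """Matrix product from the definition C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
B = np.array([[2, 7], [1, 2], [3, 6]])

C = matmul_loops(A, B)
print(np.array_equal(C, A.dot(B)))  # True
```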

# Properties of the dot product

We will now see some interesting properties of matrix multiplication. They will become useful as we move forward in the chapters. Using simple examples for each property will provide a way to check them while we get used to the Numpy functions.

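## Matrix multiplication is distributive

$$
\bs{A}(\bs{B}+\bs{C}) = \bs{AB}+\bs{AC}
$$

### Example 3.

$$
\bs{A}=
\begin{bmatrix}
    2 & 3\\
    1 & 4\\
    7 & 6
\end{bmatrix},
\bs{B}=
\begin{bmatrix}
    5\\
    2
\end{bmatrix},
\bs{C}=
\begin{bmatrix}
    4\\
    3
\end{bmatrix}
$$

$$
\bs{A}(\bs{B}+\bs{C})=
\begin{bmatrix}
    2 & 3\\
    1 & 4\\
    7 & 6
\end{bmatrix}
\times
\begin{bmatrix}
    9\\
    5
\end{bmatrix}
=
\begin{bmatrix}
    2 \times 9 + 3 \times 5\\
    1 \times 9 + 4 \times 5\\
    7 \times 9 + 6 \times 5
\end{bmatrix}
=
\begin{bmatrix}
    33\\
    29\\
    93
\end{bmatrix}
$$

is equivalent to

$$
\bs{AB}+\bs{AC}=
\begin{bmatrix}
    2 \times 5 + 3 \times 2\\
    1 \times 5 + 4 \times 2\\
    7 \times 5 + 6 \times 2
\end{bmatrix}
+
\begin{bmatrix}
    2 \times 4 + 3 \times 3\\
    1 \times 4 + 4 \times 3\\
    7 \times 4 + 6 \times 3
\end{bmatrix}
=
\begin{bmatrix}
    16\\
    13\\
    47
\end{bmatrix}
+
\begin{bmatrix}
    17\\
    16\\
    46
\end{bmatrix}
=
\begin{bmatrix}
    33\\
    29\\
    93
\end{bmatrix}
$$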

```
A = np.array([[2, 3], [1, 4], [7, 6]])
A
```

array([[2, 3], [1, 4], [7, 6]])

```
B = np.array([[5], [2]])
B
```

array([[5], [2]])

```
C = np.array([[4], [3]])
C
```

array([[4], [3]])

$\bs{A}(\bs{B}+\bs{C})$:

```
D = A.dot(B+C)
D
```

array([[33], [29], [93]])

is equivalent to $\bs{AB}+\bs{AC}$:

```
D = A.dot(B) + A.dot(C)
D
```

array([[33], [29], [93]])
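One example does not prove the property, but it is easy to try it on other values. Here is a small sketch using random integer matrices (the seed, value range, and shapes are arbitrary choices made for this illustration); with integers the comparison is exact:

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
A = rng.integers(-5, 6, size=(3, 2))     # random integers between -5 and 5
B = rng.integers(-5, 6, size=(2, 4))
C = rng.integers(-5, 6, size=(2, 4))     # same shape as B so that B + C is defined

left = A.dot(B + C)
right = A.dot(B) + A.dot(C)

distributive_holds = np.array_equal(left, right)
print(distributive_holds)  # True
```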

## Matrix multiplication is associative

$$
\bs{A}(\bs{BC}) = (\bs{AB})\bs{C}
$$

```
A = np.array([[2, 3], [1, 4], [7, 6]])
A
```

array([[2, 3], [1, 4], [7, 6]])

```
B = np.array([[5, 3], [2, 2]])
B
```

array([[5, 3], [2, 2]])

$\bs{A}(\bs{BC})$, reusing the vector $\bs{C}$ defined in Example 3:

```
D = A.dot(B.dot(C))
D
```

array([[100], [ 85], [287]])

is equivalent to $(\bs{AB})\bs{C}$:

```
D = (A.dot(B)).dot(C)
D
```

array([[100], [ 85], [287]])
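With floating-point values, the two groupings can differ in the last digits because of rounding, so it is safer to compare them with `np.allclose()` rather than with strict equality. A minimal sketch with random matrices (the seed and shapes are arbitrary choices for this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, arbitrary choice
A = rng.random((3, 2))
B = rng.random((2, 2))
C = rng.random((2, 1))

left = A.dot(B.dot(C))       # A(BC)
right = (A.dot(B)).dot(C)    # (AB)C

# Equal up to floating-point rounding
associative_holds = np.allclose(left, right)
print(associative_holds)  # True
```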

## Matrix multiplication is not commutative

In general:

$$
\bs{AB} \neq \bs{BA}
$$

```
A = np.array([[2, 3], [6, 5]])
A
```

array([[2, 3], [6, 5]])

```
B = np.array([[5, 3], [2, 2]])
B
```

array([[5, 3], [2, 2]])

$\bs{AB}$:

```
AB = np.dot(A, B)
AB
```

array([[16, 12], [40, 28]])

is different from $\bs{BA}$:

```
BA = np.dot(B, A)
BA
```

array([[28, 30], [16, 16]])
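When the matrices are not square, the reversed product may not even be defined. The sketch below uses the $3 \times 2$ matrix from the earlier examples: $\bs{AB}$ exists but $\bs{BA}$ does not, because the inner dimensions no longer match (the flag `BA_defined` is just an illustrative name).

```python
import numpy as np

A = np.array([[2, 3], [1, 4], [7, 6]])   # shape (3, 2)
B = np.array([[5, 3], [2, 2]])           # shape (2, 2)

AB = A.dot(B)    # defined: (3 x 2)(2 x 2) gives (3 x 2)

try:
    B.dot(A)     # (2 x 2)(3 x 2): inner dimensions 2 and 3 differ
    BA_defined = True
except ValueError:
    BA_defined = False

print(AB.shape, BA_defined)  # (3, 2) False
```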

## However vector multiplication is commutative

$$
\bs{x^\text{T}y} = \bs{y^\text{T}x}
$$

```
x = np.array([[2], [6]])
x
```

array([[2], [6]])

```
y = np.array([[5], [2]])
y
```

array([[5], [2]])

$\bs{x^\text{T}y}$:

```
x_ty = x.T.dot(y)
x_ty
```

array([[22]])

is equivalent to $\bs{y^\text{T}x}$:

```
y_tx = y.T.dot(x)
y_tx
```

array([[22]])
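Note that this commutativity concerns the product $\bs{x^\text{T}y}$, which gives a single number. The product in the other order, $\bs{xy^\text{T}}$, gives a $2 \times 2$ matrix and is not commutative. A short sketch with the same vectors (the names `scalar`, `outer_xy`, and `outer_yx` are only for illustration):

```python
import numpy as np

x = np.array([[2], [6]])
y = np.array([[5], [2]])

# x^T y is a (1 x 1) array; .item() extracts the number it contains
scalar = x.T.dot(y).item()
print(scalar)  # 22

# x y^T and y x^T are (2 x 2) matrices and differ from each other
outer_xy = x.dot(y.T)
outer_yx = y.dot(x.T)
print(outer_xy.tolist())  # [[10, 4], [30, 12]]
print(outer_yx.tolist())  # [[10, 30], [4, 12]]
```

The two matrices are transposes of each other, which is consistent with the transposition rule of the next section.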

## Simplification of the matrix product

$$
(\bs{AB})^{\text{T}} = \bs{B}^\text{T}\bs{A}^\text{T}
$$

```
A = np.array([[2, 3], [1, 4], [7, 6]])
A
```

array([[2, 3], [1, 4], [7, 6]])

```
B = np.array([[5, 3], [2, 2]])
B
```

array([[5, 3], [2, 2]])

$(\bs{AB})^{\text{T}}$:

```
AB_t = A.dot(B).T
AB_t
```

array([[16, 13, 47], [12, 11, 33]])

is equivalent to $\bs{B}^\text{T}\bs{A}^\text{T}$:

```
B_tA_t = B.T.dot(A.T)
B_tA_t
```

array([[16, 13, 47], [12, 11, 33]])
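Applying the rule twice extends it to longer products: $(\bs{ABC})^{\text{T}} = \bs{C}^\text{T}\bs{B}^\text{T}\bs{A}^\text{T}$, with the order of the factors reversed. Here is a quick check, a sketch reusing the matrices of the previous examples with $\bs{C}$ taken from Example 3:

```python
import numpy as np

A = np.array([[2, 3], [1, 4], [7, 6]])
B = np.array([[5, 3], [2, 2]])
C = np.array([[4], [3]])

ABC_t = A.dot(B).dot(C).T             # transpose of the whole product
reversed_t = C.T.dot(B.T).dot(A.T)    # transposes multiplied in reverse order

print(ABC_t.tolist())                      # [[100, 85, 287]]
print(np.array_equal(ABC_t, reversed_t))   # True
```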

# System of linear equations

This is an important part of why linear algebra can be very useful for solving a variety of problems. Here we will see that it can be used to represent systems of equations.

A system of equations is a set of multiple equations (at least 1). For instance we could have:

A system of equations is defined by its number of equations and its number of unknowns. In our example above, the system has 2 equations and 2 unknowns ($x$ and $y$). In addition we call this a system of **linear** equations because each equation is linear. This is easy to see in 2 dimensions: we will have one straight line per equation and the dimensions are the unknowns. Here is the plot of the first one:

*Representation of a linear equation*

In our system of equations, the unknowns are the dimensions and the number of equations is the number of lines (in 2D) or $n$-dimensional planes.

## Using matrices to describe the system

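Matrices can be used to describe a system of linear equations of the form $\bs{Ax}=\bs{b}$. Here is such a system:

$$
\begin{aligned}
A_{1,1}x_1 + A_{1,2}x_2 + \cdots + A_{1,n}x_n &= b_1 \\
A_{2,1}x_1 + A_{2,2}x_2 + \cdots + A_{2,n}x_n &= b_2 \\
&\cdots \\
A_{m,1}x_1 + A_{m,2}x_2 + \cdots + A_{m,n}x_n &= b_m
\end{aligned}
$$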

The unknowns (what we want to find to solve the system) are the variables $x_1$ and $x_2$ corresponding to the previous variables $x$ and $y$. It is exactly the same form as with the last example but with all the variables on the same side. $y = 2x + 1$ becomes $-2x + y = 1$ with $x$ corresponding to $x_1$ and $y$ corresponding to $x_2$. We will have $n$ unknowns and $m$ equations.

The variables are named $x_1, x_2, \cdots, x_n$ by convention because we will see that it can be summarised in the vector $\bs{x}$.

### Left hand side

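The left hand term can be considered as the product of a matrix $\bs{A}$ containing weights for each variable ($n$ columns) and each equation ($m$ rows):

$$
\bs{A}=
\begin{bmatrix}
    A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\
    A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\
    \cdots & \cdots & \cdots & \cdots \\
    A_{m,1} & A_{m,2} & \cdots & A_{m,n}
\end{bmatrix}
$$

with a vector $\bs{x}$ containing the $n$ unknowns

$$
\bs{x}=
\begin{bmatrix}
    x_1 \\
    x_2 \\
    \cdots \\
    x_n
\end{bmatrix}
$$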

The dot product of $\bs{A}$ and $\bs{x}$ gives a set of equations. Here is a simple example:

*Matrix form of a system of linear equations*

We have a set of two equations with two unknowns. So the number of rows of $\bs{A}$ gives the number of equations and the number of columns gives the number of unknowns.
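In Numpy terms, each row of $\bs{A}$ multiplied by $\bs{x}$ gives the left hand side of one equation. The sketch below uses made-up coefficients and candidate values (they are not taken from the figure) for a system of 2 equations and 2 unknowns:

```python
import numpy as np

# Hypothetical coefficients: 2 equations (rows) and 2 unknowns (columns)
A = np.array([[2, -1],
              [1,  1]])

# Candidate values for x_1 and x_2 (also hypothetical)
x = np.array([[1],
              [3]])

lhs = A.dot(x)   # one left hand side per equation
# equation 1: 2*1 + (-1)*3 = -1
# equation 2: 1*1 +    1*3 =  4
print(lhs.ravel().tolist())  # [-1, 4]
print(A.shape)               # (2, 2): 2 equations, 2 unknowns
```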

### Both sides

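The system of equations can be written as follows:

$$
\begin{bmatrix}
    A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\
    A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\
    \cdots & \cdots & \cdots & \cdots \\
    A_{m,1} & A_{m,2} & \cdots & A_{m,n}
\end{bmatrix}
\times
\begin{bmatrix}
    x_1 \\
    x_2 \\
    \cdots \\
    x_n
\end{bmatrix}
=
\begin{bmatrix}
    b_1 \\
    b_2 \\
    \cdots \\
    b_m
\end{bmatrix}
$$

Or simply:

$$
\bs{Ax}=\bs{b}
$$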

### Example 4.

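We will try to convert the common form of a linear equation $y=ax+b$ to the matrix form. If we want to keep the previous notation we will have instead:

$$
x_2 = ax_1 + b
$$

Don’t confuse the variables $x_1$ and $x_2$ with the vector $\bs{x}$. This vector actually contains all the variables of our equations. Here we have:

$$
\bs{x} =
\begin{bmatrix}
    x_1 \\
    x_2
\end{bmatrix}
$$

In this example we will use the following equation:

$$
x_2 = 2x_1 + 1 \Leftrightarrow 2x_1 - x_2 = -1
$$

In order to end up with this system when we multiply $\bs{A}$ and $\bs{x}$ we need $\bs{A}$ to be a matrix containing the weights of each variable. The weight of $x_1$ is $2$ and the weight of $x_2$ is $-1$:

$$
\bs{A}=
\begin{bmatrix}
    2 & -1
\end{bmatrix}
$$

So we have

$$
\begin{bmatrix}
    2 & -1
\end{bmatrix}
\begin{bmatrix}
    x_1 \\
    x_2
\end{bmatrix}
=
\begin{bmatrix}
    2x_1 - 1x_2
\end{bmatrix}
$$

To complete the equation we have

$$
\bs{b}=
\begin{bmatrix}
    -1
\end{bmatrix}
$$

which gives

$$
\begin{bmatrix}
    2 & -1
\end{bmatrix}
\begin{bmatrix}
    x_1 \\
    x_2
\end{bmatrix}
=
\begin{bmatrix}
    -1
\end{bmatrix}
$$

This system of equations is thus very simple and contains only 1 equation ($\bs{A}$ has 1 row) and 2 variables ($\bs{A}$ has 2 columns).

To summarise, $\bs{A}$ will be the matrix of dimensions $m\times n$ containing the scalars multiplying these variables (here $x_1$ is multiplied by 2 and $x_2$ by -1). The vector $\bs{x}$ contains the variables $x_1$ and $x_2$. And the right hand term is the constant $\bs{b}$:

$$
\bs{A}=
\begin{bmatrix}
    2 & -1
\end{bmatrix},
\bs{x}=
\begin{bmatrix}
    x_1 \\
    x_2
\end{bmatrix},
\bs{b}=
\begin{bmatrix}
    -1
\end{bmatrix}
$$

We can write this system

$$
\bs{Ax}=\bs{b}
$$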
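As a sanity check, assuming the equation of this example is $x_2 = 2x_1 + 1$ (that is, $2x_1 - x_2 = -1$), every point of that line should satisfy $\bs{Ax}=\bs{b}$ and a point outside the line should not. A minimal sketch (the sample values of $x_1$ and the point `off_line` are arbitrary):

```python
import numpy as np

A = np.array([[2, -1]])   # weights of x_1 and x_2 (1 equation, 2 unknowns)
b = np.array([[-1]])      # right hand side

# Points taken on the line x_2 = 2 * x_1 + 1 satisfy Ax = b
on_line_ok = True
for x_1 in [-2.0, 0.0, 1.5]:
    x_2 = 2 * x_1 + 1
    x = np.array([[x_1], [x_2]])
    on_line_ok = on_line_ok and np.allclose(A.dot(x), b)
print(on_line_ok)  # True

# A point that is not on the line does not satisfy the equation
off_line = np.array([[0.0], [0.0]])
off_line_ok = np.allclose(A.dot(off_line), b)
print(off_line_ok)  # False
```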

We will see at the end of the next chapter that this compact way of writing sets of linear equations can be very useful. It actually provides a way to solve the equations.
