Matrices III

Eigenvalue Problem

We turn to the “second half” of linear algebra, namely the matrix eigenvalue problem: \boldsymbol{A}\boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i. The main trick we employed in the previous section is no longer applicable: subtracting a multiple of a row from another row (i.e., the elimination procedure) changes the eigenvalues of the matrix, so it’s not an operation we’ll be carrying out in what follows.

We will be selective and study the special case where our n \times n matrix \boldsymbol{A} has n eigenvalues \lambda_i that are all distinct. This simplifies things considerably, since it means that the n eigenvectors \boldsymbol{v}_i are linearly independent. In this case, it is easy to show (as you will discover when you solve the relevant problem) that the following relation holds: \boldsymbol{V}^{-1}\boldsymbol{A}\boldsymbol{V} = \boldsymbol{\Lambda}, where \boldsymbol{\Lambda} is the diagonal “eigenvalue matrix” made up of the eigenvalues \lambda_i: \boldsymbol{\Lambda} = \begin{pmatrix} \lambda_0 & 0 & \dots & 0 \\ 0 & \lambda_1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_{n-1} \end{pmatrix} \tag{1} and \boldsymbol{V} is the “eigenvector matrix”, whose columns are the right eigenvectors \boldsymbol{v}_i: \boldsymbol{V} = (\boldsymbol{v}_0\quad \boldsymbol{v}_1 \quad \dots \quad \boldsymbol{v}_{n-1}).
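
As a quick numerical check of Equation 1 (a minimal sketch of our own; the 3×3 matrix is just an arbitrary example with distinct eigenvalues, and we lean on np.linalg.eig for the eigenvectors):

import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])          # arbitrary example with distinct eigenvalues

lams, V = np.linalg.eig(A)            # eigenvalues and eigenvector matrix (columns)
Lam = np.linalg.inv(V) @ A @ V        # should be (numerically) diagonal

print(np.round(Lam, 10))              # off-diagonal entries vanish to roundoff
print(lams)                           # the same numbers appear on the diagonal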

Equation 1 shows how we can diagonalize a matrix \boldsymbol{A}; this is why solving the eigenproblem is often called diagonalizing a matrix.

In the following, we shall not assume that our matrices are sparse or even symmetric: while many problems in physics lead to symmetric matrices, not all do. On the other hand, the eigenvalue problem for nonsymmetric matrices is messy, since the eigenvalues do not need to be real. In what follows, we will study only nonsymmetric matrices that have real (and distinct) eigenvalues.

You have learned that \det(\boldsymbol{A}−\lambda\boldsymbol{I}) = 0 leads to a characteristic equation (namely a polynomial set to 0). However, finding the roots of a polynomial is very often an ill-conditioned problem, even when the corresponding eigenvalue problem is perfectly well-conditioned.
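
To see why (an illustrative example of our own, not required for what follows): Wilkinson's classic polynomial, whose roots are simply 1, 2, \dots, 20, reacts violently to a tiny relative change in a single coefficient; the exact numbers printed depend on roundoff and on your machine.

import numpy as np

coeffs = np.poly(np.arange(1, 21))        # coefficients of (x-1)(x-2)...(x-20)
perturbed = np.copy(coeffs)
perturbed[1] *= 1 + 1e-8                  # tiny relative change in the x^19 coefficient

print(np.sort(np.roots(coeffs))[-4:])     # roots recovered from the (rounded) coefficients
print(np.sort(np.roots(perturbed))[-4:])  # the largest roots shift visibly; some may turn complex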

Thus, it’s wiser, instead, to transform the matrix into a form where it’s easy to read the eigenvalues off, while ensuring that the eigenvalues of the starting and final matrix are the same.

The methods we employ to solve the eigenvalue problem computationally are iterative; this is different from the system-solving of the previous section, where some methods were direct and others were iterative.

We shall in the following introduce the state-of-the-art QR method, currently the gold standard for the case where one requires all eigenvalues. Before we get to it, though, we will discuss the power and inverse-power methods: these help pedagogically, but will also turn out to be conceptually similar to the full-blown QR approach.

1 Power Method

As already mentioned, we will start with the simplest possible method, which turns out to be intellectually related to more robust methods. The general problem we are trying to solve is \boldsymbol{A}\boldsymbol{v}_i = \lambda_i\boldsymbol{v}_i: \lambda_i are the true eigenvalues and \boldsymbol{v}_i are the true eigenvectors (all of which are currently unknown). Since we’re making the assumption that all n eigenvalues are distinct, we are free to sort them such that: |\lambda_0| > |\lambda_1| > \cdots > |\lambda_{n-1}|.

The power method (in its simplest form) will give us access to only one eigenvalue and eigenvector pair. Specifically, it will allow us to evaluate the eigenvalue that is largest in magnitude, \lambda_0 (also known as the dominant eigenvalue), and the corresponding eigenvector \boldsymbol{v}_0.

1.1 Algorithm: First Attempt

We start from an ad hoc guess and then see how we can improve it, as is standard in iterative approaches. The method tells us to start from a vector \boldsymbol{z}^{(0)} and simply multiply it with the matrix A to get the next vector in the sequence: \boldsymbol{z}^{(k)} = \boldsymbol{A} \boldsymbol{z}^{(k-1)}, \quad k = 1,2,\dots.

Obviously, we have \boldsymbol{z}^{(1)} = \boldsymbol{A}\boldsymbol{z}^{(0)}, \boldsymbol{z}^{(2)} = \boldsymbol{A}\boldsymbol{z}^{(1)} = \boldsymbol{A}^2\boldsymbol{z}^{(0)}, and so on. This means \boldsymbol{z}^{(k)} = \boldsymbol{A}^k\boldsymbol{z}^{(0)}.

To see why this has to do with calculating eigenvalues, we express our starting vector \boldsymbol{z}^{(0)} as a linear combination of the (unknown) eigenvectors: \boldsymbol{z}^{(0)} = \sum_{i=0}^{n-1} c_i \boldsymbol{v}_i where the coefficients c_i are also unknown.

Thus, we have at the k-th iteration, \boldsymbol{z}^{(k)} = \boldsymbol{A}^k \boldsymbol{z}^{(0)} = \sum_{i = 0}^{n-1} c_i \boldsymbol{A}^k \boldsymbol{v}_i = \sum_{i = 0}^{n-1}c_i \lambda_i^k \boldsymbol{v}_i = c_0 \lambda_0^k \boldsymbol{v}_0 + \lambda_0^k \sum_{i=1}^{n-1} c_i \left(\frac{\lambda_i}{\lambda_0}\right)^k \boldsymbol{v}_i.

Since \lambda_0 is the largest eigenvalue in magnitude, we have (\lambda_i/\lambda_0)^k \to 0 as k\to \infty for i = 1, 2, \dots, n-1.

Thus, as long as c_0\neq 0, we have \lim_{k\to\infty}\boldsymbol{z}^{(k)} = \lim_{k\to\infty} \lambda_0^k \left(c_0\boldsymbol{v}_0 + \sum_{i=1}^{n-1} c_i \left(\frac{\lambda_i}{\lambda_0}\right)^k \boldsymbol{v}_i \right) \propto \boldsymbol{v}_0. This means that as k becomes larger and larger, the vector \boldsymbol{z}^{(k)} comes closer and closer to being parallel to \boldsymbol{v}_0, the eigenvector of \boldsymbol{A} corresponding to the eigenvalue with the largest magnitude.

To evaluate the eigenvalue \lambda_0, we introduce the Rayleigh quotient of a vector \boldsymbol{x} as follows: \mu(\boldsymbol{x}) = \frac{\boldsymbol{x}^T \boldsymbol{A} \boldsymbol{x}}{\boldsymbol{x}^T \boldsymbol{x}}. If \boldsymbol{x} is an eigenvector, then \mu(\boldsymbol{x}) obviously gives the eigenvalue. If \boldsymbol{x} is not an eigenvector, then \mu(\boldsymbol{x}) is the nearest substitute for an eigenvalue (in the least-squares sense, which you will learn about in later lectures).

Now consider \boldsymbol{z}^{(k)}: we have \lim_{k\to \infty}\mu(\boldsymbol{z}^{(k)}) = \lambda_0. In other words, for finite k, \mu(\boldsymbol{z}^{(k)}) is our best estimate of \lambda_0.
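
As a minimal sketch (the helper name rayleigh is ours; the code in the next subsection simply evaluates qs@A@qs inline):

import numpy as np

def rayleigh(A, x):
    # Rayleigh quotient mu(x) = x^T A x / (x^T x)
    return (x @ A @ x) / (x @ x)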

1.2 Algorithm: Normalizing

We could also discuss how to get the dominant eigenvector, \boldsymbol{v}_0, from \boldsymbol{z}^{(k)}. However, we observe that we have ignored a problem in our earlier derivation: in the previous expression for \boldsymbol{z}^{(k)}, the factor \lambda_0^k will become unbounded (if |\lambda_0|>1) or tend to 0 (if |\lambda_0|<1). In order to remedy this, we decide to scale the sequence \boldsymbol{z}^{(k)} between steps.

The simplest way to accomplish such a scaling is to introduce a new sequence \boldsymbol{q}^{(k)} which has the convenient property that \|\boldsymbol{q}^{(k)} \| = 1. (In the following we employ the Euclidean norm throughout.) To do this, we scale \boldsymbol{z}^{(k)} by its norm, \boldsymbol{q}^{(k)} = \frac{\boldsymbol{z}^{(k)}}{\| \boldsymbol{z}^{(k)} \|}, which gives \|\boldsymbol{q}^{(k)} \| =1. In this case, the Rayleigh quotient is \mu(\boldsymbol{q}^{(k)}) = [\boldsymbol{q}^{(k)}]^T \boldsymbol{A} \boldsymbol{q}^{(k)}.

Thus, our new normalized power-method algorithm can be summarized as the following sequence of steps: \begin{gather*} \boldsymbol{z}^{(k)} = \boldsymbol{A}\boldsymbol{q}^{(k-1)} \\ \boldsymbol{q}^{(k)} = \frac{\boldsymbol{z}^{(k)}}{\|\boldsymbol{z}^{(k)}\|} \\ \mu(\boldsymbol{q}^{(k)}) = [\boldsymbol{q}^{(k)}]^T \boldsymbol{A} \boldsymbol{q}^{(k)}. \end{gather*} As k grows, \boldsymbol{q}^{(k)} will approximately equal the dominant (normalized) eigenvector \boldsymbol{v}_0.

1.3 Implementation

  • Note that we write a function mag() to compute the magnitude (i.e., the norm) of a vector. Here we use np.sum(xs*xs) inside the square root; we could equivalently use xs@xs.
  • The function power() takes in a matrix and an optional parameter of how many times to iterate.
  • We start out with \boldsymbol{q}^{(0)}, a unit-norm vector.
  • We then multiply \boldsymbol{A} with the unit-norm vector, to produce a non-unit-norm vector \boldsymbol{z}^{(k)}.
  • We proceed to normalize \boldsymbol{z}^{(k)} to get \boldsymbol{q}^{(k)}.
import numpy as np

def testcreate(n,val):
    A = np.arange(val,val+n*n).reshape(n,n)
    A = np.sqrt(A)
    bs = (A[0,:])**2.1
    return A, bs

def mag(xs):
    return np.sqrt(np.sum(xs*xs))  

def power(A,kmax=6):
    zs = np.ones(A.shape[0])        # arbitrary starting vector
    qs = zs/mag(zs)                 # normalize to unit norm
    for k in range(1,kmax):
        zs = A@qs                   # multiply by A ...
        qs = zs/mag(zs)             # ... and renormalize
        print(k,qs)

    lam = qs@A@qs                   # Rayleigh quotient: eigenvalue estimate
    return lam, qs

def testeigone(f,A,indx=0):
    eigval, eigvec = f(A)
    print(" "); print(eigval); print(eigvec)
    npeigvals, npeigvecs = np.linalg.eig(A)
    print(" ")
    print(npeigvals[indx]); print(npeigvecs[:,indx])


if __name__ == '__main__':
    A, bs = testcreate(4,21)
    testeigone(power,A)
1 [0.44421209 0.48212489 0.51726163 0.55015599]
2 [0.4443962  0.4821814  0.5172089  0.55000734]
3 [0.44439562 0.48218122 0.51720907 0.55000781]
4 [0.44439562 0.48218122 0.51720906 0.55000781]
5 [0.44439562 0.48218122 0.51720906 0.55000781]
 
21.316662663452007
[0.44439562 0.48218122 0.51720906 0.55000781]
 
21.316662663452043
[0.44439562 0.48218122 0.51720906 0.55000781]

Note that in the above code we stopped the iteration after 6 iterations. We could have also introduced a termination criterion, \sum_{j=0}^{n-1} \left| \frac{q_j^{(k)} - q_j^{(k-1)}}{q_j^{(k)}} \right| \leq \epsilon, \tag{2} which you will implement in your homework.

1.4 Operation Count

  • For iterative methods, the total operation count depends on the actual number of iterations required, which we generally cannot predict ahead of time.
  • Any operation count we encounter will also have to be multiplied by m, where m is the number of actual iterations needed.
  • The bulk of the work is carried out by \boldsymbol{z}^{(k)} = \boldsymbol{A}\boldsymbol{q}^{(k-1)}, which is a matrix-vector multiplication, which costs \sim 2n^2 operations.
  • We also have to evaluate the norm of \boldsymbol{z}^{(k)}, which is a vector-vector multiplication and costs \sim 2n.
  • Finally, we need to calculate the Rayleigh quotient, which consists of a single matrix-vector multiplication, \sim 2n^2, and a vector-vector multiplication, \sim 2n.
  • Keeping only the highest power of n, we have \sim 2(m+1)n^2 for the total cost.

2 Inverse-Power Method with Shifting

We will now discuss a variation of the power method, which will also allow us to evaluate one eigenvalue and eigenvector pair. This time, it will be for the eigenvalue with the smallest magnitude.

2.1 Algorithm

We start from the definition \boldsymbol{A}\boldsymbol{v}_i=\lambda_i \boldsymbol{v}_i. Multiplying on the left with \boldsymbol{A}^{-1}, we find \boldsymbol{v}_i = \lambda_i\boldsymbol{A}^{-1}\boldsymbol{v}_i. Dividing both sides of this equation by \lambda_i, we get \boldsymbol{A}^{-1}\boldsymbol{v}_i = \lambda_i^{inv}\boldsymbol{v}_i, where \lambda_i^{inv} = 1/\lambda_i is an eigenvalue of the inverse matrix. Now, we can apply the power method, \boldsymbol{z}^{(k)} = \boldsymbol{A}^{-1}\boldsymbol{q}^{(k-1)}. Since the power method determines the largest eigenvalue of \boldsymbol{A}^{-1}, our method then allows us to evaluate the smallest-magnitude eigenvalue \lambda_{n-1} of \boldsymbol{A}. Note that since \boldsymbol{q}^{(k)} estimates an eigenvector of \boldsymbol{A}^{-1}, it estimates an eigenvector of \boldsymbol{A} as well. For the eigenvalue estimate, we can evaluate the Rayleigh quotient.

In practice, one can avoid the costly evaluation of the matrix inverse. Instead, at each iteration, we solve the linear system \boldsymbol{A}\boldsymbol{z}^{(k)} = \boldsymbol{q}^{(k-1)} for \boldsymbol{z}^{(k)}.

At every step we solve a system with the same fixed \boldsymbol{A}, but a different right-hand side. We can thus perform (just once) an LU decomposition, \boldsymbol{A} = \boldsymbol{L} \boldsymbol{U}. Then in each iteration we simply need to calculate \boldsymbol{z}^{(k)} by forward substitution followed by backward substitution.
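
Here is a sketch of this factor-once, solve-many pattern; for brevity it uses SciPy's lu_factor/lu_solve instead of the hand-rolled ludec(), forsub(), and backsub() that appear in the implementation below (assuming SciPy is available).

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def invpower_sketch(A, kmax=20):
    # unshifted inverse-power iteration: factor A once, reuse the factors every step
    n = A.shape[0]
    lu_piv = lu_factor(A)              # O(n^3) work, done a single time
    q = np.ones(n) / np.sqrt(n)
    for _ in range(kmax):
        z = lu_solve(lu_piv, q)        # solves A z = q with O(n^2) work
        q = z / np.sqrt(z @ z)
    return q @ A @ q, q                # Rayleigh quotient and eigenvector estimate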

2.2 Eigenvalue Shifting

Before implementing the inverse-power method, we want to refine it further. Let's start with \boldsymbol{A}\boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i. We can subtract s\boldsymbol{v}_i from both sides, with a scalar s. We then obtain (\boldsymbol{A} - s\boldsymbol{I})\boldsymbol{v}_i = (\lambda_i - s) \boldsymbol{v}_i, which gives a new eigenvalue problem \boldsymbol{A}^{*}\boldsymbol{v}_i = \lambda_i^* \boldsymbol{v}_i with \boldsymbol{A}^{*} = \boldsymbol{A} - s\boldsymbol{I} and \lambda_i^* = \lambda_i - s. We can apply the inverse-power method to \boldsymbol{A}^*, which finds the smallest-magnitude eigenvalue \lambda_i^*. The latter corresponds to the eigenvalue \lambda_i of the original matrix \boldsymbol{A} which is closest to s.

To get an estimate of the eigenvalue \lambda_i of \boldsymbol{A} itself (rather than of \boldsymbol{A}^*), we can directly use \mu(\boldsymbol{q}^{(k)}) = [\boldsymbol{q}^{(k)}]^T\boldsymbol{A}\boldsymbol{q}^{(k)}.

Another application is to find the eigenvector when we already have an estimate of a particular eigenvalue of \boldsymbol{A}. We simply choose the shift equal to this estimate and compute \boldsymbol{q}^{(k)}, which will converge to the corresponding eigenvector.

One last comment: the convergence of the power method is determined by the ratio |\lambda_1/\lambda_0|. On the other hand, the unshifted inverse-power method’s convergence is governed by |\lambda_{n-1}/\lambda_{n-2}|. If |\lambda_{n-1}| is much smaller than |\lambda_{n-2}|, then the convergence is fast. Thus, we can choose a shift value s very close to \lambda_{n-1}, such that |\lambda_{n-1}^*|\ll |\lambda_{n-2}^*|, in order to accelerate convergence.

2.3 Implementation

We implement the inverse-power method in the following code. Comparing our new function to power(), we notice three main differences.

  1. At the start we shift our original matrix to produce \boldsymbol{A}^∗ = \boldsymbol{A} − s\boldsymbol{I}. At the end of the process, we evaluate the Rayleigh quotient for the original matrix \boldsymbol{A}, which allows us to evaluate the eigenvalue \lambda_i of the original matrix that is closest to the hand-picked shift s.
  2. We implement a self-stopping criterion with the if...break. We have been careful to create a new copy of our vector each time through the loop, via qs = np.copy(qnews). We also check for sign flips and adjust accordingly; otherwise the iteration would not converge.
  3. power() employs @ in the bulk of the code. Here, instead, we have to solve a linear system of equations: we first LU-decompose \boldsymbol{A^*} (only once, outside the loop); then, inside the loop, we only use forward and backward substitutions, which are considerably less costly.
import numpy as np

def testcreate(n,val):
    A = np.arange(val,val+n*n).reshape(n,n)
    A = np.sqrt(A)
    bs = (A[0,:])**2.1
    return A, bs

def forsub(L,bs):
    n = bs.size
    xs = np.zeros(n)
    for i in range(n):
        xs[i] = (bs[i] - L[i,:i]@xs[:i])/L[i,i]
    return xs

def backsub(U,bs):
    n = bs.size
    xs = np.zeros(n)
    for i in reversed(range(n)):
        xs[i] = (bs[i] - U[i,i+1:]@xs[i+1:])/U[i,i]
    return xs

def ludec(A):
    n = A.shape[0]
    U = np.copy(A)
    L = np.identity(n)

    for j in range(n-1):
        for i in range(j+1,n):
            coeff = U[i,j]/U[j,j]
            U[i,j:] -= coeff*U[j,j:]
            L[i,j] = coeff
    return L, U


def termcrit(xolds,xnews):
    errs = np.abs((xnews - xolds)/xnews)
    return np.sum(errs)

def mag(xs):
    return np.sqrt(np.sum(xs*xs))  

def testeigone(f,A,indx=0):
    eigval, eigvec = f(A)
    print(" "); print(eigval); print(eigvec)
    npeigvals, npeigvecs = np.linalg.eig(A)
    print(" ")
    print(npeigvals[indx]); print(npeigvecs[:,indx])

def invpowershift(A,shift=20,kmax=200,tol=1.e-8):
    n = A.shape[0]
    znews = np.ones(n)
    qnews = znews/mag(znews)
    Astar = A - np.identity(n)*shift
    L, U = ludec(Astar)

    for k in range(1,kmax):
        qs = np.copy(qnews)
        ys = forsub(L,qs)
        znews = backsub(U,ys)
        qnews = znews/mag(znews)

        if qs@qnews<0:
            qnews = -qnews

        err = termcrit(qs,qnews)
        print(k, qnews, err)

        if err < tol:
            lam = qnews@A@qnews
            break
    else:
        lam = qnews = None

    return lam, qnews

if __name__ == '__main__':
    A, bs = testcreate(4,21)
    testeigone(invpowershift,A)
1 [0.44065813 0.48090955 0.51822387 0.55316403] 0.30563801879330116
2 [0.44464049 0.48226426 0.51714199 0.54980012] 0.019975893536692543
3 [0.44437955 0.48217577 0.51721346 0.55002143] 0.0013112849763802009
4 [0.44439667 0.48218158 0.51720878 0.55000691] 8.603442270303673e-05
5 [0.44439555 0.4821812  0.51720908 0.55000787] 5.6449639949461415e-06
6 [0.44439562 0.48218122 0.51720906 0.5500078 ] 3.7038142862653805e-07
7 [0.44439562 0.48218122 0.51720906 0.55000781] 2.430173570284188e-08
8 [0.44439562 0.48218122 0.51720906 0.55000781] 1.5945032439464706e-09
 
21.316662663486134
[0.44439562 0.48218122 0.51720906 0.55000781]
 
21.316662663452043
[0.44439562 0.48218122 0.51720906 0.55000781]

2.4 Operation Count

  • The LU decomposition costs \sim 2n^3/3 operations; this is done only once.
  • The following are done in each iteration:
    • Forward and backward substitution each cost \sim n^2.
    • The norm evaluation costs \sim 2n.
  • Keeping only the leading term, the total iterative cost is \sim 2mn^2, where m is the number of iterations.
  • Without the LU decomposition overhead, the cost of the inverse-power method scales similarly to the one for the direct power method.

3 QR Method

The (direct or inverse) power method that we’ve discussed so far gives us only one eigenvalue at a time (either the largest or the smallest). As we saw, you could combine the latter method with eigenvalue shifting and then try to step through all the eigenvalues of your matrix. In the present section, we will discuss a robust and scalable method used to evaluate all the eigenvalues of a matrix at one go. The name QR method originates from the QR decomposition (or factorization).

We will also make a slight detour into similarity transformations and the related approach known as “simultaneous iteration”. We’ll try to keep the terminology straight: we use the QR decomposition in order to express a matrix as the product of two other matrices, while we use the QR method in order to evaluate all eigenvalues of a matrix.

3.1 QR decomposition

The QR decomposition starts with a matrix \boldsymbol{A} and decomposes it into the product of an orthogonal matrix \boldsymbol{Q} and an upper-triangular matrix \boldsymbol{R}. (This upper triangular matrix is called \boldsymbol{R} and not \boldsymbol{U} for historical reasons.) Symbolically, we say that any real square matrix can be factorized as \boldsymbol{A} = \boldsymbol{Q}\boldsymbol{R}.

Let me remind you that a matrix is called orthogonal if its transpose is equal to its inverse, \boldsymbol{Q}^{-1} = \boldsymbol{Q}^T, or \boldsymbol{Q}\boldsymbol{Q}^T = \boldsymbol{I}. In the following, let us derive the QR decomposition.

Evaluating \boldsymbol{Q}

We start by constructing the orthogonal matrix \boldsymbol{Q}. We will employ an old method which you may have encountered in a course on linear algebra: Gram-Schmidt orthogonalization. Let us write our starting matrix \boldsymbol{A} in terms of its columns \boldsymbol{a}_j: \boldsymbol{A} = \left(\boldsymbol{a}_0 \quad \boldsymbol{a}_1 \quad \dots \quad \boldsymbol{a}_{n-1} \right).

Our task now is to start from the assumption that the column vectors \boldsymbol{a}_j are linearly independent and produce an orthonormal set of column vectors \boldsymbol{q}_j. When we’ve accomplished that task, we will have already produced our orthogonal matrix \boldsymbol{Q}: \boldsymbol{Q} = \left(\boldsymbol{q}_0 \quad \boldsymbol{q}_1 \quad \dots \quad \boldsymbol{q}_{n-1} \right), since \boldsymbol{Q} is made up of the orthonormal column vectors \boldsymbol{q}_j we just constructed.

We will build these orthonormal \boldsymbol{q}_j column vectors one at a time.

  • The first vector \boldsymbol{q}_0 is very easy to produce: \boldsymbol{q}_0 = \boldsymbol{a}_0 / \| \boldsymbol{a}_0 \|.
  • To construct \boldsymbol{q}_1, which should be orthogonal to \boldsymbol{q}_0 (or \boldsymbol{a}_0), we follow the Gram-Schmidt prescription: take the second vector \boldsymbol{a}_1 and subtract out its component in the direction of \boldsymbol{q}_0: \boldsymbol{a}_1' = \boldsymbol{a}_1 - (\boldsymbol{q}_0^T\boldsymbol{a}_1)\boldsymbol{q}_0. This is the part of \boldsymbol{a}_1 that does not point in the direction of \boldsymbol{q}_0. Then we simply normalize: \boldsymbol{q}_1 = \boldsymbol{a}_1'/\|\boldsymbol{a}_1' \|.
  • Following this procedure, we construct the next vector \boldsymbol{q}_2, by first constructing \boldsymbol{a}_2' = \boldsymbol{a}_2 - (\boldsymbol{q}_0^T \boldsymbol{a}_2)\boldsymbol{q}_0 - (\boldsymbol{q}_1^T\boldsymbol{a}_2)\boldsymbol{q}_1, and then normalizing \boldsymbol{q}_2 = \boldsymbol{a}_2'/\|\boldsymbol{a}_2' \|.
  • Generally, we have for j = 0, 1, \dots, n-1 \begin{gather*} \boldsymbol{a}_j' = \boldsymbol{a}_j - \sum_{i = 0}^{j-1} (\boldsymbol{q}_i^T \boldsymbol{a}_j)\boldsymbol{q}_i \\ \boldsymbol{q}_j = \boldsymbol{a}_j'/\|\boldsymbol{a}_j' \|. \end{gather*} \tag{3}

Evaluating \boldsymbol{R}

We now turn to the matrix \boldsymbol{R}. Let us assume \boldsymbol{A} = \boldsymbol{Q}\boldsymbol{R} holds, and try to determine \boldsymbol{R}. We rewrite this as \left(\boldsymbol{a}_0 \quad \boldsymbol{a}_1 \quad \dots \quad \boldsymbol{a}_{n-1} \right) = \left(\boldsymbol{q}_0 \quad \boldsymbol{q}_1 \quad \dots \quad \boldsymbol{q}_{n-1} \right) \begin{pmatrix} R_{00} & R_{01} & R_{02} &\cdots & R_{0,n-1} \\ 0 & R_{11} & R_{12} & \cdots & R_{1,n-1} \\ 0 & 0 & R_{22} & \cdots & R_{2,n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & R_{n-1, n-1} \end{pmatrix}. We can explicitly carry out the multiplication, and find \begin{align*} \boldsymbol{a}_0 &= R_{00}\boldsymbol{q}_0 \\ \boldsymbol{a}_1 &= R_{01}\boldsymbol{q}_0 + R_{11}\boldsymbol{q}_1 \\ \boldsymbol{a}_2 &= R_{02}\boldsymbol{q}_0 + R_{12}\boldsymbol{q}_1 + R_{22}\boldsymbol{q}_2 \\ & \vdots \\ \boldsymbol{a}_j &= R_{0j}\boldsymbol{q}_0 + R_{1j}\boldsymbol{q}_1 +\cdots + R_{jj}\boldsymbol{q}_j \\ & \vdots \\ \boldsymbol{a}_{n-1} &= R_{0,n-1}\boldsymbol{q}_0 + R_{1,n-1}\boldsymbol{q}_1 +\cdots + R_{n-1,n-1}\boldsymbol{q}_{n-1} \\ \end{align*}

These equations can be solved: \begin{align*} &\boldsymbol{q}_0 = \boldsymbol{a}_0/R_{00}, \quad \boldsymbol{q}_1 = \frac{\boldsymbol{a}_1 - R_{01}\boldsymbol{q}_0}{R_{11}}, \quad \boldsymbol{q}_2 = \frac{\boldsymbol{a}_2 - R_{02}\boldsymbol{q}_0 - R_{12}\boldsymbol{q}_1}{R_{22}}, \dots \\ &\boldsymbol{q}_j = \frac{\boldsymbol{a}_j - \sum_{i = 0}^{j-1}R_{ij}\boldsymbol{q}_i}{R_{jj}}, \quad \dots, \quad \boldsymbol{q}_{n-1} = \frac{\boldsymbol{a}_{n-1} - \sum_{i = 0}^{n-2} R_{i,n-1}\boldsymbol{q}_i}{R_{n-1,n-1}}. \end{align*}

Comparing with Equation 3, we have R_{ij} = \boldsymbol{q}_i^T \boldsymbol{a}_j, \quad j = 0,1,\dots,n-1, \quad i = 0,1,\dots, j-1 \\ R_{jj} = \| \boldsymbol{a}_j' \| = \|\boldsymbol{a}_j - \sum_{i = 0}^{j-1} R_{ij}\boldsymbol{q}_i \|, \quad j = 0,1,\dots,n-1

Note that here we choose R_{jj}>0, which makes the QR decomposition uniquely determined.

Crucially, both R_{ij} and R_{jj} are quantities that we have already evaluated in the process of constructing the matrix \boldsymbol{Q}. That means we can carry out those computations in tandem, building up the matrices \boldsymbol{Q} and \boldsymbol{R} together.
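
As an aside (a sketch of our own, not needed below): NumPy's np.linalg.qr does not enforce R_{jj} > 0, so its output can differ from ours by signs; we can flip the signs afterwards to recover the convention chosen here.

import numpy as np

def qr_positive(A):
    # wrap np.linalg.qr so the diagonal of R is positive, matching our convention
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0            # guard against an exactly zero diagonal entry
    return Q*signs, (R.T*signs).T      # A = (Q S)(S R) with S = diag(signs), S^2 = I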

3.2 QR Decomposition: Implementation

The QR decomposition is implemented in the following code.

In order to check the performance of our algorithm, we computed two quantities.

  • The error in QR decomposition \|\boldsymbol{A} - \boldsymbol{Q}\boldsymbol{R}\|
  • The orthogonality \|\boldsymbol{Q}^T\boldsymbol{Q} - \boldsymbol{I}\|

We test this on 4\times 4, 6\times 6 and 8 \times 8 matrices. We find that the error in the QR decomposition is small. However, the orthogonality is very poor. This is because the classical Gram-Schmidt procedure behaves poorly in the presence of roundoff errors; see the modified Gram-Schmidt sketch after the code below.

import numpy as np

def qrdec(A):
    n = A.shape[0]
    Ap = np.copy(A)
    Q = np.zeros((n,n))
    R = np.zeros((n,n))
    for j in range(n):
        for i in range(j):
            R[i,j] = Q[:,i]@A[:,j]
            Ap[:,j] -= R[i,j]*Q[:,i]

        R[j,j] = mag(Ap[:,j])
        Q[:,j] = Ap[:,j]/R[j,j]
    return Q, R

def testcreate(n,val):
    A = np.arange(val,val+n*n).reshape(n,n)
    A = np.sqrt(A)
    bs = (A[0,:])**2.1
    return A, bs

def mag(xs):
    return np.sqrt(np.sum(xs*xs))  
    
def testqrdec(A):
    n = A.shape[0]
    Q, R = qrdec(A)
    diffa = A - Q@R
    diffq = np.transpose(Q)@Q - np.identity(n) 
    print(n, mag(diffa), mag(diffq))

if __name__ == '__main__':
    for n in range(4,10,2):
        A, bs = testcreate(n,21)
        testqrdec(A)
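
One standard remedy (a sketch, stated without derivation) is the modified Gram-Schmidt variant: the only change relative to qrdec() above is that each projection coefficient is computed against the already-updated column Ap[:,j] instead of the original column A[:,j], which dramatically improves the orthogonality of \boldsymbol{Q} in floating-point arithmetic. This sketch reuses np and mag() from the listing above.

def qrdec_mgs(A):
    # modified Gram-Schmidt: same operation count as qrdec(), much better orthogonality
    n = A.shape[0]
    Ap = np.copy(A)
    Q = np.zeros((n,n))
    R = np.zeros((n,n))
    for j in range(n):
        for i in range(j):
            R[i,j] = Q[:,i]@Ap[:,j]    # project against the *updated* column
            Ap[:,j] -= R[i,j]*Q[:,i]
        R[j,j] = mag(Ap[:,j])
        Q[:,j] = Ap[:,j]/R[j,j]
    return Q, R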

3.3 Similarity transformations

We now make a quick detour to introduce another related concept. Recall that if we diagonalize a matrix \boldsymbol{A}, we manage to find the matrices \boldsymbol{V} and \boldsymbol{\Lambda} such that \boldsymbol{V}^{-1}\boldsymbol{A}\boldsymbol{V} = \boldsymbol{\Lambda} where \boldsymbol{\Lambda} contains the eigenvalues of \boldsymbol{A} and \boldsymbol{V} is made up of the eigenvectors of \boldsymbol{A}.

Assume there exists another (non-singular) matrix, \boldsymbol{S}, such that: \boldsymbol{A}' = \boldsymbol{S}^{-1} \boldsymbol{A} \boldsymbol{S}. This is known as a similarity transformation and we say \boldsymbol{A} and \boldsymbol{A}' are similar.

Now, if \boldsymbol{A}\boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i, we have \boldsymbol{S}\boldsymbol{A}'\boldsymbol{S}^{-1} \boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i. If you multiply on the left with \boldsymbol{S}^{-1} you get \boldsymbol{A}' \boldsymbol{S}^{-1} \boldsymbol{v}_i = \lambda_i \boldsymbol{S}^{-1}\boldsymbol{v}_i. If you define \boldsymbol{v}_i' \equiv \boldsymbol{S}^{-1}\boldsymbol{v}_i, we have \boldsymbol{A}'\boldsymbol{v}_i' = \lambda_i \boldsymbol{v}_i'.

We find that similar matrices have the same eigenvalues! The eigenvectors are related by the similarity transformation, \boldsymbol{v}_i' = \boldsymbol{S}^{-1}\boldsymbol{v}_i.

As a special case, if \boldsymbol{S} is orthogonal (or unitary), we then say \boldsymbol{A} and \boldsymbol{A}' are orthogonally (or unitarily) similar. Let us stay with real matrices, in which case orthogonality gives \boldsymbol{A}' = \boldsymbol{Q}^T \boldsymbol{A} \boldsymbol{Q}.

Since a similarity transformation preserves the eigenvalues, we can read off the eigenvalues of \boldsymbol{A} along the diagonal of \boldsymbol{A}' if we can find a \boldsymbol{Q} such that \boldsymbol{A}' is triangular.
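
A quick numerical sanity check of this statement (a sketch; the random 4×4 matrices are arbitrary examples, and a generic random \boldsymbol{S} is assumed to be non-singular):

import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))               # generic random matrix, assumed non-singular
Aprime = np.linalg.inv(S) @ A @ S             # similarity transformation

print(np.sort(np.linalg.eigvals(A)))          # same spectrum ...
print(np.sort(np.linalg.eigvals(Aprime)))     # ... up to roundoff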

3.4 Simultaneous Iteration: First Attempt

The method of simultaneous iteration is a generalization of the power method to more than one eigenvector. We assume that our eigenvalues are distinct so that they can be sorted. For the power method, we started with a vector \boldsymbol{z}^{(0)} and then multiplied it with \boldsymbol{A} repeatedly. If we expand \boldsymbol{z}^{(0)} in the eigenvectors, then we have \boldsymbol{z}^{(k)} = \sum_{i = 0}^{n-1} c_i \lambda_i^k \boldsymbol{v}_i.

We can generalize this approach to the case of more eigenvectors. Let us construct the initial guess \boldsymbol{Z}^{(0)} = \left(\boldsymbol{z}_{0}^{(0)} \quad \boldsymbol{z}_{1}^{(0)} \quad \cdots \quad \boldsymbol{z}_{n-1}^{(0)} \right). For example, one can take \boldsymbol{Z}^{(0)} = \boldsymbol{I}. We can then compute \boldsymbol{Z}^{(k)} = \boldsymbol{A}^{k}\boldsymbol{Z}^{(0)} = \left(\boldsymbol{z}_0^{(k)} \quad \boldsymbol{z}_1^{(k)} \quad \boldsymbol{z}_2^{(k)} \quad \cdots \quad \boldsymbol{z}_{n-1}^{(k)} \right).

If we simply make k larger and larger, all the \boldsymbol{z}_{j}^{(k)} will converge to \boldsymbol{v}_0. This is disappointing, and it does not help to normalize the vectors \boldsymbol{z}_j^{(k)} at each step.

To solve this issue, we need to introduce an orthogonalization procedure for the vectors in \boldsymbol{Z}^{(k)}, via Gram-Schmidt or, equivalently, the QR decomposition, as described in the following.

3.5 Simultaneous Iteration: Orthonormalizing

Upon closer inspection, we realize what’s going on: since we are now dealing with more than one eigenvector, normalizing columns is not enough: what we need to do, instead, is to ensure that the dependence of one column on any of the other columns is projected out. That is, in addition to normalizing, we also need to orthogonalize. We can ensure that we are dealing with orthonormal vectors by carrying out a QR decomposition at each step.

We’ll proceed to give the prescription for the simultaneous iteration method (also known as orthogonal iteration) and later explore some of its fascinating properties. Let us consider \begin{gather*} \boldsymbol{Z}^{(k)} = \boldsymbol{A} \boldsymbol{Q}^{(k-1)} \\ \boldsymbol{Z}^{(k)} = \boldsymbol{Q}^{(k)} \boldsymbol{R}^{(k)}, \end{gather*} where we start with \boldsymbol{Q}^{(0)} = \boldsymbol{I}.

If we multiply both sides of the first of the above equations on the right by \boldsymbol{R}^{(k-1)}, we get \boldsymbol{Z}^{(k)}\boldsymbol{R}^{(k-1)} = \boldsymbol{A}\boldsymbol{Q}^{(k-1)}\boldsymbol{R}^{(k-1)} = \boldsymbol{A}\boldsymbol{Z}^{(k-1)}. Multiplying both sides on the right by \boldsymbol{R}^{(k-2)}, we then obtain \boldsymbol{Z}^{(k)}\boldsymbol{R}^{(k-1)}\boldsymbol{R}^{(k-2)} = \boldsymbol{A}\boldsymbol{Z}^{(k-1)}\boldsymbol{R}^{(k-2)} = \boldsymbol{A}^2\boldsymbol{Z}^{(k-2)}.

We can continue this procedure, and finally arrive at \boldsymbol{Z}^{(k)}\boldsymbol{R}^{(k-1)}\boldsymbol{R}^{(k-2)}\cdots\boldsymbol{R}^{(1)} = \boldsymbol{A}^{k-1}\boldsymbol{Z}^{(1)}.

On the other hand, we have \boldsymbol{Z}^{(1)} = \boldsymbol{A}\boldsymbol{Q}^{(0)} = \boldsymbol{A}, which combined with the previous equation gives \boldsymbol{Z}^{(k)}\boldsymbol{R}^{(k-1)}\boldsymbol{R}^{(k-2)}\cdots\boldsymbol{R}^{(1)} = \boldsymbol{A}^k. Noting that \boldsymbol{Z}^{(k)} = \boldsymbol{Q}^{(k)}\boldsymbol{R}^{(k)}, we can rewrite the left-hand side of the above equation and obtain \boldsymbol{A}^k = \boldsymbol{Q}^{(k)} \boldsymbol{R}^{(k)}\boldsymbol{R}^{(k-1)}\cdots\boldsymbol{R}^{(1)} = \boldsymbol{Q}^{(k)}\mathcal{R}^{(k)}, where \mathcal{R}^{(k)} = \boldsymbol{R}^{(k)}\boldsymbol{R}^{(k-1)}\cdots\boldsymbol{R}^{(1)} is also upper triangular, since it is a product of upper-triangular matrices.

Thus, this allows us to construct an orthonormal basis for the column space of the k-th power of \boldsymbol{A}. For symmetric \boldsymbol{A}, \boldsymbol{Q}^{(k)} converges to the eigenvectors. For a nonsymmetric \boldsymbol{A}, \boldsymbol{Q}^{(k)} converges toward the orthogonal “factor” of the eigenvector matrix; namely, \boldsymbol{Q}^{(k)} \to \tilde{\boldsymbol{Q}}, such that the eigenvector matrix satisfies \boldsymbol{V} = \tilde{\boldsymbol{Q}}\boldsymbol{U} for some upper-triangular matrix \boldsymbol{U}. (We state this without proof.) In either case, \boldsymbol{Q}^{(k)} is related to the eigenvectors of the matrix \boldsymbol{A}.

After obtaining \boldsymbol{Q}^{(k)}, we compute \boldsymbol{A}^{(k)} = [\boldsymbol{Q}^{(k)}]^T\boldsymbol{A}\boldsymbol{Q}^{(k)}, \tag{4} which is a generalization of the Rayleigh quotient.

Here \boldsymbol{A}^{(k)} is orthogonally similar to \boldsymbol{A} and, as k grows, converges to an upper-triangular matrix for a general matrix \boldsymbol{A}. This can be seen from \lim_{k\to\infty} \boldsymbol{A}^{(k)} = [\tilde{\boldsymbol{Q}}]^T\boldsymbol{A}\tilde{\boldsymbol{Q}} = [\tilde{\boldsymbol{Q}}]^T\boldsymbol{A}\boldsymbol{V}\boldsymbol{U}^{-1} = [\tilde{\boldsymbol{Q}}]^T\boldsymbol{V}\boldsymbol{\Lambda}\boldsymbol{U}^{-1} = [\tilde{\boldsymbol{Q}}]^T\tilde{\boldsymbol{Q}}\boldsymbol{U}\boldsymbol{\Lambda}\boldsymbol{U}^{-1} = \boldsymbol{U}\boldsymbol{\Lambda}\boldsymbol{U}^{-1}. Since \boldsymbol{U} is upper triangular and \boldsymbol{\Lambda} is diagonal, \boldsymbol{U}\boldsymbol{\Lambda}\boldsymbol{U}^{-1} is upper triangular, so \boldsymbol{A}^{(k)} becomes upper triangular. If \boldsymbol{A} is symmetric, \boldsymbol{A}^{(k)} in fact becomes diagonal.
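
The prescription above takes only a few lines of code (a sketch of our own that reuses qrdec() from the previous subsection; there is no stopping criterion, and the diagonal of \boldsymbol{A}^{(k)} serves as the eigenvalue estimate):

def simiter(A, kmax=50):
    # simultaneous (orthogonal) iteration: Z = A Q, then QR-decompose Z
    n = A.shape[0]
    Q = np.identity(n)
    for _ in range(1, kmax):
        Z = A @ Q
        Q, R = qrdec(Z)
    Ak = Q.T @ A @ Q                   # generalized Rayleigh quotient, Equation 4
    return np.diag(Ak), Q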

3.6 QR Method: Algorithm

Let us simply state the QR method and then show it is equivalent to the simultaneous iteration method introduced above. We start with \boldsymbol{A}^{(0)} = \boldsymbol{A}, and construct iteratively \boldsymbol{A}^{(k-1)} = \boldsymbol{Q}^{(k)}\boldsymbol{R}^{(k)}, \quad \boldsymbol{A}^{(k)} = \boldsymbol{R}^{(k)} \boldsymbol{Q}^{(k)}.

To see the equivalence to the method introduced in the previous section, let us write \boldsymbol{A}^{(k)} = \boldsymbol{R}^{(k)} \boldsymbol{A}^{(k-1)}[\boldsymbol{R}^{(k)}]^{-1} = \dots = \mathcal{R}^{(k)}\boldsymbol{A}[\mathcal{R}^{(k)}]^{-1}, where \mathcal{R}^{(k)} = \boldsymbol{R}^{(k)}\boldsymbol{R}^{(k-1)}\dots\boldsymbol{R}^{(1)}.

On the other hand, from the previous section, we learned that \boldsymbol{A}^k = \boldsymbol{Q}^{(k)}\mathcal{R}^{(k)} \quad \Rightarrow \boldsymbol{A}^k[\mathcal{R}^{(k)}]^{-1} = \boldsymbol{Q}^{(k)}. This means we obtain \boldsymbol{A}^{(k)} = \mathcal{R}^{(k)}\boldsymbol{A}^{-k}\boldsymbol{A}\boldsymbol{A}^{k}[\mathcal{R}^{(k)}]^{-1} =[\boldsymbol{Q}^{(k)}]^T\boldsymbol{A}\boldsymbol{Q}^{(k)}. This is nothing but Equation 4! This shows the equivalence between the QR-method and the simultaneous iteration method.

3.7 QR Method: Implementation

See the following code.

import numpy as np

def qrdec(A):
    n = A.shape[0]
    Ap = np.copy(A)
    Q = np.zeros((n,n))
    R = np.zeros((n,n))
    for j in range(n):
        for i in range(j):
            R[i,j] = Q[:,i]@A[:,j]
            Ap[:,j] -= R[i,j]*Q[:,i]

        R[j,j] = mag(Ap[:,j])
        Q[:,j] = Ap[:,j]/R[j,j]
    return Q, R

def testcreate(n,val):
    A = np.arange(val,val+n*n).reshape(n,n)
    A = np.sqrt(A)
    bs = (A[0,:])**2.1
    return A, bs
    

def mag(xs):
    return np.sqrt(np.sum(xs*xs))  

def qrmet(inA,kmax=100):
    A = np.copy(inA)
    for k in range(1,kmax):
        Q, R = qrdec(A)            # A^(k-1) = Q R
        A = R@Q                    # A^(k) = R Q is similar to A^(k-1)
        print(k, np.diag(A))

    qreigvals = np.diag(A)         # eigenvalue estimates sit on the diagonal
    return qreigvals

if __name__ == '__main__':
    A, bs = testcreate(4,21)
    qreigvals = qrmet(A,6)
    print(" ")
    npeigvals, npeigvecs = np.linalg.eig(A); print(npeigvals)

3.8 QR Method: Operation Count

To compute the operation count required for the full QR method, we need to first count the operation cost of the QR decomposition prescription and then of the QR method itself.

We will here only look at the leading contribution.

First, we focus on the QR decomposition via the Gram-Schmidt procedure. For fixed i and j, we have to compute the inner product R_{ij} = \boldsymbol{q}_i^T\boldsymbol{a}_j, which involves n multiplications and (n-1) additions. Then, subtracting the term R_{ij}\boldsymbol{q}_i from the running column \boldsymbol{a}_j' costs another n multiplications and n subtractions. In total, for fixed i and j, we have \sim 4n flops.

Let us now sum over i,j, \sum_{j=0}^{n-1}\sum_{i=0}^{j-1} 4n = \sum_{j=0}^{n-1}4nj = 4n\frac{n(n-1)}{2}\sim 2n^3.

We see that the QR decomposition costs \sim 2n^3 flops, more than the LU decomposition (\sim 2n^3/3).

For the QR method, we also have to perform additional matrix multiplications, each of which costs \sim 2n^3 operations. Thus, if we have m iterations, the total cost will be \sim 4mn^3, where half comes from the QR decompositions and the other half comes from the matrix multiplications.

4 All Eigenvalues and Eigenvectors

After finding all eigenvalues, we can use the inverse-power method with shifting to find each eigenvector. This is implemented in the following code.

import numpy as np

def qrdec(A):
    n = A.shape[0]
    Ap = np.copy(A)
    Q = np.zeros((n,n))
    R = np.zeros((n,n))
    for j in range(n):
        for i in range(j):
            R[i,j] = Q[:,i]@A[:,j]
            Ap[:,j] -= R[i,j]*Q[:,i]

        R[j,j] = mag(Ap[:,j])
        Q[:,j] = Ap[:,j]/R[j,j]
    return Q, R

def testcreate(n,val):
    A = np.arange(val,val+n*n).reshape(n,n)
    A = np.sqrt(A)
    bs = (A[0,:])**2.1
    return A, bs
    
def ludec(A):
    n = A.shape[0]
    U = np.copy(A)
    L = np.identity(n)

    for j in range(n-1):
        for i in range(j+1,n):
            coeff = U[i,j]/U[j,j]
            U[i,j:] -= coeff*U[j,j:]
            L[i,j] = coeff
    return L, U

def mag(xs):
    return np.sqrt(np.sum(xs*xs))  

def termcrit(xolds,xnews):
    errs = np.abs((xnews - xolds)/xnews)
    return np.sum(errs)

def qrmet(inA,kmax=100):
    # QR method (defined in the previous section), without per-iteration printing
    A = np.copy(inA)
    for k in range(1,kmax):
        Q, R = qrdec(A)
        A = R@Q
    qreigvals = np.diag(A)
    return qreigvals

def invpowershift(A,shift=20,kmax=200,tol=1.e-8):
    n = A.shape[0]
    znews = np.ones(n)
    qnews = znews/mag(znews)
    Astar = A - np.identity(n)*shift
    L, U = ludec(Astar)

    for k in range(1,kmax):
        qs = np.copy(qnews)
        ys = forsub(L,qs)
        znews = backsub(U,ys)
        qnews = znews/mag(znews)

        if qs@qnews<0:
            qnews = -qnews

        err = termcrit(qs,qnews)
        print(k, qnews, err)

        if err < tol:
            lam = qnews@A@qnews
            break
    else:
        lam = qnews = None

    return lam, qnews
    
def forsub(L,bs):
    n = bs.size
    xs = np.zeros(n)
    for i in range(n):
        xs[i] = (bs[i] - L[i,:i]@xs[:i])/L[i,i]
    return xs

def backsub(U,bs):
    n = bs.size
    xs = np.zeros(n)
    for i in reversed(range(n)):
        xs[i] = (bs[i] - U[i,i+1:]@xs[i+1:])/U[i,i]
    return xs

def eig(A,eps=1.e-12):
    n = A.shape[0]
    eigvals = np.zeros(n)
    eigvecs = np.zeros((n,n))
    qreigvals = qrmet(A)
    for i, qre in enumerate(qreigvals):
        eigvals[i], eigvecs[:,i] = invpowershift(A,qre+eps)
    return eigvals, eigvecs

def testeigall(f,A):
    eigvals, eigvecs = f(A)
    npeigvals, npeigvecs = np.linalg.eig(A)
    print(eigvals); print(npeigvals)
    print(" ")
    for eigvec, npeigvec in zip(eigvecs.T,npeigvecs.T):
        print(eigvec); print(npeigvec)
        print(" ")
    
if __name__ == '__main__':
    A, bs = testcreate(4,21)
    testeigall(eig,A)

5 Homeworks

  1. Modify the code of the “Power Method” by implementing the termination criterion in Equation 2.

  2. Prove mathematically that the product of upper-triangular matrices is still an upper-triangular matrix (without programming).