The Making of a Preconditioner

by Abhilash Reddy M

This notebook ultimately demonstrates a multigrid-preconditioned Krylov solver in Python 3. The code and more examples are available on GitHub at GeometricMultigrid. The problem solved is the Poisson equation on a rectangular domain with homogeneous Dirichlet boundary conditions. A cell-centered finite difference discretization is used to get a second-order accurate solution. First the V-cycle is explained, followed by Full Multigrid, and finally a multigrid (V-cycle) preconditioned Krylov solver is demonstrated.

Multigrid algorithm

We need some terminology before going further.

  • Approximation
  • Residual
  • Exact solution
  • Correction

Let $A u = f $ be the system of linear equations with exact solution $u_{ex}$, and let $u_0$ be an approximation to $u_{ex}$. Then the error is $e=u_{ex}-u_0$. Multiplying this by $A$ we get $A e = A u_{ex} - A u_0$. Obviously $A u_{ex} = f$, which means $A e = f - A u_0$. The quantity $r = f - A u_0$ is called the residual, and $A e = r$ is called the residual equation. If we solve this problem, i.e., if we find $e$, the solution to the original problem is $u_0+e$. The quantity $e$ is also called the "correction" because it "corrects" the initial approximation to give a better approximation. For a linear system of equations, solving the residual equation is equivalent to solving the original problem.
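This identity is easy to verify numerically. Below is a minimal sketch on an arbitrary 3-by-3 system (the matrix and exact solution are made up purely for illustration): solving the residual equation and adding the correction recovers the exact solution.

import numpy as np

# an arbitrary small system, for illustration only
A = np.array([[ 4.0,-1.0, 0.0],
              [-1.0, 4.0,-1.0],
              [ 0.0,-1.0, 4.0]])
u_ex = np.array([1.0,2.0,3.0])
f = A@u_ex

u0 = np.zeros(3)              # a (poor) initial approximation
r  = f - A@u0                 # residual
e  = np.linalg.solve(A,r)     # solve the residual equation A e = r
print(np.allclose(u0+e,u_ex)) # True: the corrected approximation is exact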

What is described here is a basic geometric multigrid algorithm, where a series of nested grids is used. There are four parts to this multigrid algorithm:

  • Smoothing Operator (a.k.a Relaxation)
  • Restriction Operator
  • Interpolation Operator (a.k.a Prolongation Operator)
  • Bottom solver

All of these are defined below. To begin, import numpy.

In [1]:
import numpy as np

Smoothing operator

As the name suggests, this is any procedure that smooths the input function. The simplest smoothers are a certain number of Jacobi or Gauss-Seidel iterations. Below, a smoother is defined that performs Gauss-Seidel sweeps and returns the result of the smoothing along with the residual.

In [2]:
def GSrelax(nx,ny,u,f,iters=1):
  '''
  Gauss Seidel smoothing
  '''
  
  dx=1.0/nx
  dy=1.0/ny

  Ax=1.0/dx**2
  Ay=1.0/dy**2
  Ap=1.0/(2.0*(Ax+Ay))

  #BCs. Homogeneous Dirichlet BCs
  u[ 0,:] = -u[ 1,:]
  u[-1,:] = -u[-2,:]
  u[:, 0] = -u[:, 1]
  u[:,-1] = -u[:,-2]

  for it in range(iters):
    for i in range(1,nx+1):
      for j in range(1,ny+1):
        u[i,j]= Ap*( Ax*(u[i+1,j]+u[i-1,j])
                    +Ay*(u[i,j+1]+u[i,j-1]) - f[i,j])
    #BCs. Homogeneous Dirichlet BCs
    u[ 0,:] = -u[ 1,:]
    u[-1,:] = -u[-2,:]
    u[:, 0] = -u[:, 1]
    u[:,-1] = -u[:,-2]

  #calculate the residual
  res=np.zeros([nx+2,ny+2])
  for i in range(1,nx+1):
    for j in range(1,ny+1):
      res[i,j]=f[i,j] - ((Ax*(u[i+1,j]+u[i-1,j])+Ay*(u[i,j+1]+u[i,j-1]) - 2.0*(Ax+Ay)*u[i,j]))
  return u,res
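To see the smoother at work, one can start from a random initial guess for the homogeneous problem ($f=0$, so the exact solution is zero and 'u' itself is the error) and apply a few sweeps. A minimal check (the grid size is arbitrary):

nx = ny = 16
u0 = np.zeros([nx+2,ny+2])
u0[1:nx+1,1:ny+1] = np.random.rand(nx,ny)  # rough, oscillatory initial error
f0 = np.zeros([nx+2,ny+2])                 # zero RHS

u0,r0 = GSrelax(nx,ny,u0,f0,iters=5)
print(np.max(np.abs(r0)))  # residual after 5 sweeps; increase iters to watch it fall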

Interpolation Operator

This operator takes values on a coarse grid and transfers them onto a fine grid. It is also called prolongation. The function below uses bilinear interpolation for this purpose: 'v' lives on the coarse grid, and we interpolate it onto the fine grid and store the result in 'v_f'.

In [3]:
def prolong(nx,ny,v):
  '''
  interpolate 'v' to the fine grid
  '''
  v_f=np.zeros([2*nx+2,2*ny+2])

  for i in range(1,nx+1):
    for j in range(1,ny+1):
      v_f[2*i-1,2*j-1] = 0.5625*v[i,j]+0.1875*(v[i-1,j]+v[i,j-1])+0.0625*v[i-1,j-1]
      v_f[2*i  ,2*j-1] = 0.5625*v[i,j]+0.1875*(v[i+1,j]+v[i,j-1])+0.0625*v[i+1,j-1]
      v_f[2*i-1,2*j  ] = 0.5625*v[i,j]+0.1875*(v[i-1,j]+v[i,j+1])+0.0625*v[i-1,j+1]
      v_f[2*i  ,2*j  ] = 0.5625*v[i,j]+0.1875*(v[i+1,j]+v[i,j+1])+0.0625*v[i+1,j+1]
  return v_f

Restriction

This is exactly the opposite of interpolation: it takes values from the fine grid and transfers them onto the coarse grid. It is an averaging, or "lumping", process, which makes it fundamentally different from interpolation. Each coarse grid point is surrounded by four fine grid points, so quite simply we take the value at the coarse point to be the average of those 4 fine points. Here 'v' is the fine grid quantity and 'v_c' is the coarse grid quantity.

In [4]:
def restrict(nx,ny,v):
  '''
  restrict 'v' to the coarser grid
  '''
  v_c=np.zeros([nx+2,ny+2])
  
  for i in range(1,nx+1):
    for j in range(1,ny+1):
      v_c[i,j]=0.25*(v[2*i-1,2*j-1]+v[2*i,2*j-1]+v[2*i-1,2*j]+v[2*i,2*j])
  return v_c

Note that we have looped over the coarse grid in both cases above. It is easier to access the variables this way.
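A quick consistency check on the two transfer operators: each should reproduce a constant field exactly, since the bilinear weights sum to one (0.5625 + 2(0.1875) + 0.0625 = 1) and restriction is a plain average. A small test, using the functions defined above:

nc = 8                                 # coarse grid size, arbitrary
vc = np.ones([nc+2,nc+2])              # constant field, ghost layer included
vf = prolong(nc,nc,vc)
print(np.allclose(vf[1:2*nc+1,1:2*nc+1],1.0))  # True

vf = np.ones([2*nc+2,2*nc+2])
vc = restrict(nc,nc,vf)
print(np.allclose(vc[1:nc+1,1:nc+1],1.0))      # True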

Bottom Solver

This comes into the picture at the coarsest level, at the bottom of a V-cycle, where an exact solution (to a particular derived linear system) is needed. This must be something that gives us the exact/converged solution. The number of unknowns is very small at this level (e.g. 2x2=4 or 1x1=1) and the linear system can be solved exactly (to roundoff) by the smoother itself with a few iterations. In this notebook, 50 iterations of our smoother are used. If we coarsen down to just one point, then a single iteration solves it exactly.
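For instance, on a 2-by-2 grid the smoother alone drives the residual to roundoff well within 50 sweeps. A quick check, using an arbitrary right-hand side:

u_b = np.zeros([4,4])               # 2x2 interior plus the ghost layer
f_b = np.zeros([4,4])
f_b[1:3,1:3] = np.random.rand(2,2)  # arbitrary RHS, for illustration
u_b,r_b = GSrelax(2,2,u_b,f_b,iters=50)
print(np.max(np.abs(r_b)))          # should be at roundoff level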

The quantities approximation, error, and residual are used frequently and are essential to understanding the multigrid algorithm. Typically, the actual solution variable 'u' is present, and used, only on the finest grid. The unknowns on the coarse grids are something different: the error. The information that goes from a fine grid to a coarse grid is the residual. The information that goes from a coarse grid to a fine grid is the correction or error, often called the "coarse-grid correction".

V-cycle

Now that we have all the parts, we are ready to build our multigrid algorithm. We will look at a V-cycle. The function is self-explanatory. It is a recursive function, i.e., it calls itself. It takes as input an initial guess 'u', the RHS 'f', and the number of multigrid levels 'num_levels', among other things. At each level the V-cycle calls another V-cycle. At the lowest level the solve is exact. (If this is too complex, I would suggest starting with a two-grid scheme and then extending it to a V-cycle; a two-grid sketch is given after the step-by-step description below.)

In [5]:
def V_cycle(nx,ny,num_levels,u,f,level=1):
  '''
  V cycle
  '''
  if(level==num_levels):#bottom solve
    u,res=GSrelax(nx,ny,u,f,iters=50)
    return u,res

  #Step 1: Relax Au=f on this grid
  u,res=GSrelax(nx,ny,u,f,iters=1)

  #Step 2: Restrict residual to coarse grid
  res_c=restrict(nx//2,ny//2,res)

  #Step 3:Solve A e_c=res_c on the coarse grid. (Recursively)
  e_c=np.zeros_like(res_c)
  e_c,res_c=V_cycle(nx//2,ny//2,num_levels,e_c,res_c,level+1)

  #Step 4: Interpolate(prolong) e_c to fine grid and add to u
  u+=prolong(nx//2,ny//2,e_c)
  
  #Step 5: Relax Au=f on this grid
  u,res=GSrelax(nx,ny,u,f,iters=1)
  return u,res

The first part of the function only kicks in at the coarsest level, where an exact solution is required, so we ignore it initially.

Step 1: Apply smoothing to $Au=f$ with the 'u' and 'f' specified in the arguments. The matrix $A$ is the discretized Poisson matrix at this level of discretization, with the right BCs. An approximation and a residual are obtained here.

Step 2: The residual is transferred (restricted) to the next coarser grid.

Step 3: The residual equation $A e = r$ is solved on the next coarser grid, by calling V_cycle() with a zero initial guess and the RHS obtained by restriction of the residual. Here is where the recursion happens.

Step 4: The error or correction obtained by solving the residual equation on the coarser grid is interpolated and added to the approximation on this grid. This is the new approximation on this grid.

Step 5: This new approximation is smoothed on this grid, and the solution and residual are returned.
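For reference, here is the two-grid scheme these steps reduce to when there are only two levels. It is only a sketch built from the functions defined above (the many smoothing sweeps on the coarse grid stand in for an exact solve, which is adequate when nx//2 is small):

def two_grid(nx,ny,u,f):
  '''one two-grid cycle: smooth, restrict, solve, correct, smooth'''
  u,res = GSrelax(nx,ny,u,f,iters=1)               # pre-smoothing
  res_c = restrict(nx//2,ny//2,res)                # restrict residual
  e_c   = np.zeros_like(res_c)
  e_c,_ = GSrelax(nx//2,ny//2,e_c,res_c,iters=50)  # coarse solve (to roundoff)
  u    += prolong(nx//2,ny//2,e_c)                 # coarse-grid correction
  u,res = GSrelax(nx,ny,u,f,iters=1)               # post-smoothing
  return u,res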

That's it! Now we can see it in action. We can use a problem with a known solution to test our code. The following functions set up the RHS for a problem with homogeneous Dirichlet BCs on the unit square.

In [6]:
#analytical solution
def Uann(x,y):
   return (x**3-x)*(y**3-y)
#RHS corresponding to above
def source(x,y):
  return 6*x*y*(x**2+ y**2 - 2)

Let us set up the problem, discretization, and solver details. The number of divisions along each dimension is given as a power-of-two function of the number of levels. In principle this is not required, but having it makes the inter-grid transfers easy. The coarsest problem is going to have a 2-by-2 grid.

In [7]:
#input
max_cycles = 18
nlevels    = 6  
NX         = 2*2**(nlevels-1)
NY         = 2*2**(nlevels-1)
tol        = 1e-12      
In [8]:
#the grid has one layer of ghost cells
uann=np.zeros([NX+2,NY+2])#analytical solution
u   =np.zeros([NX+2,NY+2])#approximation
f   =np.zeros([NX+2,NY+2])#RHS

#calculate the RHS and exact solution
DX=1.0/NX
DY=1.0/NY

xc=np.linspace(0.5*DX,1-0.5*DX,NX)
yc=np.linspace(0.5*DY,1-0.5*DY,NY)
XX,YY=np.meshgrid(xc,yc,indexing='ij')

uann[1:NX+1,1:NY+1]=Uann(XX,YY)
f[1:NX+1,1:NY+1]   =source(XX,YY)

Now we can call the solver.

In [9]:
print('mgd2d.py solver:')
print('NX:',NX,', NY:',NY,', tol:',tol,', levels: ',nlevels)
for it in range(1,max_cycles+1):
  u,res=V_cycle(NX,NY,nlevels,u,f)
  rtol=np.max(np.max(np.abs(res)))
  if(rtol<tol):
    break
  error=uann[1:NX+1,1:NY+1]-u[1:NX+1,1:NY+1]
  print('  cycle: ',it,', L_inf(res.)= ',rtol,',L_inf(true error): ',np.max(np.max(np.abs(error))))

error=uann[1:NX+1,1:NY+1]-u[1:NX+1,1:NY+1]
print('L_inf (true error): ',np.max(np.max(np.abs(error))))
mgd2d.py solver:
NX: 64 , NY: 64 , tol: 1e-12 , levels:  6
  cycle:  1 , L_inf(res.)=  0.642629986844 ,L_inf(true error):  0.0317166429701
  cycle:  2 , L_inf(res.)=  0.123076148245 ,L_inf(true error):  0.0065444191256
  cycle:  3 , L_inf(res.)=  0.0285399250988 ,L_inf(true error):  0.00129511576103
  cycle:  4 , L_inf(res.)=  0.00609759348049 ,L_inf(true error):  0.000235909148521
  cycle:  5 , L_inf(res.)=  0.00122382928112 ,L_inf(true error):  6.8460206701e-05
  cycle:  6 , L_inf(res.)=  0.000240202872419 ,L_inf(true error):  6.91074731238e-05
  cycle:  7 , L_inf(res.)=  4.75025444757e-05 ,L_inf(true error):  6.9208166063e-05
  cycle:  8 , L_inf(res.)=  9.33801720748e-06 ,L_inf(true error):  6.92235458747e-05
  cycle:  9 , L_inf(res.)=  1.79272751666e-06 ,L_inf(true error):  6.92258641371e-05
  cycle:  10 , L_inf(res.)=  3.36130483447e-07 ,L_inf(true error):  6.922621108e-05
  cycle:  11 , L_inf(res.)=  6.28469933872e-08 ,L_inf(true error):  6.92262629658e-05
  cycle:  12 , L_inf(res.)=  1.16460796562e-08 ,L_inf(true error):  6.92262707657e-05
  cycle:  13 , L_inf(res.)=  2.17323758989e-09 ,L_inf(true error):  6.92262719492e-05
  cycle:  14 , L_inf(res.)=  4.19959178544e-10 ,L_inf(true error):  6.92262721307e-05
  cycle:  15 , L_inf(res.)=  8.03197508503e-11 ,L_inf(true error):  6.92262721588e-05
  cycle:  16 , L_inf(res.)=  1.51771928358e-11 ,L_inf(true error):  6.92262721632e-05
  cycle:  17 , L_inf(res.)=  2.84217094304e-12 ,L_inf(true error):  6.92262721638e-05
L_inf (true error):  6.92262721639e-05

True error is the difference between the approximation and the analytical solution. It is largely the discretization error: this is what would remain even if we solved the discrete equations with a direct/exact method like Gaussian elimination. We see that the true error stops reducing at the 5th cycle. The approximation is not getting any better after this point, so we could stop after 5 cycles. But in general we don't know the true error; in practice we use the norm of the (relative) residual as a stopping criterion. As the cycles progress, the floating-point round-off limit is reached and the residual also stops decreasing.

This was the multigrid V-cycle. We can use it as a preconditioner for a Krylov solver. But before we get to that, let's complete the multigrid introduction by looking at the Full Multigrid algorithm. You can safely skip this section.

Full Multi-Grid

We started with a zero initial guess for the V-cycle. Presumably, if we had a better initial guess we would get better results. So we might solve a coarse problem exactly (recursively, with FMG), interpolate the solution onto the fine grid, and use that as the initial guess for the V-cycle. The result of doing this recursively is the Full Multigrid (FMG) algorithm. Unlike the V-cycle, which is an iterative procedure, FMG is a direct solver: there is no successive improvement of the approximation. It straight away gives us a result that is within the discretization error. The FMG algorithm is given below.

In [10]:
def FMG(nx,ny,num_levels,f,nv=1,level=1):

  if(level==num_levels):#bottom solve
    u=np.zeros([nx+2,ny+2])  
    u,res=GSrelax(nx,ny,u,f,iters=50)
    return u,res

  #Step 1: Restrict the rhs to a coarse grid
  f_c=restrict(nx//2,ny//2,f)

  #Step 2: Solve the coarse grid problem using FMG
  u_c,_=FMG(nx//2,ny//2,num_levels,f_c,nv,level+1)

  #Step 3: Interpolate u_c to the fine grid
  u=prolong(nx//2,ny//2,u_c)

  #step 4: Execute 'nv' V-cycles
  for _ in range(nv):
    u,res=V_cycle(nx,ny,num_levels-level,u,f)
  return u,res

Let's call the FMG solver on the same problem.

In [11]:
print('mgd2d.py FMG solver:')
print('NX:',NX,', NY:',NY,', levels: ',nlevels)

u,res=FMG(NX,NY,nlevels,f,nv=1) 
rtol=np.max(np.max(np.abs(res)))

print(' FMG L_inf(res.)= ',rtol)
error=uann[1:NX+1,1:NY+1]-u[1:NX+1,1:NY+1]
print('L_inf (true error): ',np.max(np.max(np.abs(error))))
mgd2d.py FMG solver:
NX: 64 , NY: 64 , levels:  6
 FMG L_inf(res.)=  0.00520405221036
L_inf (true error):  6.64976295283e-05

And... it works! The residual is large but the true error is within the discretization level. FMG is said to be scalable because the amount of work needed is linearly proportional to the size of the problem: in big-O notation, FMG is $\mathcal{O}(N)$, where N is the number of unknowns. Exact methods (Gaussian elimination, LU decomposition) are $\mathcal{O}(N^3)$.

Stationary iterative methods as preconditioners

A preconditioner matrix is an easily invertible approximation of the coefficient matrix. We don't explicitly need either matrix, because we never access the elements by index, whether of the coefficient matrix or of the preconditioner. What we do need is the action of the matrix on a vector, i.e., the matrix-vector product alone. The coefficient matrix can therefore be defined as a function that takes in a vector and returns the matrix-vector product.

Now, a stationary iterative method for solving $Au=f$ with preconditioner $M$ can be written as the Richardson iteration $u_{k+1} = u_k + M^{-1}(f - A u_k)$. When the initial guess is set to zero and one iteration is performed, we get $u_1 = M^{-1}f$: the action of the preconditioner on the RHS vector. That is, we get a preconditioner-vector product, which is what we want.

This allows us to use any black-box stationary iterative method as a preconditioner.

We can use the multigrid V-cycle as a preconditioner this way. We can't use FMG because it is not an iterative method.

The matrix as a function can be defined using LinearOperator from scipy.sparse.linalg. It gives us an object which works like a matrix in-so-far as the product with a vector is concerned: it can be used like a regular 2D numpy array in multiplication with a vector, and it can be passed to gmres() or bicgstab() as a preconditioner.
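As a minimal illustration of the idea (not used further in this notebook), here is the Jacobi preconditioner for our Poisson problem wrapped as a LinearOperator. One Jacobi sweep from a zero guess is just division by the diagonal of $A$, which for the discretization above is $-2(A_x+A_y)$:

from scipy.sparse.linalg import LinearOperator

def JacobiPC(nx,ny):
  '''one Jacobi sweep from a zero guess, as a LinearOperator'''
  Ax = nx*nx; Ay = ny*ny         # 1/dx**2 and 1/dy**2 on the unit square
  diag = -2.0*(Ax+Ay)            # diagonal of the discrete Laplacian
  return LinearOperator((nx*ny,nx*ny), matvec=lambda v: v/diag)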

Having a symmetric preconditioner would be nice because it will retain the symmetry if the original problem is symmetric. The multigrid V-cycle above is not symmetric because the Gauss-Seidel smoother is unsymmetric. If we were to use the Jacobi method, or the symmetric Gauss-Seidel (SGS) method, then symmetry would be retained. As such, the Conjugate Gradient method will not work here because our preconditioner is not symmetric. It is possible to keep the symmetry intact when using Gauss-Seidel relaxation if the ordering (the order of evaluation in GS) is opposite in the pre- and post-smoothing sweeps, as sketched below.
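For completeness, here is a sketch of a symmetric Gauss-Seidel smoother (it is not used elsewhere in this notebook): each iteration does a forward sweep followed by a backward sweep with the lexicographic ordering reversed, using the same grid conventions as GSrelax above.

def SGSrelax(nx,ny,u,f,iters=1):
  '''
  symmetric Gauss-Seidel sketch: forward sweep, then backward sweep
  '''
  Ax = nx*nx; Ay = ny*ny          # 1/dx**2 and 1/dy**2 on the unit square
  Ap = 1.0/(2.0*(Ax+Ay))
  for it in range(iters):
    #BCs. Homogeneous Dirichlet BCs, as in GSrelax
    u[ 0,:] = -u[ 1,:]; u[-1,:] = -u[-2,:]
    u[:, 0] = -u[:, 1]; u[:,-1] = -u[:,-2]
    for i in range(1,nx+1):       # forward sweep
      for j in range(1,ny+1):
        u[i,j] = Ap*(Ax*(u[i+1,j]+u[i-1,j])+Ay*(u[i,j+1]+u[i,j-1])-f[i,j])
    for i in range(nx,0,-1):      # backward sweep, reversed ordering
      for j in range(ny,0,-1):
        u[i,j] = Ap*(Ax*(u[i+1,j]+u[i-1,j])+Ay*(u[i,j+1]+u[i,j-1])-f[i,j])
  return u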

Below is the code defining a V-cycle preconditioner. It returns a scipy.sparse.linalg.LinearOperator that can be passed to Krylov solvers as a preconditioner.

In [16]:
from scipy.sparse.linalg import LinearOperator,bicgstab
def MGVP(nx,ny,num_levels):
  def pc_fn(v):
    u =np.zeros([nx+2,ny+2])
    f =np.zeros([nx+2,ny+2])
    f[1:nx+1,1:ny+1] =v.reshape([nx,ny])
    #perform one V cycle
    u,res=V_cycle(nx,ny,num_levels,u,f)
    return u[1:nx+1,1:ny+1].reshape(v.shape)
  M=LinearOperator((nx*ny,nx*ny), matvec=pc_fn)
  return M

Let us define the Poisson matrix also as a Linear Operator

In [17]:
def Laplace(nx,ny):
  def mv(v):
    u =np.zeros([nx+2,ny+2])
    ut=np.zeros([nx,ny])
    u[1:nx+1,1:ny+1]=v.reshape([nx,ny])

    dx=1.0/nx; dy=1.0/ny
  
    Ax=1.0/dx**2;Ay=1.0/dy**2
  
    #BCs. Homogenous Dirichlet
    u[ 0,:] = -u[ 1,:]
    u[-1,:] = -u[-2,:]
    u[:, 0] = -u[:, 1]
    u[:,-1] = -u[:,-2]
  
    for i in range(1,nx+1):
      for j in range(1,ny+1):
        ut[i-1,j-1]=(Ax*(u[i+1,j]+u[i-1,j])+Ay*(u[i,j+1]+u[i,j-1]) - 2.0*(Ax+Ay)*u[i,j])
    return ut.reshape(v.shape)
  return mv

The nested function is required because 'matvec' in LinearOperator takes only one argument, the vector, but we require the grid details and boundary condition information to apply the Poisson matrix. Now we will use these to solve a problem. Unlike earlier, where we used an analytical solution and RHS, we will start with a random vector which will be our exact solution, and multiply it with the Poisson matrix to get the RHS vector for the problem. There is no analytical equation associated with this matrix equation.

The scipy sparse solver routines do not return the number of iterations performed. We can use this wrapper to get the number of iterations.

In [14]:
def solve_sparse(solver,A,b,tol=1e-10,maxiter=500,M=None):
  num_iters = 0
  def callback(xk):
    nonlocal num_iters
    num_iters+=1
  x,status=solver(A,b,tol=tol,maxiter=maxiter,callback=callback,M=M)
  return x,status,num_iters

Let's look at what happens with and without the preconditioner.

In [18]:
A = LinearOperator((NX*NY,NX*NY), matvec=Laplace(NX,NY))
#Exact solution and RHS
uex=np.random.rand(NX*NY,1)
b=A*uex

#Multigrid Preconditioner
M=MGVP(NX,NY,nlevels)

u,info,iters=solve_sparse(bicgstab,A,b,tol=1e-10,maxiter=500)
print('without preconditioning. status:',info,', Iters: ',iters)
error=uex.reshape([NX,NY])-u.reshape([NX,NY])
print('error :',np.max(np.max(np.abs(error))))

u,info,iters=solve_sparse(bicgstab,A,b,tol=1e-10,maxiter=500,M=M)
print('With preconditioning. status:',info,', Iters: ',iters)

error=uex.reshape([NX,NY])-u.reshape([NX,NY])
print('error :',np.max(np.max(np.abs(error))))
without preconditioning. status: 0 , Iters:  146
error : 1.62363409384e-08
With preconditioning. status: 0 , Iters:  7
error : 4.48214265703e-09

Without the preconditioner ~150 iterations were needed, whereas with the V-cycle preconditioner the solution was obtained in far fewer iterations. This alone is not to say that preconditioning gives a faster solution, because the cost of applying the preconditioner also has to be taken into account. But it is generally the case that a suitable preconditioner significantly reduces the effort needed to get a solution, and thus is faster. Additionally, an iterative procedure may not even converge in the absence of a preconditioner.

So, there we have it. A Multigrid Preconditioned Krylov Solver.

The problem considered was a very simple one. With homogeneous Dirichlet BCs we have the same BCs at all grid levels. For general BCs some additional considerations are required. Even with non-homogeneous Dirichlet BCs, though, we still get homogeneous BCs on the coarse grids, because there we are solving the residual equation.

The GitHub repo, GeometricMultigrid, contains the 3D version of this multigrid algorithm, along with some other examples. Here, readability of the code has been prioritized over performance. By using numpy array operations the performance can be improved significantly, and large ($\sim 128^3$) problems can be solved quickly on a personal computer. This is shown in the GitHub repo as well.