| {{redirect|Preconditioning}}
| {{no footnotes|date=February 2013}}
| |
| In [[mathematics]], '''preconditioning''' is the application of a transformation, called the '''preconditioner''', that converts a given problem into a form more suitable for numerical solution. Preconditioning typically aims to reduce the [[condition number]] of the problem. The preconditioned problem is then usually solved by an [[iterative method]].
| |
| | |
| == Preconditioning for linear systems ==
| |
| | |
| In [[linear algebra]] and [[numerical analysis]], a '''preconditioner''' <math>P</math> of a matrix <math>A</math> is a matrix such that <math> P^{-1}A</math> has a smaller [[condition number]] than <math>A</math>. It is also common to call <math>T=P^{-1}</math> the preconditioner, rather than <math>P</math>, since <math>P</math> itself is rarely explicitly available. In modern preconditioning, the application of <math>T=P^{-1}</math>, i.e., multiplication of a column vector, or a block of column vectors, by <math>T=P^{-1}</math>, is commonly performed by rather sophisticated computer software packages in a [[Matrix-free methods|matrix-free fashion]], i.e., where neither <math>P</math>, nor <math>T=P^{-1}</math> (and often not even <math>A</math>) are explicitly available in a matrix form.
| |
| | |
| Preconditioners are useful in [[iterative methods]] for solving a linear system <math>Ax=b</math> for <math>x</math>, since the [[rate of convergence]] of most iterative linear solvers increases as the [[condition number]] of the system matrix decreases as a result of preconditioning. Preconditioned iterative solvers typically outperform direct solvers, e.g., [[Gaussian elimination]], for large matrices, especially [[sparse matrix|sparse]] ones. Iterative solvers can be used as [[matrix-free methods]], and so become the only choice if the coefficient matrix <math>A</math> is not stored explicitly but is accessed only by evaluating matrix-vector products.
| |
| | |
| === Description ===
| |
| | |
| Instead of solving the original linear system above, one may solve either the right preconditioned system:
| |
| | |
| : <math> AP^{-1}Px = b</math>
| |
| | |
| via solving
| |
| | |
| : <math>AP^{-1}y=b</math>
| |
| | |
| for <math>y</math> and
| |
| | |
| : <math>Px=y</math>
| |
| | |
| for <math>x</math>; or the left preconditioned system:
| |
| | |
| : <math> P^{-1}(Ax-b)=0</math>
| |
| | |
| both of which give the same solution as the original system so long as the preconditioner matrix <math>P</math> is [[nonsingular]]. The left preconditioning is more common.
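The equivalence of the left and right preconditioned systems with the original one can be checked numerically. The following is a minimal sketch, assuming a hypothetical 3×3 matrix <math>A</math>, right-hand side <math>b</math>, and diagonal <math>P</math>; the inverse <math>P^{-1}</math> is formed explicitly only because the example is tiny.

```python
import numpy as np

# Hypothetical example data; A, b and P are illustrative choices.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
P = np.diag(np.diag(A))          # any nonsingular P would do here
P_inv = np.linalg.inv(P)         # explicit inverse: acceptable only for a toy problem

x_direct = np.linalg.solve(A, b)

# Left preconditioning: solve (P^{-1} A) x = P^{-1} b.
x_left = np.linalg.solve(P_inv @ A, P_inv @ b)

# Right preconditioning: solve (A P^{-1}) y = b, then P x = y.
y = np.linalg.solve(A @ P_inv, b)
x_right = np.linalg.solve(P, y)
```

All three vectors agree up to rounding, as expected for a nonsingular <math>P</math>.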
| |
| | |
| The goal of preconditioning is to reduce the [[condition number]] of the left or right preconditioned system matrix <math>P^{-1}A</math> or <math>AP^{-1},</math> respectively. The preconditioned matrix <math>P^{-1}A</math> or <math>AP^{-1}</math> is almost never explicitly formed. Only the action of applying the preconditioner solve operation <math>P^{-1}</math> to a given vector needs to be computed in iterative methods.
| |
| | |
| Typically there is a trade-off in the choice of <math>P</math>. Since the operator <math>P^{-1}</math> must be applied at each step of the iterative linear solver, it should have a small cost (computing time) of applying the <math>P^{-1}</math> operation. The cheapest preconditioner would therefore be <math>P=I</math> since then <math>P^{-1}=I.</math> Clearly, this results in the original linear system and the preconditioner does nothing. At the other extreme, the choice <math>P=A</math> gives <math>P^{-1}A = AP^{-1} = I,</math> which has optimal [[condition number]] of 1, requiring a single iteration for convergence; however in this case <math>P^{-1}=A^{-1},</math> and applying the preconditioner is as difficult as solving the original system. One therefore chooses <math>P</math> as somewhere between these two extremes, in an attempt to achieve a minimal number of linear iterations while keeping the operator <math>P^{-1}</math> as simple as possible. Some examples of typical preconditioning approaches are detailed below.
| |
| | |
| ===Preconditioned iterative methods===
| |
| Preconditioned iterative methods for <math>Ax-b=0</math> are, in most cases, mathematically equivalent to standard iterative methods applied to the preconditioned system <math>P^{-1}(Ax-b)=0.</math> For example, the standard [[Richardson iteration]] for solving <math>Ax-b=0</math> is
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n (A\mathbf{x}_n-\mathbf{b}),\ n \ge 0.</math>
| |
| | |
| Applied to the preconditioned system <math>P^{-1}(Ax-b)=0,</math> it turns into a preconditioned method
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n P^{-1}(A\mathbf{x}_n-\mathbf{b}),\ n \ge 0.</math>
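The preconditioned Richardson iteration above can be sketched in a matrix-free style, where the preconditioner is passed as a function applying <math>T=P^{-1}</math> to a vector. The function name <code>richardson</code>, the 2×2 test matrix, the fixed step sizes, and the stopping rule are assumptions made for this illustration.

```python
import numpy as np

def richardson(A, b, apply_T, gamma=1.0, tol=1e-10, max_iter=10_000):
    """Preconditioned Richardson iteration x <- x - gamma * T(A x - b).

    apply_T applies T = P^{-1} to a vector; pass ``lambda r: r`` for the
    unpreconditioned method. Returns the iterate and the iteration count.
    """
    x = np.zeros_like(b)
    for n in range(max_iter):
        r = A @ x - b                                   # current residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, n
        x = x - gamma * apply_T(r)
    return x, max_iter

# Hypothetical badly scaled symmetric positive definite system.
A = np.array([[1.0, 0.1],
              [0.1, 100.0]])
b = np.array([1.0, 1.0])
d = np.diag(A)

# Unpreconditioned: the step size must stay below 2 / lambda_max(A).
x_plain, n_plain = richardson(A, b, lambda r: r, gamma=0.01)
# Jacobi-preconditioned: T = diag(A)^{-1}, applied as an elementwise division.
x_jacobi, n_jacobi = richardson(A, b, lambda r: r / d, gamma=1.0)
```

For this particular matrix the preconditioned run needs only a handful of iterations, while the unpreconditioned run needs a few thousand, because the small step size forced by the large eigenvalue slows down the component associated with the small one.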
| |
| | |
| Examples of popular preconditioned [[iterative methods]] for linear systems include the [[preconditioned conjugate gradient method]], the [[biconjugate gradient method]], and the [[generalized minimal residual method]]. Iterative methods that use scalar products to compute the iterative parameters require corresponding changes in the scalar product, together with substituting <math>P^{-1}(Ax-b)=0</math> for <math>Ax-b=0.</math>
| |
| | |
| ===Geometric interpretation===
| |
| For a [[Symmetric matrix|symmetric]] [[Positive-definite matrix|positive definite]] matrix <math>A</math> the preconditioner <math>P</math> is typically chosen to be symmetric positive definite as well. The preconditioned operator <math>P^{-1}A</math> is then also symmetric positive definite, but with respect to the <math>P</math>-based [[scalar product]]. In this case, the desired effect of applying a preconditioner is to make the [[quadratic form]] of the preconditioned operator <math>P^{-1}A</math> with respect to the <math>P</math>-based [[scalar product]] nearly spherical [http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf].
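The symmetry claim can be verified in one line. Writing <math>(u,v)_P = u^T P v</math> for the <math>P</math>-based scalar product (a notation introduced here only for this check) and using <math>A^T=A</math> and <math>P^T=P</math>:

: <math>(P^{-1}Au, v)_P = u^T A P^{-1} P v = u^T A v = u^T P P^{-1} A v = (u, P^{-1}Av)_P .</math>

Positive definiteness follows in the same way, since <math>(P^{-1}Au, u)_P = u^T A u > 0</math> for every <math>u \neq 0</math>.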
| |
| | |
| === Variable and non-linear preconditioning ===
| |
| Denoting <math>T=P^{-1}</math>, we highlight that preconditioning is practically implemented as multiplying some vector <math>r</math> by <math>T</math>, i.e., computing the product <math>Tr.</math> In many applications, <math>T</math> is not given as a matrix, but rather as an operator <math>T(r)</math> acting on the vector <math>r</math>. Some popular preconditioners, however, change with <math>r</math>, and the dependence on <math>r</math> may not be linear. Typical examples involve using non-linear [[iterative methods]], e.g., the [[conjugate gradient method]], as a part of the preconditioner construction. Such preconditioners may be very efficient in practice, but their behavior is hard to predict theoretically.
| |
| | |
| ===Spectrally equivalent preconditioning===
| |
| The most common use of preconditioning is for the iterative solution of linear systems resulting from approximations of [[partial differential equations]]. The better the approximation quality, the larger the matrix size. In such a case, the goal of optimal preconditioning is, on the one hand, to make the spectral condition number of <math> P^{-1}A</math> bounded from above by a constant independent of the matrix size, which is called ''spectrally equivalent'' preconditioning by [[Evgenii Georgievich D'yakonov|D'yakonov]]. On the other hand, the cost of applying <math> P^{-1}</math> should ideally be proportional, with a constant also independent of the matrix size, to the cost of multiplying <math>A</math> by a vector.
| |
| | |
| ===Examples===
| |
| | |
| ====Jacobi (or diagonal) preconditioner====
| |
| The '''Jacobi preconditioner''' is one of the simplest forms of preconditioning, in which the preconditioner is chosen to be the diagonal of the matrix, <math> P = \operatorname{diag}(A).</math> Assuming <math>A_{ii} \neq 0, \forall i </math>, we get <math>P^{-1}_{ij} = \frac{\delta_{ij}}{A_{ii}}.</math> It is efficient for [[Diagonally dominant matrix|diagonally dominant matrices]] <math> A</math>.
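The effect of the Jacobi preconditioner on the spectrum can be illustrated on a small example. This is a sketch under stated assumptions: the badly scaled test matrix <math>A = SMS</math> and the helper <code>spectral_cond</code> are hypothetical, and <math>P^{-1}A</math> is formed explicitly only to inspect its eigenvalues, which one would not do in practice.

```python
import numpy as np

n = 50
s = np.linspace(1.0, 100.0, n)                                # hypothetical scaling factors
M = np.eye(n) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))      # well-conditioned SPD matrix
A = s[:, None] * M * s[None, :]                               # A = S M S: SPD but badly scaled

P_inv_diag = 1.0 / np.diag(A)                                 # Jacobi: only the diagonal is stored
PA = P_inv_diag[:, None] * A                                  # P^{-1} A, formed for illustration only

def spectral_cond(B):
    """Ratio of the largest to the smallest eigenvalue magnitude of B."""
    ev = np.abs(np.linalg.eigvals(B))
    return ev.max() / ev.min()

kappa_A = spectral_cond(A)      # large: dominated by the scaling in S
kappa_PA = spectral_cond(PA)    # small: P^{-1} A is similar to M
```

Here <math>P^{-1}A = S^{-1}MS</math> has the same eigenvalues as <math>M</math>, so the spectral condition number drops from at least <math>10^4</math> to below 2 for this particular construction.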
| |
| | |
| ====SPAI====
| |
| The '''Sparse Approximate Inverse''' preconditioner minimises <math>\|AT-I\|_F,</math> where <math>\|\cdot\|_F</math> is the Frobenius matrix norm and <math>T = P^{-1}</math> is from some suitably constrained set of sparse matrices. Under the Frobenius norm, this reduces to solving numerous independent least-squares problems (one for every column). The entries in <math>T</math> must be restricted to some sparsity pattern, or the problem becomes as hard and time-consuming as finding the exact inverse of <math>A</math>. This method, as well as means to select sparsity patterns, was introduced by Grote and Huckle [M.J. Grote, T. Huckle, SIAM J. Sci. Comput. 18 (1997) 838–853].
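The column-by-column least-squares structure can be sketched as follows. This is a simplified illustration with a fixed, user-supplied sparsity pattern and dense storage; it is not the adaptive pattern selection of the cited paper, and the function name <code>spai_fixed_pattern</code> and the tridiagonal test matrix are assumptions.

```python
import numpy as np

def spai_fixed_pattern(A, pattern):
    """Minimise ||A T - I||_F over T whose nonzeros lie where pattern is True.

    Column j of T solves an independent least-squares problem
    min || A[:, J] t - e_j ||_2, with J the allowed rows of that column.
    """
    n = A.shape[0]
    T = np.zeros_like(A, dtype=float)
    for j in range(n):
        J = np.flatnonzero(pattern[:, j])          # allowed row indices in column j
        e_j = np.zeros(n)
        e_j[j] = 1.0
        t, *_ = np.linalg.lstsq(A[:, J], e_j, rcond=None)
        T[J, j] = t
    return T

# Hypothetical tridiagonal test matrix; the pattern of A itself is used for T.
n = 8
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
pattern = A != 0
T = spai_fixed_pattern(A, pattern)

err_spai = np.linalg.norm(A @ T - np.eye(n), "fro")
err_jacobi = np.linalg.norm(A @ np.diag(1.0 / np.diag(A)) - np.eye(n), "fro")
```

Because the inverse diagonal of <math>A</math> lies within the allowed pattern, the Frobenius error of this <math>T</math> cannot exceed that of the Jacobi choice. With a full pattern the least-squares problems reproduce the exact inverse, which is the costly case the sparsity restriction is meant to avoid.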
| |
| | |
| ==== Other preconditioners ====
| |
| * [[Incomplete Cholesky factorization]]
| |
| * [[Incomplete LU factorization]]
| |
| * [[Successive over-relaxation]]
| |
| ** [[Symmetric successive over-relaxation]]
| |
| * [[Multigrid#Multigrid preconditioning|Multigrid preconditioning]]
| |
| | |
| ===External links===
| |
| * [http://www.math-linux.com/spip.php?article55 Preconditioned Conjugate Gradient] – math-linux.com
| |
| * [http://www.netlib.org/linalg/html_templates/Templates.html Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods]
| |
| | |
| == Preconditioning for eigenvalue problems ==
| |
| Eigenvalue problems can be framed in several alternative ways, each leading to its own preconditioning. The traditional preconditioning is based on the so-called ''spectral transformations''. Knowing (approximately) the targeted eigenvalue, one can compute the corresponding eigenvector by solving the related homogeneous linear system, which allows one to use preconditioning for linear systems. Finally, formulating the eigenvalue problem as optimization of the [[Rayleigh quotient]] brings preconditioned optimization techniques to the scene.
| |
| | |
| ===Spectral transformations===
| |
| By analogy with linear systems, for an [[eigenvalue]] problem <math> Ax = \lambda x</math> one may be tempted to replace the matrix <math>A</math> with the matrix <math>P^{-1}A</math> using a preconditioner <math>P</math>. However, this makes sense only if the sought [[eigenvectors]] of <math>A</math> and <math>P^{-1}A</math> are the same. This is the case for spectral transformations.
| |
| | |
| The most popular spectral transformation is the so-called ''shift-and-invert'' transformation, where for a given scalar <math>\alpha</math>, called the ''shift'', the original eigenvalue problem <math> Ax = \lambda x</math> is replaced with the shift-and-invert problem <math> (A-\alpha I)^{-1}x = \mu x</math>, with <math>\mu = 1/(\lambda-\alpha)</math>. The eigenvectors are preserved, and one can solve the shift-and-invert problem by an iterative solver, e.g., the [[power iteration]]. This gives the [[inverse iteration]], which normally converges to the eigenvector corresponding to the eigenvalue closest to the shift <math>\alpha</math>. The [[Rayleigh quotient iteration]] is a shift-and-invert method with a variable shift.
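Inverse iteration can be sketched as power iteration applied to the shift-and-invert operator. The function name <code>inverse_iteration</code>, the 4×4 symmetric test matrix, the shift, and the fixed iteration count are assumptions made for this example; a dense solve stands in for whatever accurate linear solver would be used on a large problem.

```python
import numpy as np

def inverse_iteration(A, alpha, num_iter=50):
    """Power iteration applied to (A - alpha I)^{-1} for symmetric A."""
    n = A.shape[0]
    shifted = A - alpha * np.eye(n)
    x = np.ones(n) / np.sqrt(n)
    for _ in range(num_iter):
        x = np.linalg.solve(shifted, x)      # apply the shift-and-invert operator
        x /= np.linalg.norm(x)               # keep the iterate normalised
    lam = x @ A @ x                          # Rayleigh quotient, valid since ||x|| = 1
    return lam, x

# Hypothetical symmetric matrix with well separated eigenvalues.
A = np.diag([1.0, 2.0, 5.0, 9.0]) + 0.1 * np.ones((4, 4))
lam, x = inverse_iteration(A, alpha=4.5)
```

The returned <code>lam</code> approximates the eigenvalue of <math>A</math> closest to the shift 4.5. Each step requires an accurate solve with <math>A-\alpha I</math>, which is the cost that dominates for large problems.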
| |
| | |
| Spectral transformations are specific to eigenvalue problems and have no analogs for linear systems. They require accurate numerical calculation of the transformation involved, which becomes the main bottleneck for large problems.
| |
| | |
| ===General preconditioning===
| |
| To make a close connection to linear systems, let us suppose that the targeted eigenvalue <math>\lambda_\star</math> is known (approximately). Then one can compute the corresponding eigenvector from the homogeneous linear system <math>(A-\lambda_\star I)x=0</math>. Using the concept of left preconditioning for linear systems, we obtain <math>T(A-\lambda_\star I)x=0</math>, where <math>T</math> is the preconditioner, which we can try to solve using the [[Richardson iteration]]
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n T(A-\lambda_\star I)\mathbf{x}_n,\ n \ge 0.</math>
| |
| | |
| ====The ''ideal'' preconditioning====
| |
| The [[Moore–Penrose pseudoinverse]] <math>T=(A-\lambda_\star I)^+</math> is the preconditioner that makes the [[Richardson iteration]] above converge in one step with <math>\gamma_n=1</math>, since <math>I-(A-\lambda_\star I)^+(A-\lambda_\star I)</math>, denoted by <math>P_\star</math>, is the orthogonal projector onto the eigenspace corresponding to <math>\lambda_\star</math>. The choice <math>T=(A-\lambda_\star I)^+</math> is impractical for three independent reasons. First, <math>\lambda_\star</math> is actually not known, although it can be replaced with its approximation <math>\tilde\lambda_\star</math>. Second, the exact [[Moore–Penrose pseudoinverse]] requires knowledge of the eigenvector, which is what we are trying to find. This can be somewhat circumvented by the use of the [[Jacobi–Davidson preconditioner]] <math>T=(I-\tilde P_\star)(A-\tilde\lambda_\star I)^{-1}(I-\tilde P_\star)</math>, where <math>\tilde P_\star</math> approximates <math>P_\star</math>. Last, but not least, this approach requires an accurate numerical solution of a linear system with the system matrix <math>(A-\tilde\lambda_\star I)</math>, which becomes as expensive for large problems as the shift-and-invert method above. If the solution is not accurate enough, step two may be redundant.
| |
| | |
| ====Practical preconditioning====
| |
| Let us first replace the theoretical value <math>\lambda_\star</math> in the [[Richardson iteration]] above with its current approximation <math>\lambda_n</math> to obtain a practical algorithm
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n T(A-\lambda_n I)\mathbf{x}_n,\ n \ge 0.</math>
| |
| | |
| A popular choice is <math>\lambda_n=\rho(x_n)</math> using the [[Rayleigh quotient]] function <math>\rho(\cdot)</math>. Practical preconditioning may be as trivial as just using <math>T=(\operatorname{diag}(A))^{-1}</math> or <math>T=(\operatorname{diag}(A-\lambda_n I))^{-1}.</math> For some classes of eigenvalue problems the efficiency of <math>T\approx A^{-1}</math> has been demonstrated, both numerically and theoretically. The choice <math>T\approx A^{-1}</math> allows one to easily utilize for eigenvalue problems the vast variety of preconditioners developed for linear systems.
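The practical iteration with <math>\lambda_n=\rho(x_n)</math> and the trivial diagonal choice of <math>T</math> can be sketched as follows. The function names, the diagonally dominant test matrix, the fixed step <math>\gamma_n=1</math>, the normalisation of the iterate, and the iteration count are all assumptions made for this illustration; convergence is not guaranteed for arbitrary matrices.

```python
import numpy as np

def rayleigh_quotient(A, x):
    return (x @ A @ x) / (x @ x)

def preconditioned_eigen_iteration(A, apply_T, x0, gamma=1.0, num_iter=500):
    """Iterate x <- x - gamma * T (A x - rho(x) x), rho being the Rayleigh quotient."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_iter):
        lam = rayleigh_quotient(A, x)
        r = A @ x - lam * x                 # eigenvalue residual at the current iterate
        x = x - gamma * apply_T(r)
        x /= np.linalg.norm(x)              # rescaling leaves the Rayleigh quotient unchanged
    return rayleigh_quotient(A, x), x

# Hypothetical symmetric, strictly diagonally dominant test matrix.
n = 20
A = np.diag(np.arange(1.0, n + 1)) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
d = np.diag(A)

# T = diag(A)^{-1}, applied as an elementwise division.
lam, x = preconditioned_eigen_iteration(A, lambda r: r / d, np.ones(n))
```

For this matrix the diagonal is a good approximation of <math>A</math>, so the iteration behaves like an inexact inverse iteration and <code>lam</code> approaches the smallest eigenvalue.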
| |
| | |
| Due to the changing value <math>\lambda_n</math>, a comprehensive theoretical convergence analysis is much more difficult, compared to the linear systems case, even for the simplest methods, such as the [[Richardson iteration]].
| |
| | |
| ===External links===
| |
| * [http://www.cs.ucdavis.edu/~bai/ET/contents.html Templates for the Solution of Algebraic Eigenvalue Problems: a Practical Guide]
| |
| | |
| == Preconditioning in optimization ==
| |
| [[File:gradient descent.png|thumb|right|350px|Illustration of gradient descent]]
| |
| In [[optimization (mathematics)|optimization]], preconditioning is typically used to accelerate [[First-order approximation|first-order]] [[optimization (mathematics)|optimization]] [[algorithms]].
| |
| | |
| === Description ===
| |
| For example, to find a [[local minimum]] of a real-valued function <math>F(\mathbf{x})</math> using [[gradient descent]], one takes steps proportional to the ''negative'' of the [[gradient]], <math>-\nabla F(\mathbf{x}_n)</math> (or of an approximate gradient), of the function at the current point <math>\mathbf{x}_n</math>:
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n \nabla F(\mathbf{x}_n),\ n \ge 0.</math>
| |
| | |
| The preconditioner is applied to the gradient:
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n P^{-1} \nabla F(\mathbf{x}_n),\ n \ge 0.</math>
| |
| | |
| Preconditioning here can be viewed as changing the geometry of the vector space with the goal of making the level sets look like circles. In this case the preconditioned gradient points closer to the extremum, as in the figure, which speeds up the convergence.
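The change of geometry can be made explicit under the additional assumption that <math>P</math> is symmetric positive definite with a factorization <math>P = LL^T</math> (the factor <math>L</math> and the function <math>G</math> are notation introduced here for illustration). Substituting <math>\mathbf{y} = L^T\mathbf{x}</math> and <math>G(\mathbf{y}) = F(L^{-T}\mathbf{y})</math> gives <math>\nabla G(\mathbf{y}) = L^{-1}\nabla F(\mathbf{x})</math>, so plain gradient descent in the new variable reads

: <math>\mathbf{y}_{n+1}=\mathbf{y}_n-\gamma_n L^{-1}\nabla F(\mathbf{x}_n).</math>

Multiplying by <math>L^{-T}</math> and using <math>L^{-T}L^{-1} = P^{-1}</math> recovers

: <math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n P^{-1}\nabla F(\mathbf{x}_n),</math>

that is, preconditioned gradient descent in <math>\mathbf{x}</math> is ordinary gradient descent in the transformed coordinates <math>\mathbf{y}</math>.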
| |
| | |
| ===Connection to linear systems===
| |
| The minimum of a quadratic function
| |
| | |
| :<math>F(\mathbf{x})= \frac{1}{2}\mathbf{x}^TA\mathbf{x}-\mathbf{x}^T\mathbf{b}</math>,
| |
| | |
| where <math>\mathbf{x}</math> and <math>\mathbf{b}</math> are real column-vectors and <math>A</math> is a real [[Symmetric matrix|symmetric]] [[positive-definite matrix]], is exactly the solution of the linear equation <math>A\mathbf{x}=\mathbf{b}</math>. Since <math>\nabla F(\mathbf{x})=A\mathbf{x}-\mathbf{b}</math>, the preconditioned [[gradient descent]] method of minimizing <math>F(\mathbf{x})</math> is
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n P^{-1}(A\mathbf{x}_n-\mathbf{b}),\ n \ge 0.</math>
| |
| | |
| This is the preconditioned [[Richardson iteration]] for solving a [[system of linear equations]].
| |
| | |
| ===Connection to eigenvalue problems===
| |
| | |
| The minimum of the [[Rayleigh quotient]]
| |
| | |
| :<math>\rho(\mathbf{x})= \frac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}},</math>
| |
| | |
| where <math>\mathbf{x}</math> is a real non-zero column-vector and <math>A</math> is a real [[Symmetric matrix|symmetric]] [[positive-definite matrix]], is the smallest [[eigenvalue]] of <math>A</math>, while the minimizer is the corresponding [[eigenvector]]. Since <math>\nabla \rho(\mathbf{x})</math> is proportional to <math>A\mathbf{x}-\rho(\mathbf{x})\mathbf{x}</math>, the preconditioned [[gradient descent]] method of minimizing <math>\rho(\mathbf{x})</math> is
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n P^{-1}(A\mathbf{x}_n-\rho(\mathbf{x}_n)\mathbf{x}_n),\ n \ge 0.</math>
| |
| | |
| This is an analog of preconditioned [[Richardson iteration]] for solving eigenvalue problems.
| |
| | |
| === Variable preconditioning ===
| |
| In many cases, it may be beneficial to change the preconditioner at some or even every step of an [[iterative algorithm]] in order to accommodate a changing shape of the level sets, as in
| |
| | |
| :<math>\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n P_n^{-1} \nabla F(\mathbf{x}_n),\ n \ge 0.</math>
| |
| | |
| One should keep in mind, however, that constructing an efficient preconditioner is very often computationally expensive. The increased cost of updating the preconditioner can easily outweigh the positive effect of faster convergence.
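A variable preconditioner can be sketched on a small non-quadratic example. Everything here is an assumption chosen for illustration: the separable function <math>F(\mathbf{x})=\sum_i c_i(e^{x_i}-x_i)</math>, the weights <code>c</code>, the starting point, and the choice <math>P_n=\operatorname{diag}(c_i e^{x_i})</math>. For this separable function <math>P_n</math> coincides with the exact Hessian, so the sketch amounts to Newton's method, which is an unusually favourable case.

```python
import numpy as np

c = np.array([1.0, 10.0, 1000.0])           # hypothetical, badly scaled weights

def grad_F(x):
    """Gradient of F(x) = sum_i c_i * (exp(x_i) - x_i), which is minimised at x = 0."""
    return c * (np.exp(x) - 1.0)

def apply_P_inv(x, g):
    """Apply the inverse of P_n = diag(c_i * exp(x_i)), rebuilt from the current x."""
    return g / (c * np.exp(x))

x = np.array([1.0, -1.0, 0.5])
for n in range(20):
    # gamma_n = 1; the preconditioner depends on the current iterate.
    x = x - 1.0 * apply_P_inv(x, grad_F(x))
```

Here rebuilding <math>P_n</math> costs only one exponential per component. In realistic problems the update is usually far more expensive, which is the trade-off described above.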
| |
| | |
| ==References==
| |
| *{{cite book
| |
| |title= Iterative Solution Methods
| |
| |last= Axelsson|first= Owe |year= 1996 |publisher= Cambridge University Press |isbn= 978-0-521-55569-2 |pages=6722}}
| |
| *{{cite book
| |
| |title= Optimization in solving elliptic problems
| |
| |last= D'yakonov |first= E. G. |year= 1996 |publisher= CRC-Press |isbn= 978-0-8493-2872-5 |pages=592}}
| |
| *{{Cite book
| |
| | first = H. A.|last= van der Vorst
| |
| | title = Iterative Krylov Methods for Large Linear systems
| |
| | publisher = Cambridge University Press, Cambridge
| |
| | year = 2003
| |
| | isbn = 0-521-81828-1
| |
| }}
| |
| | |
| {{Numerical linear algebra}}
| |
| | |
| [[Category:Numerical linear algebra]]
| |