| In [[probability theory]], the theory of '''large deviations''' concerns the asymptotic behaviour of remote tails of sequences of probability distributions. Some basic ideas of the theory can be traced back to [[Pierre-Simon Laplace|Laplace]] and [[Harald Cramér|Cramér]], but a clear and unified formal definition was only introduced in 1966, in a paper by [[S. R. Srinivasa Varadhan|Varadhan]].<ref>S.R.S. Varadhan, ''Asymptotic probability and differential equations'', [[Communications on Pure and Applied Mathematics|Comm. Pure Appl. Math.]] 19 (1966), 261–286.</ref> Large deviations theory formalizes the heuristic ideas of ''concentration of measures'' and widely generalizes the notion of [[Convergence_of_measures#Weak_convergence_of_random_variables|convergence of probability measures]].
| | |
| Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or ''tail'' events, as the number of observations grows arbitrarily large.
| |
| | |
| ==Introductory examples==
| |
| | |
| === An elementary example ===
| |
| Consider a sequence of independent tosses of a fair coin. Each toss comes up heads or tails. Denote the outcome of the ''i''-th toss by <math> X_i </math>, where heads is encoded as 1 and tails as 0. Now let <math> M_N </math> denote the mean value after <math> N </math> trials, namely
| |
| :<math> M_N = \frac{1}{N}\sum_{i=1}^{N} X_i.</math>
| |
| Then <math> M_N </math> lies between 0 and 1. From the [[law of large numbers]] (and also from experience) we know that as <math> N </math> grows, <math> M_N </math> converges almost surely to <math> 0.5 = \operatorname{E}[X_1] </math>, the expected value of a single coin toss.
| |
| | |
| Moreover, by the [[central limit theorem]], <math> M_N </math> is approximately normally distributed for large <math> N </math>. The central limit theorem gives more detailed information about the behavior of <math> M_N </math> than the law of large numbers. For example, for a fixed value of <math> N </math> it yields an approximation of the tail probability <math> P(M_N > x) </math>, the probability that <math> M_N </math> exceeds <math> x </math>. However, this approximation may be inaccurate if <math> x </math> is far from <math> \operatorname{E}[X_1] </math>. It also gives no information about how fast the tail probabilities converge as <math> N \to \infty </math>. Large deviations theory addresses both problems.
| |
| | |
| Let us make this statement more precise. For a given value <math> 0.5<x<1 </math>, let us compute the tail probability <math> P(M_N > x) </math>. Define
| |
| :<math> I(x) = x \ln x + (1-x) \ln (1-x) + \ln 2. </math>
| |
| (Note that the function <math> I(x) </math> is convex and increasing on <math>[0.5,1)</math>. It resembles the [[Bernoulli entropy]]; that it is appropriate for coin tosses follows from the [[asymptotic equipartition property]] applied to a [[Bernoulli trial]].) Then by [[Chernoff's inequality]], it can be shown that <math> P(M_N > x) < \exp(-NI(x)) </math>.{{Citation needed|date=June 2011}} This bound is rather sharp, in the sense that <math> I(x) </math> cannot be replaced with a larger number that would yield a strict inequality for all positive <math> N </math>. (However, the exponential bound can still be reduced by a subexponential factor on the order of <math> 1/\sqrt N </math>; this follows from the [[Stirling approximation]] applied to the [[binomial coefficient]] appearing in the [[binomial distribution]].) Hence, we obtain the following result:
| |
| :<math> P(M_N > x) \approx \exp(-NI(x)) .</math>
| |
| The probability <math> P(M_N > x) </math> decays exponentially as <math> N </math> grows to infinity, at a rate depending on <math> x </math>. Formulas of this kind approximate the tail probabilities of the sample mean of i.i.d. variables and describe how fast they converge to zero as the number of samples increases.
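| |
| The bound and the decay rate above can be checked numerically. The following is a minimal Python sketch, not taken from any cited source: the helper names <code>rate_function</code> and <code>exact_tail</code> and the choice <math> x = 3/5 </math> are assumptions made for illustration. It computes the exact binomial tail <math> P(M_N > x) </math>, the Chernoff bound <math> \exp(-NI(x)) </math>, and the empirical decay rate <math> -\tfrac{1}{N}\ln P(M_N > x) </math>.

```python
import math
from fractions import Fraction


def rate_function(x):
    """I(x) = x ln x + (1 - x) ln(1 - x) + ln 2 for a fair coin, 0 < x < 1."""
    x = float(x)
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)


def exact_tail(n, x):
    """Exact P(M_N > x) for n fair coin tosses, summed from the binomial distribution."""
    k_min = math.floor(n * x) + 1  # smallest number of heads k with k / n > x
    favourable = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2 ** n


x = Fraction(3, 5)  # exact rational threshold, avoids rounding in floor(n * x)
for n in (10, 100, 1000):
    p = exact_tail(n, x)
    bound = math.exp(-n * rate_function(x))
    # Columns: N, exact tail, Chernoff bound, empirical rate, I(x)
    print(n, p, bound, -math.log(p) / n, rate_function(x))
```

| In this sketch the exact tail should stay below the bound for each <math> N </math>, and the empirical rate should approach <math> I(x) </math> only slowly, which is consistent with the subexponential correction factor mentioned above.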
| |
| | |
| === Large deviations for sums of independent random variables ===
| |
| In the above example of coin-tossing we explicitly assumed that each toss is an independent trial, and that the probability of getting heads or tails is always the same.
| |
| | |
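| Let <math> X, X_1, X_2, \ldots </math> be [[i.i.d.|independent and identically distributed]] (i.i.d.) random variables (r.v.s) whose common distribution satisfies a growth condition on its tails (typically, that <math> \operatorname{E}[\exp(\theta X)] </math> is finite for <math> \theta </math> in a neighbourhood of zero). Then the following limit exists: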
| |
| | |
| :<math>\lim_{N\to \infty} \frac{1}{N} \ln P(M_N > x) = - I(x) .</math>
| |
| | |
| The function <math> I(\cdot) </math> is called the "[[rate function]]", the "Cramér function", or sometimes the "entropy function".
| |
| | |
| The above-mentioned limit means that for large <math>N</math>,
| |
| | |
| :<math> P(M_N >x) \approx \exp[-NI(x) ],</math>
| |
| | |
| which is the basic result of large deviations theory.<ref>http://math.nyu.edu/faculty/varadhan/Spring2012/Chapters1-2.pdf</ref> <ref>S.R.S. Varadhan, Large Deviations and Applications (SIAM, Philadelphia, 1984)</ref>
| |
| | |
| If we know the probability distribution of <math> X </math>, an explicit expression for the rate function can be obtained. This is given by a [[Legendre–Fenchel transformation]],<ref>{{cite journal|last=Touchette|first=Hugo|title=The large deviation approach to statistical mechanics|journal=Physics Reports|date=1 July 2009|volume=478|issue=1-3|pages=1–69|doi=10.1016/j.physrep.2009.05.002}}</ref>
| |
| | |
| :<math>I(x) = \sup_{\theta > 0} [\theta x - \lambda(\theta)],</math>
| |
| | |
| where
| |
| | |
| :<math> \lambda(\theta) = \ln \operatorname{E}[\exp(\theta X)] </math>
| |
| | |
| is called the [[cumulant generating function]] (CGF) and <math> \operatorname{E} </math> denotes the [[mathematical expectation]].
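| |
| As a sanity check of the Legendre–Fenchel formula, the supremum can be approximated on a grid for the fair coin of the introductory example and compared with the closed form given there. This is a rough Python sketch under stated assumptions: the function names, the grid bound <code>theta_max</code> and the grid size <code>steps</code> are illustrative choices, and a plain grid search stands in for a proper optimizer.

```python
import math


def cgf(theta):
    """Cumulant generating function ln E[exp(theta X)] of a fair coin, X in {0, 1}."""
    return math.log(0.5 + 0.5 * math.exp(theta))


def rate_numeric(x, theta_max=10.0, steps=20_000):
    """Grid approximation of sup over theta > 0 of [theta * x - cgf(theta)]."""
    best = 0.0  # value approached as theta -> 0+, since cgf(0) = 0
    for i in range(1, steps + 1):
        theta = theta_max * i / steps
        best = max(best, theta * x - cgf(theta))
    return best


def rate_closed_form(x):
    """I(x) = x ln x + (1 - x) ln(1 - x) + ln 2, the fair-coin rate function."""
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)


for x in (0.6, 0.75, 0.9):
    # Columns: x, grid approximation of the supremum, closed-form rate function
    print(x, rate_numeric(x), rate_closed_form(x))
```

| For these values of <math> x </math> the maximizing <math> \theta </math> equals <math> \ln\tfrac{x}{1-x} </math>, which lies well inside the assumed grid; values of <math> x </math> closer to 1 would need a larger <code>theta_max</code>.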
| |
| | |
| If <math> X </math> follows a [[normal distribution]], the rate function becomes a parabola with its apex at the mean of the normal distribution.
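| |
| As a worked check of this statement, suppose <math> X </math> is normal with mean <math> \mu </math> and variance <math> \sigma^2 </math> (symbols introduced here for illustration only). Its cumulant generating function is
| |
| :<math> \lambda(\theta) = \mu\theta + \tfrac{1}{2}\sigma^2\theta^2 . </math>
| |
| Setting the derivative of <math> \theta x - \lambda(\theta) </math> with respect to <math> \theta </math> to zero gives <math> \theta^* = (x-\mu)/\sigma^2 </math>, which is positive for <math> x > \mu </math>, so that
| |
| :<math> I(x) = \theta^* x - \lambda(\theta^*) = \frac{(x-\mu)^2}{2\sigma^2} , </math>
| |
| a parabola that vanishes at <math> x = \mu </math>.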
| |
| | |
| If <math>\{X_i\}</math> is a [[Markov chain]], a variant of the basic large deviations result stated above may hold.{{Citation needed|date=June 2011}}
| |
| | |
| ==Formal definition==
| |
| Given a [[Polish space]] <math>\mathcal{X}</math>, let <math>\{ \mathbb{P}_N\}</math> be a sequence of [[Borel algebra|Borel]] probability measures on <math>\mathcal{X}</math>, let <math>\{a_N\}</math> be a sequence of positive real numbers such that <math>\lim_N a_N=+\infty</math>, and finally let <math>I:\mathcal{X}\to [0,+\infty]</math> be a [[lower semicontinuous]] functional on <math>\mathcal{X}</math>. The sequence <math>\{ \mathbb{P}_N\}</math> is said to satisfy a '''[[large deviation principle]]''' with ''speed'' <math>\{a_N\}</math> and ''rate'' <math>I</math> if and only if, for each Borel [[measurable set]] <math>E \subset \mathcal{X}</math>,
| |
| | |
| :<math> -\inf_{x \in E^\circ} I(x) \le \varliminf_N a_N^{-1} \log\big(\mathbb{P}_N(E)\big) \le \varlimsup_N a_N^{-1} \log\big(\mathbb{P}_N(E)\big) \le -\inf_{x \in \bar{E}} I(x) ,</math>
| |
| | |
| where <math>\bar{E}</math> and <math>E^\circ</math> denote respectively the [[closure (topology)|closure]] and [[interior (topology)|interior]] of <math>E</math>.{{Citation needed|date=June 2011}}
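| |
| As an informal illustration (a sketch, not a proof), the coin-tossing example fits this definition if one takes <math>\mathcal{X} = \mathbb{R}</math>, lets <math>\mathbb{P}_N</math> be the distribution of <math>M_N</math>, sets <math>a_N = N</math>, and extends the rate function <math>I</math> of that example by <math>+\infty</math> outside <math>[0,1]</math>, with the convention <math>0 \ln 0 = 0</math>. For <math>E = (x, 1]</math> with <math>1/2 < x < 1</math>, the function <math>I</math> is continuous and increasing on <math>[1/2, 1]</math>, so
| |
| :<math> \inf_{y \in E^\circ} I(y) = \inf_{y \in \bar{E}} I(y) = I(x) , </math>
| |
| and the lower and upper bounds coincide, giving
| |
| :<math> \lim_{N\to\infty} \frac{1}{N} \log \mathbb{P}_N(E) = \lim_{N\to\infty} \frac{1}{N} \log P(M_N > x) = -I(x) . </math>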
| |
| | |
| == Brief history ==
| |
| The first rigorous results concerning large deviations are due to the Swedish mathematician [[Harald Cramér]], who applied them to model the insurance business.{{Citation needed|date=June 2011}} From the point of view of an insurance company, earnings arrive at a constant rate per month (the monthly premium), but claims arrive randomly. For the company to be successful over a certain period of time (preferably many months), total earnings should exceed total claims. Thus, to set the premium one has to ask the following question: "What should we choose as the premium <math> q </math> such that over <math> N </math> months the total claim <math> C = \sum_{i=1}^{N} X_i </math> is less than <math> Nq </math>?" This is clearly the same question asked by large deviations theory. Cramér gave a solution to this question for i.i.d. [[random variable]]s, where the rate function is expressed as a [[power series]].
| |
| | |
| A very incomplete list of mathematicians who have made important advances would include [[Aleksei Zinovyevich Petrov|Petrov]],<ref name="Petrov">Petrov V.V. (1954) Generalization of Cramér's limit theorem. Uspehi Matem. Nauk, v. 9, No 4(62), 195–202 (Russian).</ref> [[Ivan Nikolaevich Sanov|Sanov]],<ref name="Sanov">Sanov I.N. (1957) On the probability of large deviations of random magnitudes. Matem. Sbornik, v. 42 (84), 11–44.</ref> [[S.R.S. Varadhan]] (who has won the Abel Prize), [[D. Ruelle]] and [[Oscar Lanford|O.E. Lanford]].
| |
| | |
| ==Applications==
| |
| Principles of large deviations may be effectively applied to extract information from a probabilistic model. Thus, the theory of large deviations finds applications in [[information theory]] and [[risk management]]. In physics, the best-known applications of large deviations theory arise in [[Thermodynamics|thermodynamics]] and [[Statistical Mechanics|statistical mechanics]] (in connection with relating [[entropy]] to the rate function).
| |
| | |
| === Large deviations and entropy ===
| |
| {{main|asymptotic equipartition property}}
| |
| The rate function is related to the [[entropy]] in statistical mechanics. This can be seen heuristically in the following way. In statistical mechanics the entropy of a particular macro-state is related to the number of micro-states which correspond to this macro-state. In our coin-tossing example the mean value <math> M_N </math> could designate a particular macro-state, and a particular sequence of heads and tails which gives rise to a particular value of <math> M_N </math> constitutes a particular micro-state. Loosely speaking, a macro-state with a higher number of micro-states giving rise to it has higher entropy, and a state with higher entropy has a higher chance of being realised in actual experiments. The macro-state with mean value 1/2 (as many heads as tails) has the highest number of micro-states giving rise to it, and it is indeed the state with the highest entropy. In most practical situations we shall indeed obtain this macro-state for large numbers of trials. The "rate function", on the other hand, measures the probability of appearance of a particular macro-state: the smaller the rate function, the higher the chance of that macro-state appearing. In our coin-tossing example the value of the "rate function" for mean value equal to 1/2 is zero. In this way one can see the "rate function" as the negative of the "entropy" (up to an additive constant).
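| |
| The counting argument can be illustrated numerically for the coin-tossing example. The sketch below is a hedged illustration rather than a derivation: the function names and the sample size <code>n = 10_000</code> are assumptions. It compares the logarithm of the number of micro-states per toss with the Bernoulli entropy <math> H(x) = -x \ln x - (1-x)\ln(1-x) </math>, and evaluates the coin-tossing rate function as <math> \ln 2 - H(x) </math>.

```python
import math


def log_microstates(n, k):
    """ln of the number of head/tail sequences of length n with exactly k heads."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def entropy(x):
    """Bernoulli entropy H(x) = -x ln x - (1 - x) ln(1 - x), in nats, for 0 < x < 1."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def rate_function(x):
    """Fair-coin rate function written as ln 2 - H(x)."""
    return math.log(2) - entropy(x)


n = 10_000  # illustrative number of tosses
for k in (5000, 6000, 7500):
    x = k / n
    # Columns: macro-state x, ln(#micro-states) / n, entropy H(x), rate I(x)
    print(x, log_microstates(n, k) / n, entropy(x), rate_function(x))
```

| In this sketch the macro-state <math> x = 1/2 </math> has the largest entropy and a rate function of essentially zero, while macro-states further from 1/2 have fewer micro-states and a larger rate function.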
| |
| | |
| There is a relation between the "rate function" in large deviations theory and the [[Kullback–Leibler divergence]] (see Sanov<ref name="Sanov"/> and Novak,<ref name="Novak">Novak S.Y. (2011) Extreme value methods with applications to finance. Chapman & Hall/CRC Press. ISBN 978-1-4398-3574-6.</ref> ch. 14.5).
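| |
| For the coin-tossing example this relation can be written out explicitly. Writing <math> D(P \,\|\, Q) </math> for the Kullback–Leibler divergence in nats (notation assumed here for illustration), and taking <math> P </math> to be the Bernoulli distribution with success probability <math> x </math> and <math> Q </math> the fair coin, one finds
| |
| :<math> D(P \,\|\, Q) = x \ln\frac{x}{1/2} + (1-x) \ln\frac{1-x}{1/2} = x \ln x + (1-x) \ln (1-x) + \ln 2 = I(x) , </math>
| |
| which is the rate function of the introductory example.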
| |
| | |
| In a special case, large deviations are closely related to the concept of [[Gromov–Hausdorff convergence|Gromov–Hausdorff limits]].<ref> Kotani M., Sunada T. ''Large deviation and the tangent cone at infinity of a crystal lattice'', Math. Z. 254, (2006), 837-870. </ref>
| |
| | |
| == See also ==
| |
| * [[Cramér's large deviation theorem]]
| |
| * [[Chernoff's inequality]]
| |
| * [[Contraction principle (large deviations theory)]], a result on how large deviations principles "[[Pushforward measure|push forward]]"
| |
| * [[Freidlin–Wentzell theorem]], a large deviations principle for [[Itō diffusion]]s
| |
| * [[Laplace principle (large deviations theory)|Laplace principle]], a large deviations principle in '''R'''<sup>''d''</sup>
| |
| * [[Schilder's theorem]], a large deviations principle for [[Brownian motion]]
| |
| * [[Varadhan's lemma]]
| |
| * [[Extreme value theory]]
| |
| * [[Large deviations of Gaussian random functions]]
| |
| | |
| == References ==
| |
| {{Reflist}}
| |
| | |
| ==Bibliography==
| |
| * [http://arxiv.org/abs/0804.2330v1 Special invited paper: Large deviations] by S. R. S. Varadhan The Annals of Probability 2008, Vol. 36, No. 2, 397–419 {{doi|10.1214/07-AOP348}}
| |
| * Entropy, Large Deviations and Statistical Mechanics by R.S. Ellis, Springer Publication. ISBN 3-540-29059-1
| |
| * Large Deviations for Performance Analysis by Alan Weiss and Adam Shwartz. Chapman and Hall ISBN 0-412-06311-5
| |
| * Large Deviations Techniques and Applications by Amir Dembo and Ofer Zeitouni. Springer ISBN 0-387-98406-2
| |
| * Random Perturbations of Dynamical Systems by M.I. Freidlin and A.D. Wentzell. Springer ISBN 0-387-98362-7
| |
| | |
| ==External links==
| |
| *[http://www.cl.cam.ac.uk/Research/SRG/netos/old-projects/measure/tutorial/rev-tutorial.ps.gz An elementary introduction to the Large Deviations Theory]
| |
| {{Use dmy dates|date=June 2011}}
| |
| | |
| {{DEFAULTSORT:Large Deviations Theory}}
| |
| [[Category:Asymptotic analysis]]
| |
| [[Category:Large deviations theory| ]]
| |
| [[Category:Asymptotic statistical theory]]
| |