| In [[probability theory]], '''heavy-tailed distributions''' are [[probability distribution]]s whose tails are not exponentially bounded:<ref name="Asmussen">{{cite doi|10.1007/0-387-21525-5_10}}</ref> that is, they have heavier tails than the [[exponential distribution]]. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
| | |
| There are three important subclasses of heavy-tailed distributions, the [[fat-tailed distribution]]s, the [[Long tail|long-tailed distributions]] and the '''subexponential distributions'''. In practice, all commonly used heavy-tailed distributions belong to the subexponential class.
| |
| | |
| There is still some discrepancy over the use of the term '''heavy-tailed'''. There are two other definitions in use. Some authors use the term to refer to those distributions which do not have all their power [[Moment (mathematics)|moments]] finite; and some others to those distributions that do not have a finite [[variance]]. The definition given in this article is the most general in use, and includes all distributions encompassed by the alternative definitions, as well as those distributions such as [[log-normal]] that possess all their power moments, yet which are generally acknowledged to be heavy-tailed. (Occasionally, heavy-tailed is used for any distribution that has heavier tails than the normal distribution.)
| |
| | |
| ==Definition of heavy-tailed distribution==
| |
| | |
| The distribution of a [[random variable]] ''X'' with [[cumulative distribution function|distribution function]] ''F'' is said to have a heavy right tail if<ref name="Asmussen"/>
| |
:<math>
\lim_{x \to \infty} e^{\lambda x}\Pr[X>x] = \infty \quad \mbox{for all } \lambda>0.\,
</math>
| |
| | |
| This is also written in terms of the tail distribution function
| |
| | |
| : <math>\overline{F}(x) \equiv \Pr[X>x] \, </math>
| |
| | |
| as
| |
:<math>
\lim_{x \to \infty} e^{\lambda x}\overline{F}(x) = \infty \quad \mbox{for all } \lambda>0.\,
</math>
| |
This is equivalent to the statement that the [[moment generating function]] of ''F'', ''M<sub>F</sub>''(''t''), is infinite for all ''t'' > 0.<ref>Rolski, Schmidli, Schmidt, Teugels, ''Stochastic Processes for Insurance and Finance'', 1999</ref>
| |
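As a rough numerical illustration of the definition (a sketch, not a proof), the following Python snippet evaluates <math>\log\left(e^{\lambda x}\overline{F}(x)\right) = \lambda x + \log \overline{F}(x)</math> for an exponential distribution with rate 1 and for a Pareto distribution with tail <math>x^{-\alpha}</math>. The values <math>\lambda = 0.5</math> and <math>\alpha = 2</math>, and all function names, are assumptions chosen for the example. Working on the log scale avoids numerical overflow.

```python
import math

def log_scaled_tail(log_tail, lam, x):
    """Return log(e^{lam*x} * Fbar(x)) = lam*x + log Fbar(x)."""
    return lam * x + log_tail(x)

def exponential_log_tail(x, rate=1.0):
    # log Fbar(x) for the exponential distribution: Fbar(x) = exp(-rate * x), x >= 0
    return -rate * x

def pareto_log_tail(x, alpha=2.0):
    # log Fbar(x) for a Pareto distribution with scale 1: Fbar(x) = x^(-alpha), x >= 1
    return -alpha * math.log(x)

lam = 0.5  # illustrative value; the definition requires the limit for every lam > 0
for x in (10.0, 100.0, 1000.0):
    exp_val = log_scaled_tail(exponential_log_tail, lam, x)
    par_val = log_scaled_tail(pareto_log_tail, lam, x)
    print(f"x={x:7.0f}  exponential: {exp_val:9.2f}  Pareto: {par_val:9.2f}")
```

For the exponential distribution the printed values decrease without bound, so the limit in the definition fails for this <math>\lambda</math>; for the Pareto distribution they increase, as expected for a heavy right tail.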
Heavy left tails, and distributions with two heavy tails, are defined analogously.
| |
| | |
| ==Definition of long-tailed distribution==
| |
| | |
| The distribution of a [[random variable]] ''X'' with [[cumulative distribution function|distribution function]] ''F'' is said to have a long right tail<ref name="Asmussen"/> if for all ''t'' > 0,
| |
:<math>
\lim_{x \to \infty} \Pr[X>x+t \mid X>x] =1, \,
</math>
| |
| | |
| or equivalently
| |
:<math>
\overline{F}(x+t) \sim \overline{F}(x) \quad \mbox{as } x \to \infty. \,
</math>
| |
Intuitively, if a quantity with a long right tail exceeds some high level ''x'', the conditional probability that it also exceeds any higher level ''x'' + ''t'' approaches 1 as ''x'' grows: if you know the situation is good, it is probably better than you think.
| |
| | |
| All long-tailed distributions are heavy-tailed, but the converse is false, and it is possible to construct heavy-tailed distributions that are not long-tailed.
| |
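The long-tail condition can be checked numerically for simple cases. The sketch below evaluates the conditional probability <math>\overline{F}(x+t)/\overline{F}(x)</math> for a Pareto distribution (long-tailed) and, for contrast, for an exponential distribution (not heavy-tailed). The parameter values and function names are assumptions made for the illustration.

```python
import math

def conditional_exceedance(tail, x, t):
    """Pr[X > x + t | X > x] = Fbar(x + t) / Fbar(x)."""
    return tail(x + t) / tail(x)

def pareto_tail(x, alpha=2.0):
    # Fbar(x) = x^(-alpha) for x >= 1 (scale parameter fixed at 1)
    return x ** (-alpha)

def exponential_tail(x, rate=1.0):
    # Fbar(x) = exp(-rate * x) for x >= 0
    return math.exp(-rate * x)

t = 5.0
# x is kept moderate so that exp(-x) does not underflow to zero
for x in (10.0, 100.0, 500.0):
    p_pareto = conditional_exceedance(pareto_tail, x, t)
    p_exponential = conditional_exceedance(exponential_tail, x, t)
    print(f"x={x:6.0f}  Pareto: {p_pareto:.4f}  exponential: {p_exponential:.4f}")
```

The Pareto ratio approaches 1 as ''x'' grows, while the exponential ratio stays at <math>e^{-t}</math> for every ''x'' (the memoryless property).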
| | |
| ==Subexponential distributions==
| |
Subexponentiality is defined in terms of [[Convolution#Definition|convolution]]s of [[probability distributions]]. For two independent, identically distributed [[random variables]] <math> X_1,X_2</math> with common distribution function <math>F</math>, the convolution of <math>F</math> with itself, <math>F^{*2}</math>, is defined, using [[Lebesgue–Stieltjes integration]], by:
| |
:<math>
\Pr[X_1+X_2 \leq x] = F^{*2}(x) = \int_{- \infty}^\infty F(x-y)\,dF(y).
</math>
| |
| | |
| The ''n''-fold convolution <math>F^{*n}</math> is defined in the same way. The tail distribution function <math>\overline{F}</math> is defined as <math>\overline{F}(x) = 1-F(x)</math>.
| |
| | |
| A distribution <math>F</math> on the positive half-line is subexponential<ref name="Asmussen"/> if
| |
:<math>
\overline{F^{*2}}(x) \sim 2\overline{F}(x) \quad \mbox{as } x \to \infty.
</math>
| |
| | |
| This implies<ref name="Embrechts">{{cite doi|10.1007/978-3-642-33483-2}}</ref> that, for any <math>n \geq 1</math>,
| |
:<math>
\overline{F^{*n}}(x) \sim n\overline{F}(x) \quad \mbox{as } x \to \infty.
</math>
| |
| | |
| The probabilistic interpretation<ref name="Embrechts"/> of this is that, for a sum of <math>n</math> [[statistical independence|independent]] [[random variables]] <math>X_1,\ldots,X_n</math> with common distribution <math>F</math>,
| |
:<math>
\Pr[X_1+ \cdots +X_n>x] \sim \Pr[\max(X_1, \ldots,X_n)>x] \quad \text{as } x \to \infty.
</math>
| |
| | |
| This is often known as the principle of the single big jump<ref>{{cite doi|10.1007/s10959-007-0081-2}}</ref> or catastrophe principle.<ref>{{cite web| url = http://rigorandrelevance.wordpress.com/2014/01/09/catastrophes-conspiracies-and-subexponential-distributions-part-iii/ | title = Catastrophes, Conspiracies, and Subexponential Distributions (Part III) | first = Adam | last = Wierman | authorlink = Adam Wierman | date = January 09 2014 | accessdate = January 09 2014 | website = Rigor + Relevance blog | publisher = RSRG, Caltech}}</ref>
| |
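The single-big-jump behaviour can be seen in a small experiment. The sketch below estimates the ratio <math>\Pr[X_1+X_2>x] / \Pr[\max(X_1,X_2)>x]</math> by Monte Carlo simulation for Pareto variables, and compares it with the exact ratio for two exponential variables with rate 1, which are not subexponential. The threshold values, sample size, seed and function names are assumptions for the illustration, and the Monte Carlo figure is only an estimate at a finite threshold.

```python
import math
import random

def sum_vs_max_ratio_pareto(alpha, x, n_samples, seed=0):
    """Monte Carlo estimate of Pr[X1 + X2 > x] / Pr[max(X1, X2) > x]
    for i.i.d. Pareto variables with tail x^(-alpha) on [1, inf)."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(n_samples):
        x1 = rng.paretovariate(alpha)
        x2 = rng.paretovariate(alpha)
        sum_hits += (x1 + x2 > x)
        max_hits += (max(x1, x2) > x)
    return sum_hits / max_hits

def sum_vs_max_ratio_exponential(x):
    """Exact ratio for two i.i.d. Exp(1) variables, using
    Pr[X1 + X2 > x] = (1 + x) e^{-x} and Pr[max > x] = 1 - (1 - e^{-x})^2."""
    p_sum = (1.0 + x) * math.exp(-x)
    p_max = 1.0 - (1.0 - math.exp(-x)) ** 2
    return p_sum / p_max

pareto_ratio = sum_vs_max_ratio_pareto(alpha=1.5, x=50.0, n_samples=200_000)
exponential_ratio = sum_vs_max_ratio_exponential(10.0)
print(f"Pareto (alpha=1.5, x=50):  ratio ~ {pareto_ratio:.3f}")
print(f"Exponential (x=10):        ratio = {exponential_ratio:.3f}")
```

For the Pareto case the ratio is close to 1: a large sum is almost always caused by one large term. For the exponential case the ratio is well above 1 and grows with ''x'', because a large sum typically arises from two moderately large terms.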
A distribution <math>F</math> on the whole real line is subexponential if the distribution <math>F I([0,\infty))</math> is.<ref>{{cite journal | last = Willekens | first = E. | title = Subexponentiality on the real line | journal = Technical Report | publisher = K.U. Leuven | year = 1986}}</ref> Here <math>I([0,\infty))</math> is the [[indicator function]] of the positive half-line. Alternatively, a random variable <math>X</math> supported on the real line is subexponential if and only if <math>X^+ = \max(0,X)</math> is subexponential.
| |
| | |
| All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.
| |
| | |
| ==Common heavy-tailed distributions==
| |
| | |
| All commonly used heavy-tailed distributions are subexponential.<ref name="Embrechts"/>
| |
| | |
| Those that are one-tailed include:
*the [[Pareto distribution]];
*the [[log-normal distribution]];
*the [[Lévy distribution]];
*the [[Weibull distribution]] with shape parameter less than 1;
*the [[Burr distribution]];
*the [[log-gamma distribution]];
*the [[log-Cauchy distribution]], sometimes described as having a "super-heavy tail" because it exhibits [[logarithmic growth|logarithmic decay]] producing a heavier tail than the Pareto distribution.<ref>{{cite book|title=Laws of Small Numbers: Extremes and Rare Events|author=Falk, M., Hüsler, J. & Reiss, R.|page=80|year=2010|publisher=Springer|isbn=978-3-0348-0008-2}}</ref><ref>{{cite web|title=Statistical inference for heavy and super-heavy tailed distributions|url=http://docentes.deio.fc.ul.pt/fragaalves/SuperHeavy.pdf|author=Alves, M.I.F., de Haan, L. & Neves, C.|date=March 10, 2006}}</ref>
| |
| | |
| Those that are two-tailed include:
*the [[Cauchy distribution]], itself a special case of both the stable distribution and the t-distribution;
*the family of [[stable distributions]],<ref>{{cite web |author=John P. Nolan | title=Stable Distributions: Models for Heavy Tailed Data| year=2009 | url=http://academic2.american.edu/~jpnolan/stable/chap1.pdf | format=PDF | accessdate=2009-02-21}}</ref> excepting the special case of the normal distribution within that family. Some stable distributions are one-sided (that is, supported on a half-line); see, for example, the [[Lévy distribution]]. See also ''[[financial models with long-tailed distributions and volatility clustering]]'';
*the [[t-distribution]];
*the skew lognormal cascade distribution.<ref>{{cite web |author=Stephen Lihn | title=Skew Lognormal Cascade Distribution| year=2009 | url=http://www.skew-lognormal-cascade-distribution.org/ }}</ref>
| |
| | |
| == Relationship to fat-tailed distributions ==
A [[fat-tailed distribution]] is a distribution for which the probability density function, for large ''x'', goes to zero as a power <math>x^{-a}</math>. Since such a power eventually exceeds the probability density function of any exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero more slowly than an exponential function (meaning they are heavy-tailed) but faster than a power (meaning they are not fat-tailed). An example is the [[log-normal distribution]]. Many other heavy-tailed distributions, such as the [[log-logistic distribution|log-logistic]] and [[Pareto distribution|Pareto]] distributions, are also fat-tailed.
| |
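The intermediate position of the log-normal distribution can be illustrated numerically. The sketch below uses the tail function of the standard log-normal distribution, <math>\overline{F}(x) = \tfrac{1}{2}\operatorname{erfc}(\ln x/\sqrt{2})</math>, and compares it on the log scale with an exponential <math>e^{-\lambda x}</math> and with a power <math>x^{-a}</math>. The values <math>\lambda = 0.5</math> and <math>a = 3</math>, the use of the tail function rather than the density, and the function names are assumptions made for the example.

```python
import math

def lognormal_log_tail(x):
    """log Fbar(x) for the standard log-normal: Fbar(x) = 0.5 * erfc(ln x / sqrt(2))."""
    return math.log(0.5 * math.erfc(math.log(x) / math.sqrt(2.0)))

def log_tail_times_exponential(x, lam):
    # log(e^{lam * x} * Fbar(x)); grows without bound if the tail is heavier than exponential
    return lam * x + lognormal_log_tail(x)

def log_tail_times_power(x, a):
    # log(x^a * Fbar(x)); decreases without bound if the tail is lighter than the power x^(-a)
    return a * math.log(x) + lognormal_log_tail(x)

for x in (1e2, 1e4, 1e6):
    vs_exponential = log_tail_times_exponential(x, lam=0.5)
    vs_power = log_tail_times_power(x, a=3.0)
    print(f"x={x:9.0f}  vs exponential: {vs_exponential:12.2f}  vs power: {vs_power:8.2f}")
```

The first column increases and the second decreases, consistent with a tail that is heavier than any exponential but lighter than this power.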
| | |
| == Estimating the tail-index ==
| |
To estimate the tail-index, one can fit a GEV distribution or a Pareto distribution to the data by maximum-likelihood estimation (MLE).
| |
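For the Pareto case the maximum-likelihood estimates have a closed form: the scale is estimated by the sample minimum, <math>\hat{x}_m = \min_i x_i</math>, and the shape by <math>\hat{\alpha} = n / \sum_i \ln(x_i/\hat{x}_m)</math>. The sketch below applies these formulas to simulated data; the sample size, seed, true parameter and function name are assumptions for the illustration, and fitting a GEV distribution is not covered.

```python
import math
import random

def pareto_mle(sample):
    """Closed-form maximum-likelihood estimates (x_m, alpha) for a Pareto sample."""
    x_m = min(sample)                                    # MLE of the scale parameter
    log_excess = sum(math.log(x / x_m) for x in sample)  # sum of log(x_i / x_m)
    alpha = len(sample) / log_excess                     # MLE of the shape parameter
    return x_m, alpha

rng = random.Random(42)
true_alpha = 2.0
# random.paretovariate draws from a Pareto distribution with scale 1
sample = [rng.paretovariate(true_alpha) for _ in range(10_000)]

x_m_hat, alpha_hat = pareto_mle(sample)
print(f"estimated scale: {x_m_hat:.4f}")
print(f"estimated shape: {alpha_hat:.3f} (true value {true_alpha})")
```

Under the usual convention <math>\xi = 1/\alpha</math>, the reciprocal of the estimated shape gives an estimate of the tail-index.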
=== Pickands tail-index ===

Let <math>(X_n , n \geq 1)</math> be a sequence of independent random variables with common distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction<ref name=Pickands>{{cite journal|last=Pickands III|first=James|title=Statistical Inference Using Extreme Order Statistics|journal=The Annals of Statistics|year=1975|month=Jan|volume=3|issue=1|pages=119–131|url=http://www.jstor.org/stable/2958083}}</ref> of the generalized extreme value distribution <math> H </math>, where <math>\xi \in \mathbb{R}</math>. If <math>\lim_{n\to\infty} k(n) = \infty </math> and <math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>, then the ''Pickands'' tail-index estimator is<ref name="Embrechts">{{cite book |author=Paul Embrechts, C. Klueppelberg, T. Mikosch |title=Modelling extremal events for insurance and finance |publisher=Springer |location=Berlin |year=1997 | series= Applications of Mathematics | volume=33}}</ref><ref name="Pickands"/>
:<math>
\xi^{Pickands}_{(k(n),n)} =\frac{1}{\ln 2} \ln \left( \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}}\right)
</math>
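where <math>X_{(j,n)}</math> denotes the ''j''-th smallest of <math>X_1,\ldots ,X_n</math> (the ''j''-th [[order statistic]]), so that <math>X_{(n-k(n)+1,n)}</math> is the ''k''(''n'')-th largest observation. This estimator converges in probability to <math>\xi</math>.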
| |
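A minimal sketch of the Pickands estimator in Python follows, assuming the order statistics are obtained by sorting the sample in ascending order. To keep the example deterministic it is applied to an idealised "sample" made of exact Pareto quantiles with <math>\xi = 0.5</math>; this data set, the choice <math>k = 100</math> and the function name are assumptions for the illustration.

```python
import math

def pickands_estimator(sample, k):
    """Pickands estimate of the tail-index xi, based on the k-th, 2k-th and
    4k-th largest observations (requires 4 * k <= len(sample))."""
    xs = sorted(sample)              # xs[j - 1] is the order statistic X_(j,n)
    n = len(xs)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4 * k <= n")
    upper = xs[n - k]                # X_(n-k+1,n)
    middle = xs[n - 2 * k]           # X_(n-2k+1,n)
    lower = xs[n - 4 * k]            # X_(n-4k+1,n)
    return math.log((upper - middle) / (middle - lower)) / math.log(2.0)

# Idealised sample: exact quantiles of a Pareto distribution with xi = 1/alpha = 0.5
xi_true = 0.5
n = 10_000
sample = [(1.0 - (i - 0.5) / n) ** (-xi_true) for i in range(1, n + 1)]

estimate = pickands_estimator(sample, k=100)
print(f"Pickands estimate: {estimate:.3f} (true value {xi_true})")
```

On real data the estimate can vary considerably with ''k'', so it is common to inspect it over a range of values of ''k'' rather than rely on a single choice.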
| | |
| === Hill tail-index ===
| |
Let <math>(X_n , n \geq 1)</math> be a sequence of independent, positive random variables with common distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction of the generalized extreme value distribution <math> H </math>, where <math>\xi > 0</math>. If <math>\lim_{n\to\infty} k(n) = \infty </math> and <math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>, then the ''Hill'' tail-index estimator is<ref name="Embrechts" />
:<math>
\xi^{Hill}_{(k(n),n)} = \frac{1}{k(n)} \sum_{i=n-k(n)+1}^{n} \left( \ln X_{(i,n)} - \ln X_{(n-k(n)+1,n)} \right)
</math>
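where <math>X_{(j,n)}</math> denotes the ''j''-th smallest of <math>X_1,\ldots ,X_n</math> (the ''j''-th [[order statistic]]), so that <math>X_{(n-k(n)+1,n)}</math> is the ''k''(''n'')-th largest observation. This estimator converges in probability to <math>\xi</math>.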
| |
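A minimal sketch of the Hill estimator in Python follows, using the formula as stated in this section (the threshold is the ''k''-th largest observation, which is itself included in the sum). The simulated Pareto sample with <math>\xi = 0.5</math>, the seed, the choice <math>k = 500</math> and the function name are assumptions for the illustration.

```python
import math
import random

def hill_estimator(sample, k):
    """Hill estimate of the tail-index xi from the k largest observations."""
    xs = sorted(sample)              # xs[j - 1] is the order statistic X_(j,n)
    n = len(xs)
    if k < 1 or k > n:
        raise ValueError("need 1 <= k <= n")
    threshold = xs[n - k]            # X_(n-k+1,n), the k-th largest observation
    if threshold <= 0:
        raise ValueError("the k largest observations must be positive")
    mean_log = sum(math.log(x) for x in xs[n - k:]) / k
    return mean_log - math.log(threshold)

rng = random.Random(7)
xi_true = 0.5
# random.paretovariate(alpha) has tail x^(-alpha), so xi = 1 / alpha
sample = [rng.paretovariate(1.0 / xi_true) for _ in range(20_000)]

estimate = hill_estimator(sample, k=500)
print(f"Hill estimate: {estimate:.3f} (true value {xi_true})")
```

As with the Pickands estimator, the result depends on ''k'': small values give a noisy estimate, while large values include observations that are not in the tail.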
| | |
| ==Software==
| |
| | |
| * [http://www.cs.bu.edu/~crovella/aest.html aest], [[C (programming language)|C]] tool for estimating the heavy tail index<ref>{{cite doi|10.1023/A:1010012224103}}</ref>
| |
| | |
| ==See also==
*[[Fat tail]]
*[[Leptokurtic]]
*[[Outlier]]
*[[The Long Tail]]
*[[Power law]]
| |
| | |
| ==References==
| |
| | |
| <references/>
| |
| | |
| [[Category:Tails of probability distributions]]
[[Category:Types of probability distributions]]
[[Category:Actuarial science]]
[[Category:Risk]]
| |