In [[probability theory]], a '''conditional expectation''' (also known as '''conditional expected value''' or '''conditional mean''') is the [[expected value]] of a real [[random variable]] with respect to a [[conditional probability distribution]].

The concept of conditional expectation is central to [[Andrey Kolmogorov|Kolmogorov]]'s [[measure theory|measure-theoretic]] formulation of [[probability theory]]. In fact, the concept of conditional probability itself is defined in terms of conditional expectation.

== Introduction ==

Let ''X'' and ''Y'' be [[discrete random variable]]s. Then the '''conditional expectation''' of ''X'' given the event ''Y'' = ''y'' is a function of ''y'' over the range of ''Y'':

:<math> \operatorname{E} (X | Y=y ) = \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x|Y=y) = \sum_{x \in \mathcal{X}} x \ \frac{\operatorname{P}(X=x,Y=y)}{\operatorname{P}(Y=y)}, </math>

where <math>\mathcal{X}</math> is the [[Range (mathematics)|range]] of ''X''.
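This formula can be made concrete with a small numerical sketch (an illustration only; the joint distribution below is invented for the example). Using plain Python, E(''X'' | ''Y'' = ''y'') is computed directly from a joint probability mass function:

<syntaxhighlight lang="python">
# Joint pmf P(X = x, Y = y) for a small made-up example (probabilities sum to 1).
joint_pmf = {
    (0, 0): 0.10, (1, 0): 0.30,
    (0, 1): 0.20, (1, 1): 0.25,
    (0, 2): 0.05, (1, 2): 0.10,
}

def conditional_expectation(joint_pmf, y):
    """E(X | Y = y) = sum_x x * P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (x, yy), p in joint_pmf.items() if yy == y)
    return sum(x * p for (x, yy), p in joint_pmf.items() if yy == y) / p_y

for y in (0, 1, 2):
    print(y, conditional_expectation(joint_pmf, y))
# 0 0.75       (= 0.30 / 0.40)
# 1 0.5555...  (= 0.25 / 0.45)
# 2 0.6666...  (= 0.10 / 0.15)
</syntaxhighlight>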
If ''X'' is instead a [[continuous random variable]] while ''Y'' remains a discrete variable, the '''conditional expectation''' is:

:<math> \operatorname{E} (X | Y=y )= \int_{\mathcal{X}} x f_X (x |Y=y) \, \operatorname{d}x </math>

where <math> f_X (\,\cdot\, |Y=y)</math> is the [[conditional density]] of <math>X</math> given <math>Y=y</math>.
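In practice this integral can be approximated numerically. The following sketch (assuming NumPy is available; the conditional density is an arbitrary choice made for illustration, namely an exponential density with rate ''y'' + 1) evaluates the integral on a grid:

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical conditional density: given Y = y, X is exponential with rate (y + 1),
# so the exact conditional expectation is 1 / (y + 1).
def f_x_given_y(x, y):
    rate = y + 1.0
    return rate * np.exp(-rate * x)

x = np.linspace(0.0, 40.0, 100001)       # grid wide enough to cover the tail
dx = x[1] - x[0]
for y in (0, 1):
    e_x_given_y = np.sum(x * f_x_given_y(x, y)) * dx   # Riemann-sum approximation of ∫ x f_X(x|Y=y) dx
    print(y, e_x_given_y)                               # ≈ 1.0 and ≈ 0.5
</syntaxhighlight>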
A problem arises when ''Y'' is [[continuous random variable|continuous]]. In this case, the probability P(''Y'' = ''y'') = 0, and the [[Borel–Kolmogorov paradox]] demonstrates the ambiguity of attempting to define conditional probability along these lines.

However, the above expression may be rearranged:

:<math> \operatorname{E} (X | Y=y) \operatorname{P}(Y=y) = \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x,Y=y), </math>

and although this is trivial for individual values of ''y'' (since both sides are zero), it should hold for any measurable subset ''B'' of the domain of ''Y'' that:

:<math> \int_B \operatorname{E} (X | Y=y) \operatorname{P}(Y=y) \ \operatorname{d}y = \int_B \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x,Y=y) \ \operatorname{d}y. </math>

In fact, this condition suffices to define both conditional expectation and conditional probability, and it is the form taken by the measure-theoretic definition below.
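The requirement can be checked numerically in a simple model (a Monte Carlo sketch under assumed distributions, reading P(''Y'' = ''y'') d''y'' as ''f''<sub>''Y''</sub>(''y'') d''y''): take ''Y'' uniform on (0, 1) and, given ''Y'' = ''y'', let ''X'' be Bernoulli(''y''), so that E(''X'' | ''Y'' = ''y'') = ''y''. Both sides of the identity are then computed for ''B'' = [0.2, 0.7]:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed model: Y ~ Uniform(0, 1) and, given Y = y, X ~ Bernoulli(y),
# so that E(X | Y = y) = y and f_Y(y) = 1 on (0, 1).
y = rng.uniform(0.0, 1.0, n)
x = (rng.uniform(0.0, 1.0, n) < y).astype(float)

a, b = 0.2, 0.7                              # the measurable set B = [0.2, 0.7]
lhs = (b**2 - a**2) / 2.0                    # ∫_B E(X | Y = y) f_Y(y) dy = ∫_B y dy
rhs = np.mean(x * ((y >= a) & (y <= b)))     # Monte Carlo estimate of E(X · 1_{Y ∈ B})
print(lhs, rhs)                              # both ≈ 0.225
</syntaxhighlight>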
== Formal definition ==

Let <math>\scriptstyle (\Omega, \mathcal {F}, \operatorname {P} )</math> be a [[probability space]], with a [[random variable#Formal definition|random variable]] <math>\scriptstyle X:\Omega \to \mathbb{R}^n</math> and a sub-[[sigma-algebra|σ-algebra]] <math>\scriptstyle \mathcal {H} \subseteq \mathcal {F} </math>.

Then a '''conditional expectation''' of ''X'' given <math>\scriptstyle \mathcal {H} </math> (denoted <math>\scriptstyle \operatorname{E}\left[X|\mathcal {H} \right]</math>) is any <math>\scriptstyle \mathcal {H} </math>-[[measurable function]] <math>\Omega \to \mathbb{R}^n</math> which satisfies

:<math> \int_H \operatorname{E}\left[X|\mathcal {H} \right] (\omega) \ \operatorname{d} \operatorname{P}(\omega) = \int_H X(\omega) \ \operatorname{d} \operatorname{P}(\omega) \qquad \text{for each} \quad H \in \mathcal {H}. </math><ref>[[#loe1978|Loève (1978)]], p. 7</ref>

Note that <math>\scriptstyle \operatorname{E}\left[X|\mathcal {H} \right]</math> is simply the name of the conditional expectation function; it is itself a random variable on Ω, not a single number.
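The defining (partial-averaging) property can be verified directly on a finite probability space. In the sketch below (plain Python with made-up probabilities), the sub-σ-algebra is generated by a partition of Ω, the conditional expectation is built by averaging ''X'' over each block, and the identity is checked on every set of the sub-σ-algebra:

<syntaxhighlight lang="python">
from itertools import combinations

# Finite probability space Ω = {0, ..., 5} with made-up probabilities, and a
# sub-σ-algebra generated by the partition {0,1}, {2,3}, {4,5}.
P = {0: 0.10, 1: 0.15, 2: 0.20, 3: 0.05, 4: 0.25, 5: 0.25}
X = {0: 1.0, 1: 3.0, 2: -2.0, 3: 4.0, 4: 0.5, 5: 2.0}
partition = [{0, 1}, {2, 3}, {4, 5}]

# E[X | H] is constant on each block: the P-weighted average of X over that block.
cond_exp = {}
for block in partition:
    p_block = sum(P[w] for w in block)
    avg = sum(X[w] * P[w] for w in block) / p_block
    for w in block:
        cond_exp[w] = avg

# Check the defining property on every H in the sub-σ-algebra (unions of blocks).
for r in range(len(partition) + 1):
    for blocks in combinations(partition, r):
        H = set().union(*blocks) if blocks else set()
        lhs = sum(cond_exp[w] * P[w] for w in H)   # ∫_H E[X | H] dP
        rhs = sum(X[w] * P[w] for w in H)          # ∫_H X dP
        assert abs(lhs - rhs) < 1e-12
print("partial averaging holds on all", 2 ** len(partition), "sets")
</syntaxhighlight>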
=== Discussion ===

A few points are worth noting about the definition:

* This is not a constructive definition; we are merely given the required property that a conditional expectation must satisfy.

** The required property has the same form as the last expression in the Introduction section.

** Existence of a conditional expectation function is guaranteed by the [[Radon–Nikodym theorem]]; a sufficient condition is that the (unconditional) expected value of ''X'' exists.

** Uniqueness can be shown to be [[almost surely|almost sure]]: that is, versions of the same conditional expectation will only differ on a [[null set|set of probability zero]].

* The σ-algebra <math>\scriptstyle \mathcal {H} </math> controls the "granularity" of the conditioning. A conditional expectation <math>\scriptstyle \operatorname{E}\left[X|\mathcal {H} \right]</math> over a finer-grained σ-algebra <math>\scriptstyle \mathcal {H} </math> allows us to condition on a wider variety of events.

** To condition freely on values of a random variable ''Y'' with state space <math>\scriptstyle (\mathcal Y, \Sigma) </math>, it suffices to define the conditional expectation using the [[pre-image]] of ''Σ'' with respect to ''Y'', so that <math>\scriptstyle \operatorname{E}\left[X| Y\right]</math> is defined to be <math>\scriptstyle \operatorname{E}\left[X|\mathcal {H} \right]</math>, where

:::<math> \mathcal {H} = \sigma(Y):= Y^{-1}\left(\Sigma\right):= \{Y^{-1}(S) : S \in \Sigma \}. </math>

:: This suffices to ensure that the conditional expectation is σ(''Y'')-measurable. Although conditional expectation is defined to condition on events in the underlying probability space Ω, the requirement that it be σ(''Y'')-measurable allows us to condition on ''Y'' as in the introduction; a sketch of this construction on a finite space is given below.
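As a toy illustration of the pre-image construction (invented data, plain Python), the conditional expectation E[''X'' | ''Y''] on a finite space is obtained by averaging ''X'' over each level set {''Y'' = ''Y''(ω)}; these level sets generate σ(''Y''), and the resulting function depends on ω only through ''Y''(ω):

<syntaxhighlight lang="python">
# Ω = {0, ..., 5}; Y takes the two values 'a' and 'b', so σ(Y) is generated by
# the level sets {Y = 'a'} and {Y = 'b'}, and E[X | Y] must be constant on each.
P = {0: 0.10, 1: 0.15, 2: 0.20, 3: 0.05, 4: 0.25, 5: 0.25}
X = {0: 1.0, 1: 3.0, 2: -2.0, 3: 4.0, 4: 0.5, 5: 2.0}
Y = {0: 'a', 1: 'b', 2: 'a', 3: 'b', 4: 'a', 5: 'b'}

def cond_exp_given_Y(w):
    level_set = [v for v in P if Y[v] == Y[w]]      # {Y = Y(ω)}, a generator of σ(Y)
    p = sum(P[v] for v in level_set)
    return sum(X[v] * P[v] for v in level_set) / p

for w in P:
    print(w, Y[w], cond_exp_given_Y(w))             # value depends on ω only through Y(ω)
</syntaxhighlight>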
== Definition of conditional probability ==

For any event <math>A \in \mathcal{A}</math> (where <math>\mathcal{A} \supseteq \mathcal B</math>), define the [[indicator function]]:

:<math>\mathbf{1}_A (\omega) = \begin{cases} 1 \; &\text{if } \omega \in A, \\ 0 \; &\text{if } \omega \notin A, \end{cases}</math>

which is a random variable with respect to the [[Borel algebra|Borel σ-algebra]] on [0,1]. Note that the expectation of this random variable is equal to the probability of ''A'' itself:

:<math>\operatorname{E}(\mathbf{1}_A) = \operatorname{P}(A). \; </math>

Then the '''[[conditional probability]] given <math>\scriptstyle \mathcal B</math>''' is a function <math>\scriptstyle \operatorname{P}(\cdot|\mathcal{B}):\mathcal{A} \times \Omega \to [0,1]</math> such that <math>\scriptstyle \operatorname{P}(A|\mathcal{B})</math> is the conditional expectation of the indicator function for ''A'':

:<math>\operatorname{P}(A|\mathcal{B}) = \operatorname{E}(\mathbf{1}_A|\mathcal{B}). \; </math>

In other words, <math>\scriptstyle \operatorname{P}(A|\mathcal{B}) </math> is a <math>\scriptstyle \mathcal B</math>-measurable function satisfying

:<math>\int_B \operatorname{P}(A|\mathcal{B}) (\omega) \, \operatorname{d} \operatorname{P}(\omega) = \operatorname{P} (A \cap B) \qquad \text{for all} \quad A \in \mathcal{A}, B \in \mathcal{B}. </math>

A conditional probability is [[Regular_conditional_probability|'''regular''']] if <math>\scriptstyle \operatorname{P}(\cdot|\mathcal{B})(\omega) </math> is also a [[probability measure]] for all ''ω'' ∈ ''Ω''. An expectation of a random variable with respect to a regular conditional probability is equal to its conditional expectation.
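On a finite probability space this construction is elementary (a sketch with invented numbers): with <math>\scriptstyle \mathcal B</math> generated by a two-block partition, the conditional probability of ''A'' is computed as the conditional expectation of the indicator of ''A'', constant on each block and equal there to P(''A'' ∩ block)/P(block):

<syntaxhighlight lang="python">
# Finite space with made-up probabilities; the sub-σ-algebra B is generated by the
# partition {0,1,2}, {3,4,5}.  P(A | B) = E(1_A | B) is constant on each block.
P = {0: 0.10, 1: 0.15, 2: 0.20, 3: 0.05, 4: 0.25, 5: 0.25}
partition = [{0, 1, 2}, {3, 4, 5}]
A = {1, 2, 4}                                       # the event being conditioned

indicator = {w: 1.0 if w in A else 0.0 for w in P}

cond_prob = {}
for block in partition:
    p_block = sum(P[w] for w in block)
    value = sum(indicator[w] * P[w] for w in block) / p_block   # P(A ∩ block) / P(block)
    for w in block:
        cond_prob[w] = value

print(cond_prob)
# {0: 0.777..., 1: 0.777..., 2: 0.777..., 3: 0.4545..., 4: 0.4545..., 5: 0.4545...}
</syntaxhighlight>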
* For the trivial sigma algebra <math>\mathcal B= \{\emptyset,\Omega\}</math> the conditional probability is a constant function, <math>\operatorname{P}\!\left( A| \{\emptyset,\Omega\} \right) \equiv\operatorname{P}(A).</math>

* For <math>A\in \mathcal{B}</math>, as outlined above, <math>\operatorname{P}(A|\mathcal{B})=1_A</math>.

See also [[Conditional_probability_distribution#Measure-Theoretic_Formulation|conditional probability distribution]].

== Conditioning as factorization ==

In the definition of conditional expectation given above, the fact that ''Y'' is a ''real'' random variable is irrelevant. Let ''U'' be a measurable space, that is, a set equipped with a σ-algebra <math>\Sigma</math> of subsets. A ''U''-valued random variable is a function <math>Y\colon (\Omega,\mathcal A) \to (U,\Sigma)</math> such that <math>Y^{-1}(B)\in \mathcal A</math> for any measurable subset <math>B\in \Sigma</math> of ''U''.

Consider the measure Q on ''U'' defined by Q(''B'') = P(''Y''<sup>−1</sup>(''B'')) for every measurable subset ''B'' of ''U''. Then Q is a probability measure on the measurable space ''U'' defined on its σ-algebra of measurable sets.

'''Theorem'''. If ''X'' is an integrable random variable on Ω then there is one and, up to equivalence a.e. relative to Q, only one integrable function ''g'' on ''U'' (which is written <math>g= \operatorname{E}(X \mid Y)</math>) such that for any measurable subset ''B'' of ''U'':

:<math> \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} g(u) \ d \operatorname{Q} (u). </math>

There are a number of ways of proving this; one, as suggested above, is to note that the expression on the left hand side defines, as a function of the set ''B'', a countably additive signed measure ''μ'' on the measurable subsets of ''U''. Moreover, this measure ''μ'' is absolutely continuous relative to Q: indeed Q(''B'') = 0 means exactly that ''Y''<sup>−1</sup>(''B'') has probability 0, and the integral of an integrable function over a set of probability 0 is itself 0. The [[Radon–Nikodym theorem]] then provides the function ''g'', equal to the density of ''μ'' with respect to Q.
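The factorization can be traced explicitly on a finite space (an invented example in plain Python): the pushforward measure Q = P ∘ ''Y''<sup>−1</sup> lives on the target space ''U'', the function ''g'' = E(''X'' | ''Y'') is the density of ''μ'' with respect to Q, and the identity in the theorem is checked for a particular ''B'':

<syntaxhighlight lang="python">
from collections import defaultdict

# Ω = {0,...,5} with made-up probabilities; Y maps Ω into U = {'u1', 'u2', 'u3'}.
P = {0: 0.10, 1: 0.15, 2: 0.20, 3: 0.05, 4: 0.25, 5: 0.25}
X = {0: 1.0, 1: 3.0, 2: -2.0, 3: 4.0, 4: 0.5, 5: 2.0}
Y = {0: 'u1', 1: 'u2', 2: 'u1', 3: 'u3', 4: 'u2', 5: 'u3'}

Q = defaultdict(float)                  # Q(u) = P(Y = u), the pushforward of P by Y
numer = defaultdict(float)              # μ({u}) = ∫_{Y = u} X dP
for w, p in P.items():
    Q[Y[w]] += p
    numer[Y[w]] += X[w] * p
g = {u: numer[u] / Q[u] for u in Q}     # g = dμ/dQ, i.e. E(X | Y) as a function on U

# Verify ∫_{Y⁻¹(B)} X dP = ∫_B g dQ for B = {'u1', 'u2'}.
B = {'u1', 'u2'}
lhs = sum(X[w] * P[w] for w in P if Y[w] in B)
rhs = sum(g[u] * Q[u] for u in B)
print(lhs, rhs)                         # 0.275 and 0.275
</syntaxhighlight>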
The defining condition of conditional expectation then is the equation

:<math> \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} \operatorname{E}(X \mid Y)(u) \ d \operatorname{Q} (u),</math>

and it holds that

:<math>\operatorname{E}(X \mid Y) \circ Y= \operatorname{E}\left(X \mid Y^{-1} \left(\Sigma\right)\right).</math>

We can further interpret this equality by using the abstract [[change of variables]] formula to transport the integral on the right hand side to an integral over Ω:

:<math> \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{Y^{-1}(B)} (\operatorname{E}(X \mid Y) \circ Y)(\omega) \ d \operatorname{P} (\omega). </math>

This equation can be interpreted to say that the following diagram is [[commutative diagram|commutative]] ''in the average''.
               E(X|Y) = g∘Y
   Ω ───────────────────────────────> '''R'''
 
         Y                 g = E(X|Y= ·)
   Ω ──────────> '''R''' ───────────> '''R'''
 
   ω ──────────> Y(ω)   ───────────> g(Y(ω)) = E(X|Y=Y(ω))
 
                  y     ───────────> g(y) = E(X|Y=y)
The equation means that the integrals of ''X'' and the composition <math>\operatorname{E}(X \mid Y=\ \cdot)\circ Y</math> over sets of the form ''Y''<sup>−1</sup>(''B''), for ''B'' a measurable subset of ''U'', are identical.

== Conditioning relative to a subalgebra ==

There is another viewpoint for conditioning involving σ-subalgebras ''N'' of the σ-algebra ''M''. This version is a trivial specialization of the preceding: we simply take ''U'' to be the space Ω with the σ-algebra ''N'' and ''Y'' the identity map. We state the result:

'''Theorem'''. If ''X'' is an integrable real random variable on Ω then there is one and, up to equivalence a.e. relative to P, only one integrable function ''g'' such that for any set ''B'' belonging to the subalgebra ''N''

:<math> \int_{B} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} g(\omega) \ d \operatorname{P} (\omega), </math>

where ''g'' is measurable with respect to ''N'' (a stricter condition than the measurability with respect to ''M'' required of ''X'').

This form of conditional expectation is usually written E(''X'' | ''N'').

This version is preferred by probabilists. One reason is that on the [[Hilbert space]] of [[square-integrable]] real random variables (in other words, real random variables with finite second moment) the mapping ''X'' → E(''X'' | ''N'') is [[self-adjoint operator|self-adjoint]],

:<math>\operatorname E(X\cdot\operatorname E(Y\mid N)) = \operatorname E\left(\operatorname E(X\mid N)\cdot \operatorname E(Y\mid N)\right) = \operatorname E(\operatorname E(X\mid N)\cdot Y),</math>

and a [[projection]] (i.e. idempotent)

:<math> L^2_{\operatorname{P}}(\Omega;M) \rightarrow L^2_{\operatorname{P}}(\Omega;N). </math>
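Both properties can be checked numerically on a finite space (a sketch with invented weights, assuming NumPy), where conditioning on a partition-generated subalgebra ''N'' is just weighted block averaging:

<syntaxhighlight lang="python">
import numpy as np

# Finite space Ω = {0,...,5} with made-up point masses; N is generated by the
# partition {0,1}, {2,3}, {4,5}.  Conditioning on N averages over each block.
rng = np.random.default_rng(1)
p = np.array([0.10, 0.15, 0.20, 0.05, 0.25, 0.25])
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

def cond_exp(Z):
    """E(Z | N): replace Z on each block by its p-weighted average."""
    out = np.empty_like(Z)
    for b in blocks:
        out[b] = np.dot(Z[b], p[b]) / p[b].sum()
    return out

def expect(Z):
    return np.dot(Z, p)

X = rng.normal(size=6)
Y = rng.normal(size=6)

# Self-adjointness: E(X·E(Y|N)) = E(E(X|N)·E(Y|N)) = E(E(X|N)·Y).
a = expect(X * cond_exp(Y))
b = expect(cond_exp(X) * cond_exp(Y))
c = expect(cond_exp(X) * Y)
print(np.isclose(a, b), np.isclose(b, c))                 # True True

# Idempotence: conditioning twice equals conditioning once.
print(np.allclose(cond_exp(cond_exp(X)), cond_exp(X)))    # True
</syntaxhighlight>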
== Basic properties ==

Let (Ω, ''M'', P) be a probability space, and let ''N'' be a σ-subalgebra of ''M''.

* Conditioning with respect to ''N'' is linear on the space of integrable real random variables.

* <math>\operatorname{E}(1\mid N) = 1. </math> More generally, <math>\operatorname{E} (Y\mid N)= Y</math> for every integrable ''N''–measurable random variable ''Y'' on Ω.

* <math>\operatorname{E}(1_B \,\operatorname{E} (X\mid N))= \operatorname{E}(1_B \, X)</math> for all ''B'' ∈ ''N'' and every integrable random variable ''X'' on Ω.

* [[Jensen's inequality]] holds: if ''ƒ'' is a [[convex function]], then

:: <math> f(\operatorname{E}(X \mid N) ) \leq \operatorname{E}(f \circ X \mid N).</math>

* Conditioning is a contractive projection

::<math> L^s_P(\Omega; M) \rightarrow L^s_P(\Omega; N), \text{ i.e. } \operatorname{E}|\operatorname{E}(X\mid N)|^s \le \operatorname{E}|X|^s</math>

:for any ''s'' ≥ 1. (The last two properties are checked numerically in the sketch following this list.)
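A short numerical check of the last two properties (partition conditioning on a finite space with invented weights, assuming NumPy; the convex function used is the absolute value, and several exponents ''s'' are tried):

<syntaxhighlight lang="python">
import numpy as np

# Finite space with made-up weights; N is generated by the partition {0,1}, {2,3}, {4,5}.
rng = np.random.default_rng(2)
p = np.array([0.10, 0.15, 0.20, 0.05, 0.25, 0.25])
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

def cond_exp(Z):
    out = np.empty_like(Z)
    for b in blocks:
        out[b] = np.dot(Z[b], p[b]) / p[b].sum()
    return out

X = rng.normal(size=6)
f = np.abs                                          # a convex function

# Jensen's inequality, pointwise on Ω: f(E(X|N)) ≤ E(f∘X | N).
print(np.all(f(cond_exp(X)) <= cond_exp(f(X)) + 1e-12))                        # True

# Contraction: E|E(X|N)|^s ≤ E|X|^s for s ≥ 1.
for s in (1.0, 2.0, 3.5):
    print(np.dot(np.abs(cond_exp(X)) ** s, p) <= np.dot(np.abs(X) ** s, p))    # True
</syntaxhighlight>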
== See also ==

* [[Law of total probability]]
* [[Law of total expectation]]
* [[Law of total variance]]
* [[Law of total cumulance]] (generalizes the other three)
* [[Conditioning (probability)]]
* [[Joint probability distribution]]
* [[Disintegration theorem]]

== Notes ==
{{Reflist}}

{{More footnotes|date=November 2010}}

== References ==
{{Refbegin}}
* {{Cite book |title=Grundbegriffe der Wahrscheinlichkeitsrechnung |last=Kolmogorov |first=Andrey |authorlink=Andrey Kolmogorov |year=1933 |publisher=Julius Springer |location=Berlin |language=German |ref=kol1933}}{{Page needed|date=December 2010}}
** Translation: {{Cite book |title=Foundations of the Theory of Probability |edition=2nd |last=Kolmogorov |first=Andrey |authorlink=Andrey Kolmogorov |year=1956 |publisher=Chelsea |location=New York |isbn=0-8284-0023-7 |url=http://www.mathematik.com/Kolmogorov/index.html |ref=kol1956}}{{Page needed|date=December 2010}}
* {{Cite book |last=Loève |first=Michel |authorlink=Michel Loève |title=Probability Theory vol. II |edition=4th |publisher=Springer |year=1978 |isbn=0-387-90262-7 |chapter=Chapter 27. Concept of Conditioning |ref=loe1978}}{{Page needed|date=December 2010}}
* [[William Feller]], ''An Introduction to Probability Theory and its Applications'', vol. 1, 1950.{{Page needed|date=December 2010}}
* Paul A. Meyer, ''Probability and Potentials'', Blaisdell Publishing Co., 1966.{{Page needed|date=December 2010}}
* {{Cite book |last1=Grimmett |first1=Geoffrey |authorlink=Geoffrey Grimmett |last2=Stirzaker |first2=David |title=Probability and Random Processes |year=2001 |edition=3rd |publisher=Oxford University Press |isbn=0-19-857222-0}}, pp. 67–69.
{{Refend}}

== External links ==
* {{Springer |title=Conditional mathematical expectation |id=c/c024500 |first=N.G. |last=Ushakov}}

{{DEFAULTSORT:Conditional Expectation}}
[[Category:Probability theory]]
[[Category:Statistical terminology]]