# Continuous mapping theorem

In probability theory, the continuous mapping theorem states that continuous functions are limit-preserving even if their arguments are sequences of random variables. A continuous function, in Heine’s definition, is such a function that maps convergent sequences into convergent sequences: if xnx then g(xn) → g(x). The continuous mapping theorem states that this will also be true if we replace the deterministic sequence {xn} with a sequence of random variables {Xn}, and replace the standard notion of convergence of real numbers “→” with one of the types of convergence of random variables.

This theorem was first proved by Template:Harv, and it is therefore sometimes called the Mann–Wald theorem.

## Statement

Let {Xn}, X be random elements defined on a metric space S. Suppose a function g: SS′ (where S′ is another metric space) has the set of discontinuity points Dg such that Pr[X ∈ Dg] = 0. Then

## Proof

This proof has been adopted from Template:Harv

Spaces S and S′ are equipped with certain metrics. For simplicity we will denote both of these metrics using the |x−y| notation, even though the metrics may be arbitrary and not necessarily Euclidean.

### Convergence in distribution

We will need a particular statement from the portmanteau theorem: that convergence in distribution $X_{n}\xrightarrow {d} X$ is equivalent to

$\limsup _{n\to \infty }\operatorname {Pr} (X_{n}\in F)\leq \operatorname {Pr} (X\in F){\text{ for every closed set }}F.$ Fix an arbitrary closed set FS′. Denote by g−1(F) the pre-image of F under the mapping g: the set of all points xS such that g(x)∈F. Consider a sequence {xk} such that g(xk)∈F and xkx. Then this sequence lies in g−1(F), and its limit point x belongs to the closure of this set, g−1(F) (by definition of the closure). The point x may be either:

• a continuity point of g, in which case g(xk)→g(x), and hence g(x)∈F because F is a closed set, and therefore in this case x belongs to the pre-image of F, or
• a discontinuity point of g, so that xDg.

Thus the following relationship holds:

${\overline {g^{-1}(F)}}\ \subset \ g^{-1}(F)\cup D_{g}\ .$ Consider the event {g(Xn)∈F}. The probability of this event can be estimated as

$\operatorname {Pr} {\big (}g(X_{n})\in F{\big )}=\operatorname {Pr} {\big (}X_{n}\in g^{-1}(F){\big )}\leq \operatorname {Pr} {\big (}X_{n}\in {\overline {g^{-1}(F)}}{\big )},$ and by the portmanteau theorem the limsup of the last expression is less than or equal to Pr(Xg−1(F)). Using the formula we derived in the previous paragraph, this can be written as

{\begin{aligned}&\operatorname {Pr} {\big (}X\in {\overline {g^{-1}(F)}}{\big )}\leq \operatorname {Pr} {\big (}X\in g^{-1}(F)\cup D_{g}{\big )}\leq \\&\operatorname {Pr} {\big (}X\in g^{-1}(F){\big )}+\operatorname {Pr} (X\in D_{g})=\operatorname {Pr} {\big (}g(X)\in F{\big )}+0.\end{aligned}} On plugging this back into the original expression, it can be seen that

$\limsup _{n\to \infty }\operatorname {Pr} {\big (}g(X_{n})\in F{\big )}\leq \operatorname {Pr} {\big (}g(X)\in F{\big )},$ which, by the portmanteau theorem, implies that g(Xn) converges to g(X) in distribution.

### Convergence in probability

Fix an arbitrary ε>0. Then for any δ>0 consider the set Bδ defined as

$B_{\delta }={\big \{}x\in S\ {\big |}\ x\notin D_{g}:\ \exists y\in S:\ |x-y|<\delta ,\,|g(x)-g(y)|>\varepsilon {\big \}}.$ This is the set of continuity points x of the function g(·) for which it is possible to find, within the δ-neighborhood of x, a point which maps outside the ε-neighborhood of g(x). By definition of continuity, this set shrinks as δ goes to zero, so that limδ→0Bδ = ∅.

Now suppose that |g(X) − g(Xn)| > ε. This implies that at least one of the following is true: either |XXn|≥δ, or XDg, or XBδ. In terms of probabilities this can be written as

$\operatorname {Pr} {\big (}{\big |}g(X_{n})-g(X){\big |}>\varepsilon {\big )}\leq \operatorname {Pr} {\big (}|X_{n}-X|\geq \delta {\big )}+\operatorname {Pr} (X\in B_{\delta })+\operatorname {Pr} (X\in D_{g}).$ On the right-hand side, the first term converges to zero as n → ∞ for any fixed δ, by the definition of convergence in probability of the sequence {Xn}. The second term converges to zero as δ → 0, since the set Bδ shrinks to an empty set. And the last term is identically equal to zero by assumption of the theorem. Therefore the conclusion is that

$\lim _{n\to \infty }\operatorname {Pr} {\big (}{\big |}g(X_{n})-g(X){\big |}>\varepsilon {\big )}=0,$ which means that g(Xn) converges to g(X) in probability.

### Convergence almost surely

By definition of the continuity of the function g(·),

$\lim _{n\to \infty }X_{n}(\omega )=X(\omega )\quad \Rightarrow \quad \lim _{n\to \infty }g(X_{n}(\omega ))=g(X(\omega ))$ at each point X(ω) where g(·) is continuous. Therefore

{\begin{aligned}\operatorname {Pr} {\Big (}\lim _{n\to \infty }g(X_{n})=g(X){\Big )}&\geq \operatorname {Pr} {\Big (}\lim _{n\to \infty }g(X_{n})=g(X),\ X\notin D_{g}{\Big )}\\&\geq \operatorname {Pr} {\Big (}\lim _{n\to \infty }X_{n}=X,\ X\notin D_{g}{\Big )}\\&\geq \operatorname {Pr} {\Big (}\lim _{n\to \infty }X_{n}=X{\Big )}-\operatorname {Pr} (X\in D_{g})=1-0=1.\end{aligned}} By definition, we conclude that g(Xn) converges to g(X) almost surely.