For certain applications in [[linear algebra]], it is useful to know properties of the [[probability distribution]] of the largest [[eigenvalue]] of a [[Matrix addition#Entrywise sum|finite sum]] of [[Random matrix|random matrices]]. Suppose <math>\{\mathbf{X}_k\}</math> is a finite sequence of random matrices. Analogous to the well-known [[Chernoff bound]] for sums of scalars, a bound on the following is sought for a given parameter ''t'':
:<math> \Pr \left \{ \lambda_\max \left ( \sum_k \mathbf{X}_k \right ) \geq t \right \}</math>
The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts. All of these theorems can be found in {{Harv|Tropp|2010}}, as specific applications of a general result that is derived below. A summary of related works is given.
==Matrix Gaussian and Rademacher series==
===Self-adjoint matrices case===
Consider a finite sequence <math>\{\mathbf{A}_k\}</math> of fixed, self-adjoint matrices with dimension <math>d</math>, and let <math>\{\xi_k\}</math> be a finite sequence of [[Independence (probability theory)|independent]] standard [[Normal distribution|normal]] or independent [[Rademacher distribution|Rademacher]] random variables.
Then, for all <math>t\geq0</math>,
:<math>
\Pr \left\{ \lambda_{\text{max}} \left( \sum_k \xi_k \mathbf{A}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
</math>
where
:<math>
\sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
</math>
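The bound can be probed numerically by Monte Carlo sampling. The sketch below uses arbitrary illustrative matrices <math>\mathbf{A}_k</math> and an arbitrary threshold <math>t</math> (neither is from the source) and compares the empirical tail frequency of a Rademacher series with <math>d \cdot e^{-t^2/2\sigma^2}</math>:

```python
import numpy as np

# Sketch: Monte Carlo check of the self-adjoint Rademacher series bound
# P{ lmax(sum_k xi_k A_k) >= t } <= d * exp(-t^2 / (2 sigma^2)),
# with arbitrary fixed self-adjoint matrices A_k (illustrative data only).
rng = np.random.default_rng(0)
d, n = 4, 10
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]

# sigma^2 = spectral norm of sum_k A_k^2
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), ord=2)

t = 3.0 * np.sqrt(sigma2)                 # illustrative threshold
bound = d * np.exp(-t**2 / (2 * sigma2))

trials = 2000
hits = 0
for _ in range(trials):
    xi = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
    S = sum(x * Ak for x, Ak in zip(xi, A))
    hits += np.linalg.eigvalsh(S).max() >= t
empirical = hits / trials                 # should not exceed the bound
```

In practice the empirical frequency is well below the theoretical bound, since the bound is not tight for generic matrices.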
===Rectangular case===
Consider a finite sequence <math>\{\mathbf{B}_k\}</math> of fixed matrices with dimension <math>d_1\times d_2</math>, and let <math>\{\xi_k\}</math> be a finite sequence of independent standard normal or independent Rademacher random variables.
Define the variance parameter
:<math>
\sigma^2 = \max \left\{ \bigg\Vert \sum_k \mathbf{B}_k\mathbf{B}_k^* \bigg\Vert, \bigg\Vert \sum_k \mathbf{B}_k^*\mathbf{B}_k \bigg\Vert \right\}.
</math>
Then, for all <math>t\geq0</math>,
:<math>
\Pr \left\{ \bigg\Vert \sum_k \xi_k \mathbf{B}_k \bigg\Vert \geq t \right\} \leq (d_1+d_2) \cdot e^{-t^2/2\sigma^2}.
</math>
==Matrix Chernoff inequalities==
The classical [[Chernoff bounds]] concern the sum of independent, nonnegative, and uniformly bounded random variables.
In the matrix setting, the analogous theorem concerns a sum of [[positive-semidefinite matrix|positive-semidefinite]] random matrices subject to a uniform eigenvalue bound.
===Matrix Chernoff I===
Consider a finite sequence <math>\{\mathbf{X}_k\}</math> of independent, random, self-adjoint matrices with dimension <math>d</math>.
Assume that each random matrix satisfies
:<math>
\mathbf{X}_k \succeq \mathbf{0} \quad \text{and} \quad \lambda_{\text{max}}(\mathbf{X}_k) \leq R
</math>
almost surely.
Define
:<math>
\mu_{\text{min}} = \lambda_{\text{min}}\left( \sum_k \mathbb{E}\, \mathbf{X}_k \right) \quad \text{and} \quad
\mu_{\text{max}} = \lambda_{\text{max}}\left( \sum_k \mathbb{E}\, \mathbf{X}_k \right).
</math>
Then
:<math>
\Pr \left\{ \lambda_{\text{min}}\left( \sum_k \mathbf{X}_k \right) \leq (1-\delta)\mu_{\text{min}} \right\} \leq d \cdot \left[ \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right]^{\mu_{\text{min}}/R} \quad \text{for } \delta\in [0,1]\text{, and}
</math>
:<math>
\Pr \left\{ \lambda_{\text{max}}\left( \sum_k \mathbf{X}_k \right) \geq (1+\delta)\mu_{\text{max}} \right\} \leq d \cdot \left[ \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right]^{\mu_{\text{max}}/R} \quad \text{for } \delta \geq 0.
</math>
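The lower-tail bound can be illustrated on a toy family where the expectations are known exactly. The construction below (diagonal matrices of independent Bernoulli entries, an invented example, not from the source) makes <math>\sum_k \mathbb{E}\,\mathbf{X}_k = np\,\mathbf{I}</math>, so <math>\mu_{\text{min}} = np</math> and <math>R = 1</math>:

```python
import numpy as np

# Sketch: Matrix Chernoff I on a toy case where E X_k is known exactly.
# X_k = diag of d independent Bernoulli(p) entries, so X_k >= 0 and
# lmax(X_k) <= R = 1, and mu_min = n*p.  (Illustrative construction.)
rng = np.random.default_rng(1)
d, n, p = 5, 30, 0.5
mu_min, R, delta = n * p, 1.0, 0.5

chernoff = d * (np.exp(-delta) / (1 - delta) ** (1 - delta)) ** (mu_min / R)

trials = 4000
hits = 0
for _ in range(trials):
    # lmin of a sum of diagonal matrices = min over coordinates of the sums
    S = rng.binomial(1, p, size=(n, d)).sum(axis=0)
    hits += S.min() <= (1 - delta) * mu_min
empirical = hits / trials                 # should not exceed chernoff
```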
===Matrix Chernoff II===
Consider a sequence <math>\{\mathbf{X}_k:k=1,2,\ldots,n\}</math> of independent, random, self-adjoint matrices that satisfy
:<math>
\mathbf{X}_k \succeq \mathbf{0} \quad \text{and} \quad \lambda_{\text{max}}(\mathbf{X}_k) \leq 1
</math>
almost surely.
Compute the minimum and maximum eigenvalues of the average expectation,
:<math>
\bar{\mu}_{\text{min}} = \lambda_{\text{min}}\left( \frac{1}{n} \sum_{k=1}^n \mathbb{E}\, \mathbf{X}_k \right) \quad \text{and} \quad
\bar{\mu}_{\text{max}} = \lambda_{\text{max}}\left( \frac{1}{n} \sum_{k=1}^n \mathbb{E}\, \mathbf{X}_k \right).
</math>
Then
:<math>
\Pr \left\{ \lambda_{\text{min}}\left( \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k \right) \leq \alpha \right\} \leq d \cdot e^{-nD(\alpha \Vert \bar{\mu}_{\text{min}})} \quad \text{for } 0 \leq \alpha \leq \bar{\mu}_{\text{min}}\text{, and}
</math>
:<math>
\Pr \left\{ \lambda_{\text{max}}\left( \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k \right) \geq \alpha \right\} \leq d \cdot e^{-nD(\alpha \Vert \bar{\mu}_{\text{max}})} \quad \text{for } \bar{\mu}_{\text{max}} \leq \alpha \leq 1.
</math>
The binary information divergence is defined as
:<math>
D(a\Vert u) = a \left( \log a - \log u \right) + (1-a)\left( \log(1-a)-\log(1-u) \right)
</math>
for <math>a,u\in[0,1]</math>.
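The divergence <math>D(a \Vert u)</math> is easy to compute directly; the sketch below evaluates it (with the usual convention <math>0 \log 0 = 0</math>) and plugs in arbitrary illustrative parameters to produce a Matrix Chernoff II-style tail value:

```python
import numpy as np

# Sketch: the binary information divergence D(a||u) used in Matrix Chernoff II,
# and the resulting tail value d * exp(-n * D(alpha || mu)).
def binary_divergence(a: float, u: float) -> float:
    """D(a||u) = a log(a/u) + (1-a) log((1-a)/(1-u)), with 0 log 0 := 0."""
    total = 0.0
    if a > 0:
        total += a * (np.log(a) - np.log(u))
    if a < 1:
        total += (1 - a) * (np.log(1 - a) - np.log(1 - u))
    return total

# Illustrative numbers: d = 5, n = 30, mu_min_bar = 0.5, alpha = 0.25
d, n = 5, 30
lower_tail_bound = d * np.exp(-n * binary_divergence(0.25, 0.5))
```

Note that <math>D(u \Vert u) = 0</math>, so the bound is trivial at <math>\alpha = \bar{\mu}</math> and tightens as <math>\alpha</math> moves away from it.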
==Matrix Bennett and Bernstein inequalities==
In the scalar setting, [[Bernstein inequalities (probability theory)|Bennett and Bernstein inequalities]] describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or [[Heavy-tailed distribution#Subexponential distributions|subexponential]]. In the matrix case, the analogous results concern a sum of zero-mean random matrices.
===Bounded case===
Consider a finite sequence <math>\{\mathbf{X}_k\}</math> of independent, random, self-adjoint matrices with dimension <math>d</math>.
Assume that each random matrix satisfies
:<math>
\mathbb{E}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \lambda_{\text{max}}(\mathbf{X}_k) \leq R
</math>
almost surely.
Compute the norm of the total variance,
:<math>
\sigma^2 = \bigg\Vert \sum_k \mathbb{E}\, (\mathbf{X}^2_k) \bigg\Vert.
</math>
Then, the following chain of inequalities holds for all <math> t \geq 0</math>:
:<math>
\begin{align}
\Pr \left\{ \lambda_{\text{max}} \left( \sum_k \mathbf{X}_k \right) \geq t \right\}
& \leq d \cdot \exp \left( -\frac{\sigma^2}{R^2} \cdot h\left( \frac{Rt}{\sigma^2} \right) \right) \\
& \leq d \cdot \exp \left( \frac{-t^2/2} {\sigma^2+Rt/3} \right) \\
& \leq
\begin{cases}
d \cdot \exp ( -3t^2/8\sigma^2 ) \quad & \text{for } t\leq \sigma^2/R; \\
d \cdot \exp ( -3t/8R ) \quad & \text{for } t\geq \sigma^2/R.
\end{cases}
\end{align}
</math>
The function <math> h(u) </math> is defined as <math> h(u) = (1+u) \log (1+u)-u </math> for <math>u \geq 0</math>.
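The ordering of the three bounds in the chain (Bennett is the sharpest, the two-case form the loosest) can be checked numerically; the dimension, variance, and thresholds below are arbitrary illustrative values:

```python
import numpy as np

# Sketch: verify numerically that the three bounds in the chain are ordered,
# Bennett <= Bernstein <= simplified two-case bound, for sample parameters.
def h(u):
    return (1 + u) * np.log(1 + u) - u

d, R, sigma2 = 8, 1.0, 4.0                # illustrative values
for t in [0.5, 2.0, sigma2 / R, 10.0]:
    bennett = d * np.exp(-(sigma2 / R**2) * h(R * t / sigma2))
    bernstein = d * np.exp(-(t**2 / 2) / (sigma2 + R * t / 3))
    if t <= sigma2 / R:
        simple = d * np.exp(-3 * t**2 / (8 * sigma2))
    else:
        simple = d * np.exp(-3 * t / (8 * R))
    assert bennett <= bernstein <= simple
```

The middle (Bernstein) form follows from Bennett's via the elementary inequality <math>h(u) \geq \tfrac{u^2/2}{1+u/3}</math>.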
===Subexponential case===
Consider a finite sequence <math>\{\mathbf{X}_k\}</math> of independent, random, self-adjoint matrices with dimension <math>d</math>.
Assume that
:<math>
\mathbb{E}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \mathbb{E}\,(\mathbf{X}_k^p) \preceq \frac{p!}{2}\cdot R^{p-2} \mathbf{A}_k^2
</math>
for <math>p=2,3,4,\ldots</math>.
Compute the variance parameter,
:<math>
\sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
</math>
Then, the following chain of inequalities holds for all <math> t \geq 0</math>:
:<math>
\begin{align}
\Pr \left\{ \lambda_{\text{max}} \left( \sum_k \mathbf{X}_k \right) \geq t \right\}
& \leq d \cdot \exp \left( \frac{-t^2/2}{\sigma^2+Rt} \right) \\
& \leq
\begin{cases}
d \cdot \exp ( -t^2/4\sigma^2 ) \quad & \text{for } t\leq \sigma^2/R; \\
d \cdot \exp ( -t/4R ) \quad & \text{for } t\geq \sigma^2/R.
\end{cases}
\end{align}
</math>
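As in the bounded case, the two-case simplification follows from the first bound: for <math>t \leq \sigma^2/R</math> one has <math>\sigma^2 + Rt \leq 2\sigma^2</math>, and for <math>t \geq \sigma^2/R</math> one has <math>\sigma^2 + Rt \leq 2Rt</math>. A quick numerical check with arbitrary illustrative parameters:

```python
import numpy as np

# Sketch: check that the subexponential Bernstein-type bound dominates
# its simplified two-case form, d*exp(-(t^2/2)/(sigma^2+Rt)) <= simple.
d, R, sigma2 = 8, 1.0, 4.0                # illustrative values
for t in [0.5, 2.0, sigma2 / R, 10.0]:
    bernstein = d * np.exp(-(t**2 / 2) / (sigma2 + R * t))
    if t <= sigma2 / R:
        simple = d * np.exp(-t**2 / (4 * sigma2))
    else:
        simple = d * np.exp(-t / (4 * R))
    assert bernstein <= simple
```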
===Rectangular case===
Consider a finite sequence <math>\{\mathbf{Z}_k\}</math> of independent, random matrices with dimension <math>d_1\times d_2</math>.
Assume that each random matrix satisfies
:<math>
\mathbb{E}\,\mathbf{Z}_k = \mathbf{0} \quad \text{and} \quad \Vert \mathbf{Z}_k \Vert \leq R
</math>
almost surely.
Define the variance parameter
:<math>
\sigma^2 = \max \left\{ \bigg\Vert \sum_k \mathbb{E}\,(\mathbf{Z}_k\mathbf{Z}_k^*) \bigg\Vert, \bigg\Vert \sum_k \mathbb{E}\, (\mathbf{Z}_k^*\mathbf{Z}_k) \bigg\Vert \right\}.
</math>
Then, for all <math> t \geq 0</math>,
:<math>
\Pr \left\{ \bigg\Vert \sum_k \mathbf{Z}_k \bigg\Vert \geq t \right\} \leq (d_1+d_2) \cdot \exp \left( \frac{-t^2/2} {\sigma^2+Rt/3} \right).
</math>
==Matrix Azuma, Hoeffding, and McDiarmid inequalities==
===Matrix Azuma===
The scalar version of [[Azuma's inequality]] states that a scalar [[Martingale (probability theory)|martingale]] exhibits normal concentration about its mean value, and the scale for deviations is controlled by the total maximum squared range of the difference sequence.
The following is the extension to the matrix setting.
Consider a finite adapted sequence <math>\{\mathbf{X}_k\}</math> of self-adjoint matrices with dimension <math>d</math>, and a fixed sequence <math>\{\mathbf{A}_k\}</math> of self-adjoint matrices that satisfy
:<math>
\mathbb{E}_{k-1}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \mathbf{X}_k^2 \preceq \mathbf{A}_k^2
</math>
almost surely.
Compute the variance parameter
:<math>
\sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
</math>
Then, for all <math> t \geq 0</math>,
:<math>
\Pr \left\{ \lambda_{\text{max}} \left( \sum_k \mathbf{X}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2}.
</math>
The constant 1/8 can be improved to 1/2 when additional information is available. One case occurs when each summand <math> \mathbf{X}_k </math> is conditionally symmetric.
Another example requires the assumption that <math> \mathbf{X}_k </math> commutes almost surely with <math> \mathbf{A}_k </math>.
===Matrix Hoeffding===
Adding the assumption that the summands in Matrix Azuma are independent gives a matrix extension of [[Hoeffding's inequality|Hoeffding's inequalities]].
Consider a finite sequence <math>\{\mathbf{X}_k\}</math> of independent, random, self-adjoint matrices with dimension <math>d</math>, and let <math>\{\mathbf{A}_k\}</math> be a sequence of fixed self-adjoint matrices.
Assume that each random matrix satisfies
:<math>
\mathbb{E}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \mathbf{X}_k^2 \preceq \mathbf{A}_k^2
</math>
almost surely.
Then, for all <math> t \geq 0</math>,
:<math>
\Pr \left\{ \lambda_{\text{max}} \left( \sum_k \mathbf{X}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2}
</math>
where
:<math>
\sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
</math>
===Matrix bounded difference (McDiarmid)===
In the scalar setting, [[McDiarmid's inequality#McDiarmid's inequality|McDiarmid's inequality]] provides one common way of bounding the differences by applying [[Azuma's inequality]] to a [[Doob martingale]]. A version of the bounded differences inequality holds in the matrix setting.
Let <math>\{\mathbf{Z}_k:k=1,2,\ldots,n\}</math> be an independent family of random variables, and let <math>\mathbf{H}</math> be a function that maps <math>n</math> variables to a self-adjoint matrix of dimension <math>d</math>.
Consider a sequence <math>\{\mathbf{A}_k\}</math> of fixed self-adjoint matrices that satisfy
:<math>
\left( \mathbf{H}(z_1,\ldots,z_k,\ldots,z_n) - \mathbf{H}(z_1,\ldots,z'_k,\ldots,z_n) \right)^2 \preceq \mathbf{A}_k^2,
</math>
where <math>z_i</math> and <math>z'_i</math> range over all possible values of <math>Z_i</math> for each index <math>i</math>.
Compute the variance parameter
:<math>
\sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
</math>
Then, for all <math> t \geq 0</math>,
:<math>
\Pr \left\{ \lambda_{\text{max}} \left( \mathbf{H}(\mathbf{z}) - \mathbb{E}\,\mathbf{H}(\mathbf{z}) \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2},
</math>
where <math>\mathbf{z}=(Z_1,\ldots,Z_n)</math>.
==Survey of related theorems==
The first bounds of this type were derived by {{Harv|Ahlswede|Winter|2003}}. Recall the [[#Matrix Gaussian and Rademacher series|theorem above for self-adjoint matrix Gaussian and Rademacher bounds]]:
for a finite sequence <math>\{\mathbf{A}_k\}</math> of fixed, self-adjoint matrices with dimension <math>d</math> and a finite sequence <math>\{\xi_k\}</math> of [[Independence (probability theory)|independent]] standard [[Normal distribution|normal]] or independent [[Rademacher distribution|Rademacher]] random variables,
:<math>
\Pr \left\{ \lambda_{\text{max}} \left( \sum_k \xi_k \mathbf{A}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
</math>
where
:<math>
\sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
</math>
The result of Ahlswede and Winter is the same, except with
:<math> \sigma_{AW}^2 = \sum_k \lambda_\max \left(\mathbf{A}_k^2 \right ). </math>
By comparison, the <math>\sigma^2</math> in the theorem above interchanges the sum and <math> \lambda_\max</math>; that is, it is the largest eigenvalue of the sum rather than the sum of the largest eigenvalues. It is never larger than the Ahlswede–Winter value (by the [[Matrix norm|norm]] [[triangle inequality]]), but can be much smaller. Therefore, the theorem above gives a tighter bound than the Ahlswede–Winter result.
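The gap between the two variance parameters can be as large as a factor of <math>d</math>. The sketch below uses coordinate projections <math>\mathbf{A}_k = \mathbf{e}_k \mathbf{e}_k^T</math> (an invented extreme example), for which <math>\sum_k \mathbf{A}_k^2 = \mathbf{I}</math>, so <math>\sigma^2 = 1</math> while <math>\sigma_{AW}^2 = d</math>:

```python
import numpy as np

# Sketch: compare sigma^2 = ||sum_k A_k^2|| with the Ahlswede-Winter
# parameter sigma_AW^2 = sum_k lmax(A_k^2), on coordinate projections
# A_k = e_k e_k^T.  Here sum_k A_k^2 = I, so sigma^2 = 1, sigma_AW^2 = d.
d = 16
I = np.eye(d)
A = [np.outer(I[k], I[k]) for k in range(d)]

sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), ord=2)
sigma2_aw = sum(np.linalg.eigvalsh(Ak @ Ak).max() for Ak in A)
# sigma2 <= sigma2_aw always, by the triangle inequality for the norm
```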
The chief contribution of {{Harv|Ahlswede|Winter|2003}} was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see [[Chernoff bound#Theorem for additive form (absolute error)]]) to the case of self-adjoint matrices. The procedure is given in the [[#Derivation and proof|derivation]] below. All of the recent works on this topic follow this same procedure, and the chief differences lie in the subsequent steps. Ahlswede & Winter use the [[Golden–Thompson inequality]] to proceed, whereas Tropp {{Harv|Tropp|2010}} uses [[Matrix exponential#Lieb's theorem|Lieb's theorem]].
Suppose one wished to vary the length of the series (''n'') and the dimensions of the matrices (''d'') while keeping the right-hand side approximately constant. Then ''n'' must vary approximately as the log of ''d''. Several papers have attempted to establish a bound without a dependence on dimensions. Rudelson and Vershynin {{Harv|Rudelson|Vershynin|2007}} give a result for matrices which are the outer product of two vectors. {{Harv|Magen|Zouzias|2010}} [[Chernoff bound#Theorem without the dependency on the dimensions|provide a result without the dimensional dependence for low-rank matrices]]. The original result was derived independently from the Ahlswede–Winter approach, but {{Harv|Oliveira|2010b}} proves a similar result using the Ahlswede–Winter approach.
Finally, Oliveira {{Harv|Oliveira|2010a}} proves a result for matrix martingales independently from the Ahlswede–Winter framework. Tropp {{Harv|Tropp|2011}} slightly improves on the result using the Ahlswede–Winter framework. Neither result is presented in this article.
==Derivation and proof==
===Ahlswede and Winter===
The Laplace transform argument found in {{Harv|Ahlswede|Winter|2003}} is a significant result in its own right:
Let <math>\mathbf{Y}</math> be a random self-adjoint matrix. Then
:<math> \Pr \left \{ \lambda_\max ( \mathbf{Y} ) \geq t \right \} \leq \inf_{\theta > 0}
\left \{ e^{-\theta t } \cdot \operatorname{E} \left [ \operatorname{tr} e^{\theta \mathbf{Y}} \right ] \right \}.
</math>
To prove this, fix <math>\theta > 0</math>. Then
:<math> \begin{align} \Pr \left \{ \lambda_\max (\mathbf{Y}) \geq t \right \} &= \Pr \left \{ \lambda_{\max} (\theta \mathbf{Y}) \geq \theta t \right \} \\
&= \Pr \left \{ e^{\lambda_{\max} (\theta \mathbf{Y})} \geq e^{\theta t} \right \}\\
&\leq e^{-\theta t} \operatorname{E}\, e^{\lambda_{\max} (\theta \mathbf{Y})}\\
&\leq e^{-\theta t} \operatorname{E} \operatorname{tr} e^{\theta \mathbf{Y}}.
\end{align}
</math>
The second-to-last inequality is [[Markov's inequality]]. The last inequality holds since <math> e^{\lambda_{\max} (\theta \mathbf{Y})} = \lambda_{\max} \left( e^{\theta \mathbf{Y}} \right) \leq \operatorname{tr} e^{\theta \mathbf{Y}}</math>. Since the left-most quantity is independent of <math>\theta</math>, the infimum over <math>\theta>0</math> remains an upper bound for it.
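The spectral-mapping step in the last inequality is easy to verify numerically, since the trace of <math>e^{\theta \mathbf{Y}}</math> is the sum of the (positive) exponentials of the eigenvalues of <math>\theta \mathbf{Y}</math>; the matrix and <math>\theta</math> below are arbitrary illustrative choices:

```python
import numpy as np

# Sketch: for self-adjoint Y, exp(lmax(theta*Y)) = lmax(e^{theta*Y})
# <= tr e^{theta*Y}, since the remaining eigenvalue terms are positive.
rng = np.random.default_rng(2)
d, theta = 5, 0.7
Y = rng.standard_normal((d, d))
Y = (Y + Y.T) / 2                         # random self-adjoint matrix

eigs = np.linalg.eigvalsh(theta * Y)
lhs = np.exp(eigs.max())                  # exp(lmax(theta Y)) = lmax(e^{theta Y})
trace_exp = np.exp(eigs).sum()            # tr e^{theta Y} via the eigenvalues
assert lhs <= trace_exp
```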
Thus, our task is to understand <math>\operatorname{E} \operatorname{tr} e^{\theta \mathbf{Y}}</math>. Since trace and expectation are both linear, we can commute them, so it is sufficient to consider the matrix generating function <math>\mathbf{M}_\mathbf{Y}(\theta) := \operatorname{E}\, e^{\theta \mathbf{Y}}</math>. This is where the methods of {{Harv|Ahlswede|Winter|2003}} and {{Harv|Tropp|2010}} diverge. The immediately following presentation follows {{Harv|Ahlswede|Winter|2003}}.
The [[Golden–Thompson inequality]] implies that
:<math> \operatorname{tr} \mathbf{M}_{\mathbf{X}_1 + \mathbf{X}_2}(\theta) \leq \operatorname{tr} \left [ \left ( \operatorname{E} e^{\theta \mathbf{X}_1} \right )
\left ( \operatorname{E} e^{\theta \mathbf{X}_2} \right ) \right ] =
\operatorname{tr} \mathbf{M}_{\mathbf{X}_1} (\theta) \mathbf{M}_{\mathbf{X}_2}(\theta), </math>
where we used the independence of <math>\mathbf{X}_1</math> and <math>\mathbf{X}_2</math> together with the linearity of trace and expectation.
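The Golden–Thompson inequality itself, <math>\operatorname{tr} e^{\mathbf{A}+\mathbf{B}} \leq \operatorname{tr}\left(e^{\mathbf{A}} e^{\mathbf{B}}\right)</math> for self-adjoint <math>\mathbf{A}, \mathbf{B}</math>, can be checked numerically on arbitrary symmetric matrices (illustrative data below), computing the matrix exponentials through eigendecompositions:

```python
import numpy as np

# Sketch: numeric check of the Golden-Thompson inequality
# tr e^{A+B} <= tr(e^A e^B) for self-adjoint A, B.
def sym_expm(M):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(3)
d = 6
A = rng.standard_normal((d, d)); A = (A + A.T) / 2
B = rng.standard_normal((d, d)); B = (B + B.T) / 2

lhs = np.trace(sym_expm(A + B))
rhs = np.trace(sym_expm(A) @ sym_expm(B))
assert lhs <= rhs + 1e-9                  # Golden-Thompson
```

Note that the analogous three-matrix statement is false in general, which is one reason iterating the two-matrix version (as below) loses sharpness.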
Suppose <math>\mathbf{Y} = \sum_k \mathbf{X}_k</math>. We can find an upper bound for <math>\operatorname{tr} \mathbf{M}_{\mathbf{Y}} (\theta)</math> by iterating this result. Noting that <math>\operatorname{tr}(\mathbf{AB}) \leq \operatorname{tr}(\mathbf{A}) \lambda_{\max}(\mathbf{B})</math> for positive-semidefinite <math>\mathbf{A}</math> and <math>\mathbf{B}</math>, we have
:<math> \operatorname{tr} \mathbf{M}_\mathbf{Y} (\theta) \leq
\operatorname{tr} \left [ \left ( \operatorname{E} e^{\sum_{k=1}^{n-1} \theta \mathbf{X}_k} \right ) \left( \operatorname{E} e^{\theta \mathbf{X}_n} \right ) \right ]
\leq \operatorname{tr} \left ( \operatorname{E} e^{\sum_{k=1}^{n-1} \theta \mathbf{X}_k} \right ) \lambda_{\max} \left( \operatorname{E} e^{\theta \mathbf{X}_n} \right).
</math>
Iterating this, we get
:<math> \operatorname{tr} \mathbf{M}_\mathbf{Y} (\theta) \leq
(\operatorname{tr} \mathbf{I}) \left [ \prod_k \lambda_\max \left( \operatorname{E} e^{\theta \mathbf{X}_k} \right) \right ] =
d\, e^{\sum_k \lambda_\max \left ( \log \operatorname{E} e^{\theta \mathbf{X}_k} \right ) }. </math>
Combining this with the Laplace-transform bound and taking the infimum over <math>\theta > 0</math> yields a tail bound; in particular, one can see how the Ahlswede–Winter variance parameter arises as a sum of largest eigenvalues.
===Tropp===
The major contribution of {{Harv|Tropp|2010}} is the application of [[Matrix exponential#Lieb's theorem|Lieb's theorem]] where {{Harv|Ahlswede|Winter|2003}} had applied the [[Golden–Thompson inequality]]. Tropp's corollary is the following: if <math>\mathbf{H}</math> is a fixed self-adjoint matrix and <math>\mathbf{X}</math> is a random self-adjoint matrix, then
:<math> \operatorname{E} \operatorname{tr} e^{\mathbf{H}+\mathbf{X}}
\leq \operatorname{tr} e^{\mathbf{H} + \log( \operatorname{E} e^{\mathbf{X} })}. </math>
Proof: Let <math> \mathbf{Y} = e^\mathbf{X}</math>. Then Lieb's theorem tells us that
:<math> f(\mathbf{Y}) = \operatorname{tr} e^{\mathbf{H} + \log(\mathbf{Y})} </math>
is concave.
The final step is to use [[Jensen's inequality]] to move the expectation inside the function:
:<math> \operatorname{E} \operatorname{tr} e^{\mathbf{H} + \log(\mathbf{Y})}
\leq \operatorname{tr} e^{\mathbf{H} + \log(\operatorname{E} \mathbf{Y})}. </math>
This gives us the major result of the paper: the subadditivity of the log of the matrix generating function.
====Subadditivity of log mgf====
Let <math>\{\mathbf{X}_k\}</math> be a finite sequence of independent, random self-adjoint matrices. Then for all <math>\theta \in \mathbb{R}</math>,
:<math> \operatorname{tr} \mathbf{M}_{\sum_k \mathbf{X}_k} (\theta)
\leq \operatorname{tr} e^{\sum_k \log \mathbf{M}_{\mathbf{X}_k} (\theta)} </math>
Proof: It is sufficient to consider the case <math>\theta=1</math> (otherwise, absorb <math>\theta</math> into the <math>\mathbf{X}_k</math>). Expanding the definitions, we need to show that
:<math> \operatorname{E} \operatorname{tr} e^{\sum_k \mathbf{X}_k }
\leq \operatorname{tr} e^{\sum_k \log \operatorname{E} e^{\mathbf{X}_k} }. </math>
To complete the proof, we use the [[law of total expectation]]. Let <math>\operatorname{E}_k</math> be the expectation conditioned on <math> \mathbf{X}_1, \ldots, \mathbf{X}_k </math>. Since we assume all the <math>\mathbf{X}_i</math> are independent,
:<math> \operatorname{E}_{k-1} e^{\mathbf{X}_k} = \operatorname{E} e^{\mathbf{X}_k}. </math>
Define <math>\mathbf{\Xi}_k = \log \operatorname{E}_{k-1} e^{\mathbf{X}_k} = \log \mathbf{M}_{\mathbf{X}_k}(1) </math>.
Finally, we have
:<math> \begin{align}
\operatorname{E} \operatorname{tr} e^{\sum_{k=1}^n \mathbf{X}_k} & = \operatorname{E}_0 \cdots \operatorname{E}_{n-1} \operatorname{tr} e^{\sum_{k=1}^{n-1} \mathbf{X}_k + \mathbf{X}_n }\\
&\leq \operatorname{E}_0 \cdots \operatorname{E}_{n-2} \operatorname{tr} e^{\sum_{k=1}^{n-1} \mathbf{X}_k + \log(\operatorname{E}_{n-1} e^{\mathbf{X}_n} ) }\\
&= \operatorname{E}_0 \cdots \operatorname{E}_{n-2} \operatorname{tr} e^{\sum_{k=1}^{n-2} \mathbf{X}_k + \mathbf{X}_{n-1} + \mathbf{\Xi}_n} \\
& \vdots\\
& = \operatorname{tr} e^{\sum_{k=1}^n \mathbf{\Xi}_k}
\end{align} </math>
where at each step <math>m</math> we use Tropp's corollary with
:<math> \mathbf{H}_m = \sum_{k=1}^{m-1} \mathbf{X}_k + \sum_{k=m+1}^n \mathbf{\Xi}_k. </math>
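The subadditivity statement can be verified exactly on a small discrete example: take <math>\mathbf{X}_k = \xi_k \mathbf{A}_k</math> with Rademacher signs (illustrative matrices <math>\mathbf{A}_k</math>, not from the source), so the expectation over the <math>2^n</math> sign patterns can be enumerated rather than sampled:

```python
import numpy as np
from itertools import product

# Sketch: exact check of subadditivity of the log mgf (theta = 1) for
# X_k = xi_k * A_k with Rademacher signs; the expectation is an exact
# average over all 2^n sign patterns.  (Illustrative matrices A_k.)
def sym_expm(M):
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

def sym_logm(M):
    w, V = np.linalg.eigh(M)        # M must be positive definite
    return (V * np.log(w)) @ V.T

rng = np.random.default_rng(4)
d, n = 3, 3
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]

# LHS: E tr e^{sum_k X_k}, exact average over all sign patterns
lhs = np.mean([np.trace(sym_expm(sum(s * Ak for s, Ak in zip(signs, A))))
               for signs in product([-1.0, 1.0], repeat=n)])

# RHS: tr e^{sum_k log E e^{X_k}}, with E e^{X_k} = (e^{A_k} + e^{-A_k}) / 2
logs = sum(sym_logm((sym_expm(Ak) + sym_expm(-Ak)) / 2) for Ak in A)
rhs = np.trace(sym_expm(logs))
assert lhs <= rhs + 1e-9
```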
====Master tail bound====
The following is immediate from the previous result:
:<math>
\Pr \left \{ \lambda_\max \left ( \sum_k \mathbf{X}_k \right ) \geq t \right \}
\leq \inf_{\theta > 0} \left \{ e^{-\theta t} \operatorname{tr} e^{\sum_k \log \mathbf{M}_{\mathbf{X}_k} (\theta) } \right \}
</math>
All of the theorems given above are derived from this bound; they differ in the various ways used to bound the infimum. These steps are significantly simpler than direct proofs of the individual theorems.
==References==
* {{cite journal
|last1=Ahlswede |first1=R.
|last2=Winter |first2=A.
|year=2003
|title=Strong Converse for Identification via Quantum Channels
|journal=[[IEEE Transactions on Information Theory]]
|volume=48 |issue=3 |pages=569–579
|arxiv=quant-ph/0012127
|ref=harv
}}
* {{cite arxiv
|last1=Magen |first1=A.
|last2=Zouzias |first2=A.
|year=2010
|title=Low-Rank Matrix-valued Chernoff Bounds and Approximate Matrix Multiplication
|class=math
|eprint=1005.2724
|ref=harv
}}
* {{cite arxiv
|last1=Oliveira |first1=R.I.
|year=2010
|title=Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges
|class=math
|eprint=0911.0600
|ref=harv
}}
* {{cite arxiv
|last1=Oliveira |first1=R.I.
|year=2010
|title=Sums of random Hermitian matrices and an inequality by Rudelson
|class=math
|eprint=1004.3821
|ref=harv
}}
* {{cite journal
|last1=Rudelson |first1=M.
|last2=Vershynin |first2=R.
|year=2007
|title=Sampling from large matrices: an approach through geometric functional analysis
|journal=J. Assoc. Comput. Mach.
|volume=54
|issue=4
|pages=Art. 21
|arxiv=math/9608208
|ref=harv
}}
* {{cite arxiv
|last1=Tropp |first1=J.
|year=2011
|title=Freedman's inequality for matrix martingales
|class=math
|eprint=1101.3039
|ref=harv
}}
* {{cite arxiv
|last1=Tropp |first1=J.
|year=2010
|title=User-friendly tail bounds for sums of random matrices
|class=math
|eprint=1004.4389
|ref=harv
}}
[[Category:Linear algebra]]
Latest revision as of 21:48, 17 December 2013
For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter t:
The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts. All of these theorems can be found in Template:Harv, as the specific application of a general result which is derived below. A summary of related works is given.
Matrix Gaussian and Rademacher series
Self-adjoint matrices case
Consider a finite sequence of fixed, self-adjoint matrices with dimension , and let be a finite sequence of independent standard normal or independent Rademacher random variables.
where
Rectangular case
Consider a finite sequence of fixed, self-adjoint matrices with dimension , and let be a finite sequence of independent standard normal or independent Rademacher random variables. Define the variance parameter
Matrix Chernoff inequalities
The classical Chernoff bounds concerns the sum of independent, nonnegative, and uniformly bounded random variables. In the matrix setting, the analogous theroem concerns a sum of positive-semidefinite random matrices subjected to a uniform eigenvalue bound.
Matrix Chernoff I
Consider a finite sequence of independent, random, self-adjoint matrices with dimension . Assume that each random matrix satisfies
almost surely.
Define
Then
Matrix Chernoff II
Consider a sequence of independent, random, self-adjoint matrices that satisfy
almost surely.
Compute the minimum and maximum eigenvalues of the average expectation,
Then
The binary information divergence is defined as
Matrix Bennett and Bernstein inequalities
In the scalar setting, Bennett and Bernstein inequalities describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or subexponential. In the matrix case, the analogous results concern a sum of zero-mean random matrices.
Bounded case
Consider a finite sequence of independent, random, self-adjoint matrices with dimension . Assume that each random matrix satisfies
almost surely.
Compute the norm of the total variance,
Then, the following chain of inequalities holds for all :
The function is defined as for .
Subexponential case
Consider a finite sequence of independent, random, self-adjoint matrices with dimension . Assume that
Compute the variance parameter,
Then, the following chain of inequalities holds for all :
Rectangular case
Consider a finite sequence of independent, random, matrices with dimension . Assume that each random matrix satisfies
almost surely. Define the variance parameter
Matrix Azuma, Hoeffding, and McDiarmid inequalities
Matrix Azuma
The scalar version of Azuma's inequality states that a scalar martingale exhibits normal concentration about its mean value, and the scale for deviations is controlled by the total maximum squared range of the difference sequence. The following is the extension in matrix setting.
Consider a finite adapted sequence of self-adjoint matrices with dimension , and a fixed sequence of self-adjoint matrices that satisfy
almost surely.
Compute the variance parameter
The constant 1/8 can be improved to 1/2 when there is additional information available. One case occurs when each summand is conditionally symmetric. Another example requires the assumption that commutes almost surely with .
Matrix Hoeffding
Placing addition assumption that the summands in Matrix Azuma are independent gives a matrix extension of Hoeffding's inequalities.
Consider a finite sequence of independent, random, self-adjoint matrices with dimension , and let be a sequence of fixed self-adjoint matrices. Assume that each random matrix satisfies
almost surely.
where
Matrix bounded difference (McDiarmid)
In scalar setting, McDiarmid's inequality provides one common way of bounding the differences by applying Azuma's inequality to a Doob martingale. A version of the bounded differences inequality holds in the matrix setting.
Let be an independent, family of random variables, and let be a function that maps variables to a self-adjoint matrix of dimension . Consider a sequence of fixed self-adjoint matrices that satisfy
where and range over all possible values of for each index . Compute the variance parameter
The first bounds of this type were derived by Template:Harv. Recall the theorem above for self-adjoint matrix Gaussian and Rademacher bounds: For a finite sequence of fixed, self-adjoint matrices with dimension and for a finite sequence of independent standard normal or independent Rademacher random variables, then
where
Ahlswede and Winter would give the same result, except with
By comparison, the in the theorem above commutes and ; that is, it is the largest eigenvalue of the sum rather than the sum of the largest eigenvalues. It is never larger than the Ahlswede–Winter value (by the norm triangle inequality), but can be much smaller. Therefore, the theorem above gives a tighter bound than the Ahlswede–Winter result.
The chief contribution of Template:Harv was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see Chernoff bound#Theorem for additive form (absolute error)) to the case of self-adjoint matrices. The procedure given in the derivation below. All of the recent works on this topic follow this same procedure, and the chief differences follow from subsequent steps. Ahlswede & Winter use the Golden–Thompson inequality to proceed, whereas Tropp Template:Harv uses Lieb's Theorem.
Suppose one wished to vary the length of the series (n) and the dimensions of the matrices (d) while keeping the right-hand side approximately constant. Then n must vary approximately as the log of d. Several papers have attempted to establish a bound without a dependence on dimensions. Rudelson and Vershynin Template:Harv give a result for matrices which are the outer product of two vectors. Template:Harv provide a result without the dimensional dependence for low rank matrices. The original result was derived independently from the Ahlswede–Winter approach, but Template:Harv proves a similar result using the Ahlswede–Winter approach.
Finally, Oliveira Template:Harv proves a result for matrix martingales independently from the Ahlswede–Winter framework. Tropp Template:Harv slightly improves on the result using the Ahlswede–Winter framework. Neither result is presented in this article.
Derivation and proof
Ahlswede and Winter
The Laplace transform argument found in Template:Harv is a significant result in its own right: Let be a random self-adjoint matrix. Then
The second-to-last inequality is Markov's inequality. The last inequality holds since . Since the left-most quantity is independent of , the infimum over remains an upper bound for it.
Thus, our task is to understand Nevertheless, since trace and expectation are both linear, we can commute them, so it is sufficient to consider , which we call the matrix generating function. This is where the methods of Template:Harv and Template:Harv diverge. The immediately following presentation follows Template:Harv.
The [[Golden–Thompson inequality]] implies that, for independent <math>\mathbf{X}_1</math> and <math>\mathbf{X}_2</math>,
:<math>
\operatorname{tr} \mathbf{M}_{\mathbf{X}_1 + \mathbf{X}_2}(\theta) \leq \operatorname{tr} \left[ \left( \operatorname{E} e^{\theta\mathbf{X}_1} \right) \left( \operatorname{E} e^{\theta\mathbf{X}_2} \right) \right] = \operatorname{tr} \left[ \mathbf{M}_{\mathbf{X}_1}(\theta)\, \mathbf{M}_{\mathbf{X}_2}(\theta) \right].
</math>
Suppose <math>\mathbf{M} = \sum_k \mathbf{X}_k</math>, where the summands are independent. We can find an upper bound for <math>\operatorname{tr} \mathbf{M}_{\mathbf{M}}(\theta)</math> by iterating this result. Noting that <math>\operatorname{tr}(\mathbf{A}\mathbf{B}) \leq \operatorname{tr}(\mathbf{A})\, \lambda_\max(\mathbf{B})</math> for positive semidefinite <math>\mathbf{A}</math> and <math>\mathbf{B}</math>, then
:<math>
\operatorname{tr} \mathbf{M}_{\mathbf{M}}(\theta) \leq \operatorname{tr} \left( \operatorname{E} e^{\theta \sum_{k=1}^{n-1} \mathbf{X}_k} \right) \lambda_\max \left( \operatorname{E} e^{\theta\mathbf{X}_n} \right).
</math>
Iterating this, we get
:<math>
\operatorname{tr} \mathbf{M}_{\mathbf{M}}(\theta) \leq d \prod_k \lambda_\max \left( \operatorname{E} e^{\theta\mathbf{X}_k} \right) = d\, e^{\sum_k \lambda_\max \left( \log \operatorname{E} e^{\theta\mathbf{X}_k} \right)}.
</math>
So far we have found a bound with an infimum over <math>\theta</math>:
:<math>
\Pr \left\{ \lambda_\max (\mathbf{M}) \geq t \right\} \leq d \inf_{\theta > 0} e^{-\theta t + \sum_k \lambda_\max \left( \log \operatorname{E} e^{\theta\mathbf{X}_k} \right)}.
</math>
In turn, this can be bounded. At any rate, one can see how the Ahlswede–Winter bound arises as the sum of largest eigenvalues.
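For a concrete check (a hypothetical NumPy sketch, not from the source), the iterated bound <math>\operatorname{tr} \operatorname{E} e^{\theta\mathbf{M}} \leq d \prod_k \lambda_\max ( \operatorname{E} e^{\theta\mathbf{X}_k} )</math> can be verified exactly for a small Rademacher series <math>\mathbf{X}_k = \xi_k \mathbf{A}_k</math>, using <math>\operatorname{E} e^{\theta\xi_k\mathbf{A}_k} = \cosh(\theta\mathbf{A}_k)</math> and enumerating all sign patterns:

```python
import numpy as np
from itertools import product

def sym_fun(a, f):
    # Apply a scalar function to the eigenvalues of a symmetric matrix.
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.T

rng = np.random.default_rng(2)
d, n, theta = 3, 3, 0.5
mats = []
for _ in range(n):
    a = rng.standard_normal((d, d))
    mats.append((a + a.T) / 2)     # fixed self-adjoint A_k

# Left side: tr E e^{theta * sum_k xi_k A_k}, expectation taken exactly
# over all 2^n Rademacher sign patterns.
lhs = np.mean([np.trace(sym_fun(theta * sum(e * a for e, a in zip(s, mats)), np.exp))
               for s in product((-1, 1), repeat=n)])

# Right side: d * prod_k lambda_max(cosh(theta * A_k)),
# since E e^{theta * xi * A} = cosh(theta * A) for Rademacher xi.
rhs = d * np.prod([np.linalg.eigvalsh(sym_fun(theta * a, np.cosh))[-1] for a in mats])
assert lhs <= rhs + 1e-9
```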
===Tropp===
The major contribution of {{Harv|Tropp|2010}} is the application of Lieb's theorem where Template:Harv had applied the Golden–Thompson inequality. Tropp's corollary is the following: If <math>\mathbf{H}</math> is a fixed self-adjoint matrix and <math>\mathbf{X}</math> is a random self-adjoint matrix, then
:<math>
\operatorname{E} \operatorname{tr} e^{\mathbf{H} + \mathbf{X}} \leq \operatorname{tr} e^{\mathbf{H} + \log \left( \operatorname{E} e^{\mathbf{X}} \right)}.
</math>
Proof: Let <math>\mathbf{Y} = e^{\mathbf{X}}</math>. Then Lieb's theorem tells us that
:<math>
f(\mathbf{Y}) = \operatorname{tr} e^{\mathbf{H} + \log \mathbf{Y}}
</math>
is concave. The final step is to use [[Jensen's inequality]] to move the expectation inside the function:
:<math>
\operatorname{E} \operatorname{tr} e^{\mathbf{H} + \log \mathbf{Y}} \leq \operatorname{tr} e^{\mathbf{H} + \log \operatorname{E} \mathbf{Y}} = \operatorname{tr} e^{\mathbf{H} + \log \operatorname{E} e^{\mathbf{X}}}.
</math>
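As an illustration (a NumPy sketch, not part of the source derivation), Tropp's corollary can be checked exactly for a Rademacher perturbation <math>\mathbf{X} = \xi\mathbf{A}</math>, for which <math>\operatorname{E} e^{\mathbf{X}} = \cosh(\mathbf{A})</math>:

```python
import numpy as np

def sym_fun(a, f):
    # Apply a scalar function to the eigenvalues of a symmetric matrix.
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.T

rng = np.random.default_rng(3)
d = 4
h = rng.standard_normal((d, d)); h = (h + h.T) / 2   # fixed self-adjoint H
a = rng.standard_normal((d, d)); a = (a + a.T) / 2   # X = xi * A, xi Rademacher

# Left side: E tr e^{H+X}, computed exactly over xi in {-1, +1}.
lhs = 0.5 * (np.trace(sym_fun(h + a, np.exp)) + np.trace(sym_fun(h - a, np.exp)))

# Right side: tr e^{H + log(E e^X)}, with E e^{xi*A} = cosh(A);
# cosh(A) has eigenvalues >= 1, so its matrix logarithm is well defined.
rhs = np.trace(sym_fun(h + sym_fun(sym_fun(a, np.cosh), np.log), np.exp))
assert lhs <= rhs + 1e-9
```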
This gives us the major result of the paper: the subadditivity of the log of the matrix generating function.
===Subadditivity of log mgf===
Let <math>\{\mathbf{X}_k\}</math> be a finite sequence of independent, random self-adjoint matrices. Then for all <math>\theta \in \mathbb{R}</math>,
:<math>
\operatorname{tr} \mathbf{M}_{\sum_k \mathbf{X}_k}(\theta) \leq \operatorname{tr} e^{\sum_k \log \mathbf{M}_{\mathbf{X}_k}(\theta)}.
</math>
Proof: It is sufficient to let <math>\theta = 1</math>. Expanding the definitions, we need to show that
:<math>
\operatorname{E} \operatorname{tr} e^{\sum_k \mathbf{X}_k} \leq \operatorname{tr} e^{\sum_k \log \operatorname{E} e^{\mathbf{X}_k}}.
</math>
To complete the proof, we use the [[law of total expectation]]. Let <math>\operatorname{E}_m</math> be the expectation conditioned on <math>\mathbf{X}_1, \ldots, \mathbf{X}_m</math>. Since we assume all the <math>\mathbf{X}_k</math> are independent,
:<math>
\operatorname{E}_{m-1} e^{\mathbf{X}_m} = \operatorname{E} e^{\mathbf{X}_m}.
</math>
Finally, we have
:<math>\begin{align}
\operatorname{E} \operatorname{tr} e^{\sum_{k=1}^{n} \mathbf{X}_k}
&= \operatorname{E}_0 \cdots \operatorname{E}_{n-1} \operatorname{tr} e^{\sum_{k=1}^{n-1} \mathbf{X}_k + \mathbf{X}_n} \\
&\leq \operatorname{E}_0 \cdots \operatorname{E}_{n-2} \operatorname{tr} e^{\sum_{k=1}^{n-1} \mathbf{X}_k + \log \left( \operatorname{E}_{n-1} e^{\mathbf{X}_n} \right)} \\
&\;\;\vdots \\
&\leq \operatorname{tr} e^{\sum_{k=1}^{n} \log \operatorname{E} e^{\mathbf{X}_k}}
\end{align}</math>
where at every step ''m'' we use Tropp's corollary with
:<math>
\mathbf{H}_m = \sum_{k=1}^{m-1} \mathbf{X}_k + \sum_{k=m+1}^{n} \log \operatorname{E} e^{\mathbf{X}_k}.
</math>
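The subadditivity result can likewise be checked numerically (a NumPy sketch under the same Rademacher assumption, <math>\mathbf{X}_k = \xi_k \mathbf{A}_k</math>, so that <math>\operatorname{E} e^{\mathbf{X}_k} = \cosh(\mathbf{A}_k)</math>):

```python
import numpy as np
from itertools import product

def sym_fun(a, f):
    # Apply a scalar function to the eigenvalues of a symmetric matrix.
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.T

rng = np.random.default_rng(4)
d, n = 3, 3
mats = []
for _ in range(n):
    a = rng.standard_normal((d, d))
    mats.append((a + a.T) / 2)     # fixed self-adjoint A_k

# Left side: E tr e^{sum_k X_k}, exact over all 2^n sign patterns.
lhs = np.mean([np.trace(sym_fun(sum(e * a for e, a in zip(s, mats)), np.exp))
               for s in product((-1, 1), repeat=n)])

# Right side: tr exp(sum_k log E e^{X_k}) = tr exp(sum_k log cosh(A_k)).
acc = sum(sym_fun(sym_fun(a, np.cosh), np.log) for a in mats)
rhs = np.trace(sym_fun(acc, np.exp))
assert lhs <= rhs + 1e-9
```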
===Master tail bound===
The following is immediate from the previous result:
:<math>
\Pr \left\{ \lambda_\max \left( \sum_k \mathbf{X}_k \right) \geq t \right\} \leq \inf_{\theta > 0} \left\{ e^{-\theta t} \operatorname{tr} e^{\sum_k \log \mathbf{M}_{\mathbf{X}_k}(\theta)} \right\}
</math>
All of the theorems given above are derived from this bound; the theorems differ only in the various ways the infimum is bounded. These remaining steps are significantly simpler than the derivation just given.
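To make the master bound concrete (a NumPy sketch, not from the source), one can compare it against the exact tail probability of a small Rademacher series <math>\sum_k \xi_k \mathbf{A}_k</math>: enumerate all sign patterns for the exact probability, and minimize the bound over a grid of <math>\theta</math> values.

```python
import numpy as np
from itertools import product

def sym_fun(a, f):
    # Apply a scalar function to the eigenvalues of a symmetric matrix.
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.T

rng = np.random.default_rng(1)
d, n, t = 3, 4, 2.0
mats = []
for _ in range(n):
    a = rng.standard_normal((d, d))
    mats.append((a + a.T) / 2)     # fixed self-adjoint A_k

# Exact tail probability Pr{lambda_max(sum_k xi_k A_k) >= t},
# enumerating all 2^n Rademacher sign patterns.
tail = 0.0
for signs in product((-1, 1), repeat=n):
    s = sum(e * a for e, a in zip(signs, mats))
    if np.linalg.eigvalsh(s)[-1] >= t:
        tail += 1.0 / 2 ** n

# Master bound: inf_theta e^{-theta*t} tr exp(sum_k log E e^{theta X_k}),
# with E e^{theta*xi*A} = cosh(theta*A); the infimum is approximated on a grid.
def bound(theta):
    acc = sum(sym_fun(sym_fun(theta * a, np.cosh), np.log) for a in mats)
    return np.exp(-theta * t) * np.trace(sym_fun(acc, np.exp))

best = min(bound(th) for th in np.linspace(0.05, 3.0, 60))
assert tail <= best + 1e-9
```

Since every value of <math>\theta</math> yields a valid upper bound, the grid minimum still dominates the exact tail probability.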
==References==
*{{Cite arxiv}}
*{{Cite arxiv}}
*{{Cite arxiv}}