Conditional entropy
[[Image:Entropy-mutual-information-relative-entropy-relation-diagram.svg|thumb|256px|right|Individual (H(X),H(Y)), joint (H(X,Y)), and conditional entropies for a pair of correlated subsystems X,Y with mutual information I(X; Y).]]
In [[information theory]], the '''conditional entropy''' (or '''equivocation''') quantifies the amount of information needed to describe the outcome of a [[random variable]] <math>Y</math> given that the value of another random variable <math>X</math> is known.
Here, information is measured in [[bit]]s, [[nat (information)|nat]]s, or [[ban (information)|ban]]s.
The ''entropy of <math>Y</math> conditioned on <math>X</math>'' is written as <math>H(Y|X)</math>.
== Definition ==
If <math>H(Y|X=x)</math> is the entropy of the variable <math>Y</math> conditioned on the variable <math>X</math> taking a certain value <math>x</math>, then <math>H(Y|X)</math> is the result of averaging <math>H(Y|X=x)</math> over all possible values <math>x</math> that <math>X</math> may take.
Given discrete random variables <math>X</math> with [[support (mathematics)|support]] <math>\mathcal X</math> and <math>Y</math> with support <math>\mathcal Y</math>, the conditional entropy of <math>Y</math> given <math>X</math> is defined as:<ref>{{cite book|last1=Cover|first1=Thomas M.|last2=Thomas|first2=Joy A.|title=Elements of Information Theory|year=1991|publisher=Wiley|location=New York|isbn=0-471-06259-6}}</ref>
::<math>\begin{align}
H(Y|X)\ &\equiv \sum_{x\in\mathcal X}\,p(x)\,H(Y|X=x)\\
&=\sum_{x\in\mathcal X} \left(p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log\, \frac{1}{p(y|x)}\right)\\
&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(x,y)\,\log\,p(y|x)\\
&=-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(y|x)\\
&=\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x)} {p(x,y)}.
\end{align}</math>
''Note:'' The supports of ''X'' and ''Y'' can be replaced by their [[domain of a function|domains]] if it is understood that <math> 0 \log 0 </math> should be treated as being equal to zero.
<math>H(Y|X)=0</math> if and only if the value of <math>Y</math> is completely determined by the value of <math>X</math>. Conversely, <math>H(Y|X) = H(Y)</math> if and only if <math>Y</math> and <math>X</math> are [[independent random variables]].
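As an illustration, the following short Python sketch computes <math>H(Y|X)</math> in bits directly from the definition, summing <math>-p(x,y)\log_2 p(y|x)</math> over all pairs with nonzero probability. The joint distribution used here is hypothetical and chosen only for the example.

<syntaxhighlight lang="python">
# Illustrative sketch: H(Y|X) from a tabulated joint distribution, in bits.
# The joint distribution below is hypothetical and serves only as an example.
from math import log2

p_xy = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}

def conditional_entropy(p_xy):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x); terms with p(x,y)=0 contribute 0."""
    # Marginal p(x), obtained by summing the joint distribution over y.
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    h = 0.0
    for (x, _), p in p_xy.items():
        if p > 0:
            h -= p * log2(p / p_x[x])  # p(y|x) = p(x,y) / p(x)
    return h

print(conditional_entropy(p_xy))  # ~0.861 bits for this example
</syntaxhighlight>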
==Chain rule==
Assume that the combined system determined by two random variables ''X'' and ''Y'' has entropy <math>H(X,Y)</math>, that is, we need <math>H(X,Y)</math> bits of information to describe its exact state.
Now if we first learn the value of <math>X</math>, we have gained <math>H(X)</math> bits of information.
Once <math>X</math> is known, we only need <math>H(X,Y)-H(X)</math> bits to describe the state of the whole system.
This quantity is exactly <math>H(Y|X)</math>, which gives the ''chain rule'' of conditional entropy:
:<math>H(Y|X)\,=\,H(X,Y)-H(X) \, .</math>
Formally, the chain rule follows from the above definition of conditional entropy:
:<math>\begin{align}
H(Y|X)=&\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x)} {p(x,y)}\\
=&-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(x,y) + \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(x) \\
=& H(X,Y) + \sum_{x \in \mathcal X} p(x)\log\,p(x) \\
=& H(X,Y) - H(X).
\end{align}</math>
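The chain rule can also be checked numerically. The following Python sketch reuses the hypothetical joint distribution from the example above and verifies that <math>H(Y|X)</math> agrees with <math>H(X,Y)-H(X)</math> up to floating-point error.

<syntaxhighlight lang="python">
# Numerical check of the chain rule H(Y|X) = H(X,Y) - H(X), in bits,
# on the same hypothetical joint distribution used in the earlier example.
from math import log2

p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.40, (1, 1): 0.10}

p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

h_xy = -sum(p * log2(p) for p in p_xy.values() if p > 0)  # joint entropy H(X,Y)
h_x = -sum(p * log2(p) for p in p_x.values() if p > 0)    # marginal entropy H(X)
h_y_given_x = -sum(p * log2(p / p_x[x])                   # conditional entropy H(Y|X)
                   for (x, _), p in p_xy.items() if p > 0)

assert abs(h_y_given_x - (h_xy - h_x)) < 1e-12
</syntaxhighlight>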
==Generalization to quantum theory==
In [[quantum information theory]], the conditional entropy is generalized to the [[conditional quantum entropy]]. The latter can take negative values, unlike its classical counterpart.
==Other properties==
For any <math>X</math> and <math>Y</math>:
: <math>H(X|Y) \le H(X), \, </math>
: <math>H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y),</math>
: <math>I(X;Y) \le H(X),\,</math>
where <math>I(X;Y)</math> is the [[mutual information]] between <math>X</math> and <math>Y</math>.
For independent <math>X</math> and <math>Y</math>:
: <math>H(Y|X) = H(Y)\text{ and }H(X|Y) = H(X). \, </math>
Although the specific conditional entropy <math>H(X|Y=y)</math> may be either less than or greater than <math>H(X|Y)</math>, <math>H(X|Y=y)</math> can never exceed <math>H(X)</math> when <math>X</math> is uniformly distributed.
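These properties can be checked numerically. The following Python sketch, using the same hypothetical joint distribution as in the earlier examples, verifies the two bounds above and the identity relating <math>H(X,Y)</math>, the conditional entropies, and <math>I(X;Y)</math>.

<syntaxhighlight lang="python">
# Numerical check of the listed bounds and identities, in bits, using the
# hypothetical joint distribution from the earlier examples.
from math import log2

p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.40, (1, 1): 0.10}

def marginal(joint, axis):
    """Marginal distribution over the given axis (0 for X, 1 for Y)."""
    m = {}
    for key, p in joint.items():
        m[key[axis]] = m.get(key[axis], 0.0) + p
    return m

def entropy(dist):
    """Shannon entropy in bits; zero-probability outcomes contribute 0."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

p_x, p_y = marginal(p_xy, 0), marginal(p_xy, 1)
h_x, h_y, h_xy = entropy(p_x), entropy(p_y), entropy(p_xy)
h_x_given_y = h_xy - h_y   # conditional entropies via the chain rule
h_y_given_x = h_xy - h_x
i_xy = h_x - h_x_given_y   # mutual information I(X;Y)

assert h_x_given_y <= h_x + 1e-12                              # H(X|Y) <= H(X)
assert abs(h_xy - (h_x_given_y + h_y_given_x + i_xy)) < 1e-12  # H(X,Y) identity
assert i_xy <= h_x + 1e-12                                     # I(X;Y) <= H(X)
</syntaxhighlight>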
==References==
{{Reflist}}

== See also ==
* [[Entropy (information theory)]]
* [[Mutual information]]
* [[Conditional quantum entropy]]
* [[Variation of information]]
* [[Entropy power inequality]]
* [[Likelihood function]]

[[Category:Entropy and information]]
[[Category:Information theory]]