{{multiple issues|
{{Refimprove|date=October 2011}}
{{Expert-subject|Statistics|date=November 2008}}
}}

In [[signal processing]], '''independent component analysis''' ('''ICA''') is a computational method for separating a [[multivariate statistics|multivariate]] signal into additive subcomponents by assuming that the subcomponents are non-Gaussian signals and that they are all [[Statistical independence|statistically independent]] from each other. ICA is a special case of [[blind source separation]].

== Introduction ==

When the statistical independence assumption is correct, blind ICA separation of a mixed signal gives very good results.{{citation needed|date=May 2013}} It is also used, for analysis purposes, on signals that are not supposed to be generated by mixing. A simple application of ICA is the "[[cocktail party problem]]", where the underlying speech signals are separated from sample data consisting of people talking simultaneously in a room. Usually the problem is simplified by assuming no time delays or echoes. Note that if ''N'' sources are present, at least ''N'' observations (e.g. microphones) are needed to recover the original signals. This constitutes the square case (''J'' = ''D'', where ''D'' is the input dimension of the data and ''J'' is the dimension of the model). The underdetermined (''J'' > ''D'') and overdetermined (''J'' < ''D'') cases have also been investigated.

ICA separation of mixed signals gives very good results when two assumptions hold and three effects of mixing source signals are exploited.

Two assumptions:
#the source signals are independent of each other.
#the values in each source signal have non-Gaussian distributions.

Three effects of mixing source signals:
#Independence: By assumption, the source signals are independent; however, their signal mixtures are not, because the mixtures share the same source signals.
#Normality: By the [[Central Limit Theorem]], the distribution of a sum of independent random variables tends towards a Gaussian distribution. Loosely speaking, a sum of two independent random variables usually has a distribution that is closer to Gaussian than either of the two original variables. Here we treat the value of each signal as a random variable.
#Complexity: The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal.

These principles underpin ICA: if the signals extracted from a set of mixtures are independent like source signals, have non-Gaussian histograms like source signals, or have low complexity like source signals, then they must be the source signals.<ref name="Stone 2004">{{cite book|last=Stone|first=James V.|title=Independent component analysis : a tutorial introduction|year=2004|publisher=MIT Press|location=Cambridge, Mass. [u.a.]|isbn=0-262-69315-1}}</ref><ref>{{cite book|last=Hyvärinen|first=Aapo|author2=Karhunen, Juha|author3=Oja, Erkki|title=Independent component analysis|year=2001|publisher=J. Wiley|location=New York|isbn=0-471-22131-7|edition=1st}}</ref>

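As a concrete illustration of the square case, the following minimal sketch (assuming [[NumPy]] and the FastICA implementation in scikit-learn are available; the signal shapes and mixing matrix are invented for the example) mixes two non-Gaussian sources and then separates them blindly:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two non-Gaussian, independent sources (a sine and a square wave).
s1 = np.sin(3 * t)
s2 = np.sign(np.sin(5 * t))          # strongly non-Gaussian
S = np.c_[s1, s2]                    # shape (n_samples, n_sources)

# Square case: two sources, two "microphones" (J = D = 2).
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])           # mixing matrix, unknown in practice
X = S @ A.T                          # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)         # estimated sources
A_est = ica.mixing_                  # estimated mixing matrix
</syntaxhighlight>

As noted below, the recovered components are only determined up to the order, sign and scale of the true sources.
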
== Defining component independence ==

ICA finds the independent components (also called factors, latent variables or sources) by maximizing the statistical independence of the estimated components. We may choose one of many ways to define independence, and this choice governs the form of the ICA algorithm. The two broadest definitions of independence for ICA are

# Minimization of mutual information
# Maximization of non-Gaussianity

The Minimization-of-[[Mutual information]] (MMI) family of ICA algorithms uses measures such as [[Kullback–Leibler divergence]] and [[Principle of maximum entropy|maximum entropy]]. The non-Gaussianity family of ICA algorithms, motivated by the [[central limit theorem]], uses [[kurtosis]] and [[negentropy]].

Typical algorithms for ICA use centering (subtracting the mean to create a zero-mean signal), [[Whitening transformation|whitening]] (usually with the [[eigenvalue decomposition]]) and [[dimensionality reduction]] as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm. Whitening and [[dimension reduction]] can be achieved with [[principal component analysis]] or [[singular value decomposition]]. Whitening ensures that all dimensions are treated equally ''a priori'' before the algorithm is run. Well-known algorithms for ICA include [[infomax]], [[FastICA]] and JADE, but there are many others.

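A minimal sketch of these preprocessing steps (assuming NumPy; the data layout and variable names are illustrative only) performs centering followed by eigenvalue-decomposition whitening:

<syntaxhighlight lang="python">
import numpy as np

def center_and_whiten(X):
    """Center and whiten data X of shape (n_samples, n_features).

    Returns whitened data Z with (approximately) identity covariance and
    the whitening matrix V such that Z = (X - mean) @ V.T.
    """
    X_centered = X - X.mean(axis=0)            # centering: zero-mean columns
    cov = np.cov(X_centered, rowvar=False)     # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalue decomposition
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # whitening matrix
    Z = X_centered @ V.T
    return Z, V

# Example: after whitening, the covariance of Z is close to the identity.
X = np.random.default_rng(0).normal(size=(1000, 3)) @ np.array(
    [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.5]])
Z, V = center_and_whiten(X)
print(np.round(np.cov(Z, rowvar=False), 2))
</syntaxhighlight>

Dimensionality reduction can be folded into the same step by keeping only the eigenvectors with the largest eigenvalues.
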
In general, ICA cannot identify the actual number of source signals, a uniquely correct ordering of the source signals, nor the proper scaling (including sign) of the source signals.

ICA is important to [[blind signal separation]] and has many practical applications. It is closely related to (or even a special case of) the search for a [[factorial code]] of the data, i.e., a new vector-valued representation of each data vector such that it gets uniquely encoded by the resulting code vector (loss-free coding), but the code components are statistically independent.

== Mathematical definitions ==

Linear independent component analysis can be divided into noiseless and noisy cases, where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case.

=== General definition ===

The data is represented by the [[random vector]] <math>x=(x_1,\ldots,x_m)^T</math> and the components as the random vector <math>s=(s_1,\ldots,s_n)^T.</math> The task is to transform the observed data <math>x,</math> using a linear static transformation ''W'' as <math>s = W x\,</math> into maximally independent components <math>s</math> measured by some function <math>F(s_1,\ldots,s_n)</math> of independence.

=== Generative model ===

==== Linear noiseless ICA ====

The components <math>x_i</math> of the observed random vector <math>x=(x_1,\ldots,x_m)^T</math> are generated as a sum of the independent components <math>s_k</math>, <math>k=1,\ldots,n</math>:

:<math>x_i = a_{i,1} s_1 + \cdots + a_{i,k} s_k + \cdots + a_{i,n} s_n,</math>

weighted by the mixing weights <math>a_{i,k}</math>.

The same generative model can be written in vector form as <math>x=\sum_{k=1}^{n} s_k a_k</math>, where the observed random vector <math>x</math> is represented by the basis vectors <math>a_k=(a_{1,k},\ldots,a_{m,k})^T</math>. The basis vectors <math>a_k</math> form the columns of the mixing matrix <math>A=(a_1,\ldots,a_n)</math>, and the generative formula can be written as <math>x=As</math>, where <math>s=(s_1,\ldots,s_n)^T</math>.

Given the model and realizations (samples) <math>x_1,\ldots,x_N</math> of the random vector <math>x</math>, the task is to estimate both the mixing matrix <math>A</math> and the sources <math>s</math>. This is done by adaptively calculating the <math>w</math> vectors and setting up a cost function which either maximizes the non-Gaussianity of the calculated <math>s_k = w^T x</math> or minimizes the mutual information. In some cases, a priori knowledge of the probability distributions of the sources can be used in the cost function.

The original sources <math>s</math> can be recovered by multiplying the observed signals <math>x</math> with the inverse of the mixing matrix <math>W=A^{-1}</math>, also known as the unmixing matrix. Here it is assumed that the mixing matrix is square (<math>n=m</math>). If the number of basis vectors is greater than the dimensionality of the observed vectors, <math>n>m</math>, the task is overcomplete but is still solvable with the [[pseudo inverse]].

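The following short sketch (NumPy only; the matrix values are invented for illustration) spells out the square noiseless model <math>x = As</math> and the recovery <math>s = Wx</math> with <math>W = A^{-1}</math>, assuming for the moment that the mixing matrix is known:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

# n = m = 3: three independent, non-Gaussian (here Laplacian) sources.
N = 5000
S = rng.laplace(size=(3, N))

# Mixing matrix A (its columns are the basis vectors a_k); unknown in practice.
A = np.array([[1.0, 0.3, 0.2],
              [0.4, 1.0, 0.1],
              [0.2, 0.5, 1.0]])

X = A @ S                 # generative model: x = A s for every sample

W = np.linalg.inv(A)      # unmixing matrix W = A^{-1} (square case)
S_rec = W @ X             # recovered sources

print(np.allclose(S, S_rec))   # True: exact recovery when A is known
</syntaxhighlight>

ICA algorithms estimate <math>W</math> without knowing <math>A</math>, so the recovered sources are only determined up to permutation and scaling.
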
==== Linear noisy ICA ====

With the added assumption of zero-mean and uncorrelated Gaussian noise <math>n\sim N(0,\operatorname{diag}(\Sigma))</math>, the ICA model takes the form <math>x=As+n</math>.

==== Nonlinear ICA ====

The mixing of the sources does not need to be linear. Using a nonlinear mixing function <math>f(\cdot|\theta)</math> with parameters <math>\theta</math>, the [[nonlinear ICA]] model is <math>x=f(s|\theta)+n</math>.

=== Identifiability ===

The independent components are identifiable up to a permutation and scaling of the sources. This identifiability requires that:

* At most one of the sources <math>s_k</math> is Gaussian,
* The number of observed mixtures, <math>m</math>, must be at least as large as the number of estimated components <math>n</math>: <math>m \ge n</math>. This is equivalent to saying that the mixing matrix <math>A</math> must be of full [[rank (linear algebra)|rank]] for its inverse to exist.

== Binary independent component analysis ==

A special variant of ICA is '''binary ICA''', in which both signal sources and monitors are in binary form and observations from monitors are disjunctive mixtures of binary independent sources. The problem has been shown to have applications in many domains including [[medical diagnosis]], [[multi-cluster assignment]], [[network tomography]] and [[internet resource management]].

Let <math>{x_1, x_2, \ldots, x_m}</math> be the set of binary variables from <math>m</math> monitors and <math>{y_1, y_2, \ldots, y_n}</math> be the set of binary variables from <math>n</math> sources. Source–monitor connections are represented by the (unknown) mixing matrix <math>G</math>, where <math>g_{ij} = 1</math> indicates that the signal from the ''j''-th source can be observed by the ''i''-th monitor. The system works as follows: at any time, if a source <math>j</math> is active (<math>y_j=1</math>) and it is connected to the monitor <math>i</math> (<math>g_{ij}=1</math>), then the monitor <math>i</math> will observe some activity (<math>x_i=1</math>). Formally we have:

:<math>
x_i = \bigvee_{j=1}^n (g_{ij}\wedge y_j), \quad i = 1, 2, \ldots, m,
</math>

where <math>\wedge</math> is Boolean AND and <math>\vee</math> is Boolean OR. Note that noise is not explicitly modeled; rather, it can be treated as independent sources.

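As an illustration of this disjunctive mixing model, the sketch below (NumPy; the connection matrix, source activation probability and problem sizes are invented for the example) generates observations from random binary sources:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

m, n, T = 4, 3, 10          # monitors, sources, time steps (illustrative sizes)
G = rng.integers(0, 2, size=(m, n)).astype(bool)   # mixing matrix g_ij (unknown in practice)
Y = rng.random(size=(n, T)) < 0.3                  # binary source activity y_j

# x_i(t) = OR_j ( g_ij AND y_j(t) ): a monitor fires if any connected source is active.
X = np.zeros((m, T), dtype=bool)
for i in range(m):
    for j in range(n):
        X[i] |= G[i, j] & Y[j]

# Equivalent vectorised form of the same disjunctive mixture.
X_vec = (G.astype(int) @ Y.astype(int)) > 0
print(np.array_equal(X, X_vec))   # True
</syntaxhighlight>
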
The above problem can be heuristically solved<ref name="Hyvärinen">Johan Himberg and Aapo Hyvärinen, ''[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8895 Independent Component Analysis For Binary Data: An Experimental Study]'', Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2001), San Diego, California, 2001.</ref> by assuming the variables are continuous and running [[FastICA]] on the binary observation data to get the mixing matrix <math>G</math> (real values), then rounding the entries of <math>G</math> to obtain the binary values. This approach has been shown to produce highly inaccurate results.{{citation needed|date=October 2012}}

Another method is to use dynamic programming: recursively breaking the observation matrix <math>X</math> into its sub-matrices and running the inference algorithm on these sub-matrices. The key observation which leads to this algorithm is that the sub-matrix <math>X^0</math> of <math>X</math> where <math>x_{ij} = 0, \forall j</math> corresponds to the unbiased observation matrix of hidden components that do not have a connection to the <math>i</math>-th monitor. Experimental results<ref name="Huyna">Huy Nguyen and Rong Zheng, ''[http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5753957 Binary Independent Component Analysis With or Mixtures]'', IEEE Transactions on Signal Processing, Vol. 59, Issue 7. (July 2011), pp. 3168–3181.</ref> show that this approach is accurate under moderate noise levels.

== Methods for blind source separation<ref name="James V. Stone 2004">James V. Stone (2004); "Independent Component Analysis: A Tutorial Introduction", The MIT Press, Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1</ref> ==

=== Projection pursuit<ref name="James V. Stone 2004"/> ===

We know that signal mixtures tend to have Gaussian probability density functions, and that source signals have non-Gaussian probability density functions. We also know that each source signal can be extracted from a set of signal mixtures by taking the inner product of a weight vector and those signal mixtures, where this inner product provides an orthogonal projection of the signal mixtures. But we do not yet know precisely how to find such a weight vector. One type of method for doing so is [[projection pursuit]].<ref>Kruskal, JB. 1969; "Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new “index of condensation”", Pages 427–440 of: Milton, RC, & Nelder, JA (eds), Statistical computation; New York, Academic Press</ref>

Projection pursuit seeks one projection at a time such that the extracted signal is as non-Gaussian as possible. This contrasts with ICA, which typically extracts ''M'' signals simultaneously from ''M'' signal mixtures, which requires estimating an ''M'' × ''M'' unmixing matrix. One practical advantage of projection pursuit over ICA is that fewer than ''M'' signals can be extracted if required, where each source signal is extracted from the ''M'' signal mixtures using an ''M''-element weight vector.

We can use [[kurtosis]] to recover the multiple source signals by finding the correct weight vectors with the use of projection pursuit.

The [[kurtosis]] of the probability density function of a signal, for a finite sample, is computed as

:<math>
K=\frac{\operatorname{E}[(\mathbf{y}-\mathbf{\overline{y}})^4]}{(\operatorname{E}[(\mathbf{y}-\mathbf{\overline{y}})^2])^2}-3
</math>

where <math>\mathbf{\overline{y}}</math> is the [[sample mean]] of <math>\mathbf{y}</math>, the extracted signal. The constant 3 ensures that Gaussian signals have zero kurtosis, super-Gaussian signals have positive kurtosis, and sub-Gaussian signals have negative kurtosis. The denominator is the [[variance]] of <math>\mathbf{y}</math>, and ensures that the measured kurtosis takes account of signal variance. The goal of projection pursuit is to maximize the kurtosis, making the extracted signal as non-normal as possible.

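For a finite sample this excess-kurtosis measure can be computed directly; a minimal sketch (NumPy, with made-up test signals) is:

<syntaxhighlight lang="python">
import numpy as np

def excess_kurtosis(y):
    """Sample excess kurtosis K = E[(y - mean)^4] / (E[(y - mean)^2])^2 - 3."""
    d = y - y.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3

rng = np.random.default_rng(3)
print(excess_kurtosis(rng.normal(size=100000)))    # ~0    (Gaussian)
print(excess_kurtosis(rng.laplace(size=100000)))   # ~+3   (super-Gaussian)
print(excess_kurtosis(rng.uniform(size=100000)))   # ~-1.2 (sub-Gaussian)
</syntaxhighlight>
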
Using kurtosis as a measure of non-normality, we can now examine how the kurtosis of a signal <math>\mathbf{y} = \mathbf{w}^T \mathbf{x}</math> extracted from a set of ''M'' mixtures <math>\mathbf{x}=(x_1,x_2,\ldots,x_M)^T</math> varies as the weight vector <math>\mathbf{w}</math> is rotated around the origin. Given our assumption that each source signal <math>\mathbf{s}</math> is super-Gaussian, we would expect:
#the kurtosis of the extracted signal <math>\mathbf{y}</math> to be maximal precisely when <math>\mathbf{y} = \mathbf{s}</math>;
#the kurtosis of the extracted signal <math>\mathbf{y}</math> to be maximal when <math>\mathbf{w}</math> is orthogonal to the projected axes <math>S_1</math> or <math>S_2</math>, because we know the optimal weight vector should be orthogonal to a transformed axis <math>S_1</math> or <math>S_2</math>.

For multiple source mixture signals, we can use kurtosis and [[Gram-Schmidt]] orthogonalization (GSO) to recover the signals. Given ''M'' signal mixtures in an ''M''-dimensional space, GSO projects these data points onto an (''M''-1)-dimensional space by using the weight vector. We can guarantee the independence of the extracted signals with the use of GSO.

In order to find the correct value of <math>\mathbf{w}</math>, we can use the [[gradient descent]] method. We first whiten the data, transforming <math>\mathbf{x}</math> into a new mixture <math>\mathbf{z}</math> which has unit variance, with <math>\mathbf{z}=(z_1,z_2,\ldots,z_M)^T</math>. This can be achieved by applying the [[singular value decomposition]] to <math>\mathbf{x}</math>,

: <math>\mathbf{x} = \mathbf{U} \mathbf{D} \mathbf{V}^T</math>

rescaling each vector <math>U_i=U_i/\operatorname{E}(U_i^2)</math>, and letting <math>\mathbf{z} = \mathbf{U}</math>. The signal extracted by a weight vector <math>\mathbf{w}</math> is <math>\mathbf{y} = \mathbf{w}^T \mathbf{z}</math>. If the weight vector <math>\mathbf{w}</math> has unit length, that is <math>\operatorname{E}[(\mathbf{w}^T \mathbf{z})^2]=1</math>, then the kurtosis can be written as:

:<math>
K=\frac{\operatorname{E}[\mathbf{y}^4]}{(\operatorname{E}[\mathbf{y}^2])^2}-3=\operatorname{E}[(\mathbf{w}^T \mathbf{z})^4]-3.
</math>

The updating process for <math>\mathbf{w}</math> is:

:<math>\mathbf{w}_{new}=\mathbf{w}_{old}-\eta\operatorname{E}[\mathbf{z}(\mathbf{w}_{old}^T \mathbf{z})^3 ],</math>

where <math>\eta</math> is a small constant that guarantees that <math>\mathbf{w}</math> converges to the optimal solution. After each update, we normalize <math>\mathbf{w}_{new}=\frac{\mathbf{w}_{new}}{|\mathbf{w}_{new}|}</math>, set <math>\mathbf{w}_{old}=\mathbf{w}_{new}</math>, and repeat the updating process until convergence. Other algorithms can also be used to update the weight vector <math>\mathbf{w}</math>.

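The sketch below (NumPy; the mixing matrix, learning rate and iteration count are illustrative choices, and the step sign is chosen so that the kurtosis is ascended, as is appropriate for the super-Gaussian sources assumed here) implements this iterative search for a single weight vector on whitened data:

<syntaxhighlight lang="python">
import numpy as np

def projection_pursuit(Z, eta=0.1, n_iter=500, seed=0):
    """Gradient search for one weight vector w maximizing the kurtosis of
    y = w^T z, given whitened data Z of shape (M, N)."""
    rng = np.random.default_rng(seed)
    M, N = Z.shape
    w = rng.normal(size=M)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                              # extracted signal, shape (N,)
        grad = (Z * y ** 3).mean(axis=1)       # E[z (w^T z)^3]
        w = w + eta * grad                     # step that increases the kurtosis
        w /= np.linalg.norm(w)                 # re-normalize to unit length
    return w

# Example: two whitened mixtures of super-Gaussian (Laplacian) sources.
rng = np.random.default_rng(4)
S = rng.laplace(size=(2, 20000))
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S
C = np.cov(X)                                  # whiten X before the search
d, E = np.linalg.eigh(C)
Z = E @ np.diag(d ** -0.5) @ E.T @ (X - X.mean(axis=1, keepdims=True))
w = projection_pursuit(Z)
y = w @ Z                                      # one extracted source (up to sign and scale)
</syntaxhighlight>

Further sources can be extracted by repeating the search in the subspace orthogonal to the weight vectors already found, which is the role of the Gram-Schmidt step described above.
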
Another approach is to use [[negentropy]]<ref>{{cite journal|last=Hyvärinen|first=Aapo|author2=Oja, Erkki|title=Independent Component Analysis: Algorithms and Applications|journal=Neural Networks|year=2000|volume=13|series=4-5|pages=411–430}}</ref> instead of kurtosis. Negentropy is more robust than kurtosis, as kurtosis is very sensitive to outliers. The negentropy methods are based on an important property of the Gaussian distribution: a Gaussian variable has the largest entropy among all random variables of equal variance. This is also the reason why we want to find the most non-Gaussian variables. A simple proof can be found in the article on [[differential entropy]].

:<math>J(x) = S(y) - S(x)\,</math>

where <math>y</math> is a Gaussian random variable with the same covariance matrix as <math>x</math>, and

:<math>S(x) = - \int p_x(u) \log p_x(u) du</math>

is the differential entropy. An approximation for negentropy is

:<math>J(x)=\frac{1}{12}(E(x^3))^2 + \frac{1}{48}(\operatorname{kurt}(x))^2</math>

A proof can be found on page 131 of the book ''Independent Component Analysis'' by Aapo Hyvärinen, Juha Karhunen and Erkki Oja, who have contributed greatly to ICA.<ref>{{cite book|last=Hyvärinen|first=Aapo|author2=Karhunen, Juha|author3=Oja, Erkki|title=Independent component analysis|year=2001|publisher=Wiley|location=New York, NY|isbn=0-471-40540-X|edition=Reprinted}}</ref>
This approximation suffers from the same problem as kurtosis (sensitivity to outliers), so other approximations were developed:<ref>{{cite journal|last=Hyvärinen|first=Aapo|title=New approximations of differential entropy for independent component analysis and projection pursuit|journal=Advances in Neural Information Processing Systems|year=1998|volume=10|pages=273–279}}</ref>

:<math>J(y) = k_1(E(G_1(y)))^2 + k_2(E(G_2(y)) - E(G_2(v)))^2</math>

A common choice of <math>G_1</math> and <math>G_2</math> is

:<math>G_1 = \frac{1}{a_1}\log(\cosh(a_1 u))</math> and <math>G_2 = -\exp\left(-\frac{u^2}{2}\right)</math>

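A short sketch of this family of approximations (NumPy; here the Gaussian-reference term <math>E(G_i(v))</math> is subtracted in both terms, following the general form <math>J(y)\approx\sum_i k_i(E(G_i(y))-E(G_i(v)))^2</math>, and the constants <math>k_1</math>, <math>k_2</math> and <math>a_1</math> are illustrative) might look like:

<syntaxhighlight lang="python">
import numpy as np

def negentropy_approx(y, a1=1.0, k1=1.0, k2=1.0, n_ref=200000, seed=0):
    """Approximate negentropy of a zero-mean, unit-variance signal y as
    sum_i k_i * (E[G_i(y)] - E[G_i(v)])^2, with v a standard Gaussian."""
    G1 = lambda u: np.log(np.cosh(a1 * u)) / a1
    G2 = lambda u: -np.exp(-u ** 2 / 2)
    v = np.random.default_rng(seed).standard_normal(n_ref)   # Gaussian reference
    return (k1 * (np.mean(G1(y)) - np.mean(G1(v))) ** 2
            + k2 * (np.mean(G2(y)) - np.mean(G2(v))) ** 2)

rng = np.random.default_rng(5)
gauss = rng.standard_normal(100000)
laplace = rng.laplace(size=100000) / np.sqrt(2)   # unit-variance, super-Gaussian
print(negentropy_approx(gauss))     # close to zero for a Gaussian signal
print(negentropy_approx(laplace))   # clearly larger for the non-Gaussian signal
</syntaxhighlight>
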
=== Independent component analysis based on infomax<ref name="ReferenceA">James V. Stone (2004); "Independent Component Analysis: A Tutorial Introduction", The MIT Press, Cambridge, Massachusetts, London, England; ISBN 0-262-69315-1</ref> ===

ICA is essentially a multivariate, parallel version of projection pursuit. Whereas projection pursuit extracts a series of signals one at a time from a set of ''M'' signal mixtures, ICA extracts ''M'' signals in parallel. This tends to make ICA more robust than projection pursuit.

The projection pursuit method uses [[Gram-Schmidt]] orthogonalization to ensure the independence of the extracted signals, while ICA uses [[infomax]] and [[maximum likelihood]] estimation to ensure independence. The non-normality of the extracted signals is achieved by assigning an appropriate model, or prior, for the signals.

In short, the process of ICA based on [[infomax]] is: given a set of signal mixtures <math>\mathbf{x}</math> and a set of identical independent model [[cumulative distribution functions]] (cdfs) <math>g</math>, we seek the unmixing matrix <math>\mathbf{W}</math> which maximizes the joint [[entropy]] of the signals <math>\mathbf{Y}=g(\mathbf{y})</math>, where <math>\mathbf{y}=\mathbf{Wx}</math> are the signals extracted by <math>\mathbf{W}</math>. Given the optimal <math>\mathbf{W}</math>, the signals <math>\mathbf{Y}</math> have maximum entropy and are therefore independent, which ensures that the extracted signals <math>\mathbf{y}=g^{-1}(\mathbf{Y})</math> are also independent. <math>g</math> is an invertible function, and is the signal model. Note that if the source signal model [[probability density function]] <math>p_s</math> matches the [[probability density function]] of the extracted signals <math>p_{\mathbf{y}}</math>, then maximizing the joint entropy of <math>Y</math> also maximizes the amount of [[mutual information]] between <math>\mathbf{x}</math> and <math>\mathbf{Y}</math>. For this reason, using entropy to extract independent signals is known as [[infomax]].

Consider the entropy of the vector variable <math>\mathbf{Y}=g(\mathbf{y})</math>, where <math>\mathbf{y}=\mathbf{Wx}</math> is the set of signals extracted by the unmixing matrix <math>\mathbf{W}</math>. For a finite set of values sampled from a distribution with pdf <math>p_{\mathbf{y}}</math>, the entropy of <math>\mathbf{Y}</math> can be estimated as:

:<math>
H(\mathbf{Y})=-\frac{1}{N}\sum_{t=1}^N \ln p_{\mathbf{Y}}(\mathbf{Y}^t)
</math>

The joint pdf <math>p_{\mathbf{Y}}</math> can be shown to be related to the joint pdf <math>p_{\mathbf{y}}</math> of the extracted signals by the multivariate form:

:<math>
p_{\mathbf{Y}}(Y)=\frac{p_{\mathbf{y}}(\mathbf{y})}{|\frac{\partial\mathbf{Y}}{\partial \mathbf{y}}|}
</math>

where <math>\mathbf{J}=\frac{\partial\mathbf{Y}}{\partial \mathbf{y}}</math> is the [[Jacobian matrix]]. We have <math>|\mathbf{J}|=g'(\mathbf{y})</math>, and <math>g'</math> is the pdf assumed for the source signals, <math>g'=p_s</math>; therefore,

:<math>
p_{\mathbf{Y}}(Y)=\frac{p_{\mathbf{y}}(\mathbf{y})}{|\frac{\partial\mathbf{Y}}{\partial \mathbf{y}}|}=\frac{p_\mathbf{y}(\mathbf{y})}{p_\mathbf{s}(\mathbf{y})}
</math>

and therefore

:<math>
H(\mathbf{Y})=-\frac{1}{N}\sum_{t=1}^N \ln\frac{p_\mathbf{y}(\mathbf{y})}{p_\mathbf{s}(\mathbf{y})}
</math>

We know that when <math>p_{\mathbf{y}}=p_s</math>, <math>p_{\mathbf{Y}}</math> is a uniform distribution and <math>H({\mathbf{Y}})</math> is maximized. Since

:<math>
p_{\mathbf{y}}(\mathbf{y})=\frac{p_\mathbf{x}(\mathbf{x})}{|\frac{\partial\mathbf{y}}{\partial\mathbf{x}}|}=\frac{p_\mathbf{x}(\mathbf{x})}{|\mathbf{W}|}
</math>

where <math>|\mathbf{W}|</math> is the absolute value of the determinant of the unmixing matrix <math>\mathbf{W}</math>, it follows that

:<math>
H(\mathbf{Y})=-\frac{1}{N}\sum_{t=1}^N \ln\frac{p_\mathbf{x}(\mathbf{x}^t)}{|\mathbf{W}|p_\mathbf{s}(\mathbf{y}^t)}
</math>

so

:<math>
H(\mathbf{Y})=\frac{1}{N}\sum_{t=1}^N \ln p_\mathbf{s}(\mathbf{y}^t)+\ln|\mathbf{W}|+H(\mathbf{x})
</math>

Since <math>H(\mathbf{x})=-\frac{1}{N}\sum_{t=1}^N\ln p_\mathbf{x}(\mathbf{x}^t)</math> does not depend on <math>\mathbf{W}</math>, it suffices to maximize the function

:<math>
h(\mathbf{Y})=\frac{1}{N}\sum_{t=1}^N \ln p_\mathbf{s}(\mathbf{y}^t)+\ln|\mathbf{W}|
</math>

to achieve independence of the extracted signals.

If the ''M'' marginal pdfs of the model joint pdf <math>p_{\mathbf{s}}</math> are independent and we use the commonly chosen super-Gaussian model pdf for the source signals <math>p_{\mathbf{s}}=(1-\tanh(\mathbf{s})^2)</math>, then we have

:<math>
h(\mathbf{Y})=\frac{1}{N}\sum_{i=1}^M\sum_{t=1}^N \ln (1-\tanh(\mathbf{w_i^T x^t})^2)+\ln|\mathbf{W}|
</math>

In sum, given an observed signal mixture <math>\mathbf{x}</math>, the corresponding set of extracted signals <math>\mathbf{y}</math> and a source signal model <math>p_{\mathbf{s}}=g'</math>, we can find the optimal unmixing matrix <math>\mathbf{W}</math> and make the extracted signals independent and non-Gaussian. As in the projection pursuit setting, we can use the gradient descent method to find the optimal solution of the unmixing matrix.

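A minimal sketch of this gradient-based maximization (NumPy; the mixing matrix, learning rate and iteration count are illustrative, and the gradient of <math>h</math> with respect to <math>\mathbf{W}</math> is written out in a comment for the <math>1-\tanh^2</math> source model) is:

<syntaxhighlight lang="python">
import numpy as np

def infomax_ica(X, eta=0.01, n_iter=2000, seed=0):
    """Gradient ascent on h(W) = (1/N) sum ln(1 - tanh(Wx)^2) + ln|det W|
    for centred mixtures X of shape (M, N), assuming super-Gaussian sources."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = np.eye(M) + rng.normal(scale=0.1, size=(M, M))
    for _ in range(n_iter):
        Y = W @ X
        # dh/dW = (W^T)^{-1} - (2/N) * tanh(Y) X^T
        grad = np.linalg.inv(W.T) - (2.0 / N) * np.tanh(Y) @ X.T
        W += eta * grad
    return W

# Example: two Laplacian (super-Gaussian) sources, linearly mixed and centred.
rng = np.random.default_rng(6)
S = rng.laplace(size=(2, 10000))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = A @ S
X -= X.mean(axis=1, keepdims=True)
W = infomax_ica(X)
print(np.round(W @ A, 2))   # W @ A should be close to a scaled permutation matrix
</syntaxhighlight>

In practice the "natural gradient" variant, which multiplies this gradient by <math>\mathbf{W}^T\mathbf{W}</math>, is usually preferred for speed and stability.
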
=== Independent component analysis based on [[maximum likelihood]] estimation<ref name="ReferenceA"/> ===

'''[[Maximum likelihood]] estimation (MLE)''' is a standard statistical tool for finding parameter values (e.g. the unmixing matrix <math>\mathbf{W}</math>) that provide the best fit of some data (e.g., the extracted signals <math>y</math>) to a given model (e.g., the assumed joint probability density function (pdf) <math>p_s</math> of the source signals).

The '''ML''' "model" includes a specification of a pdf, which in this case is the pdf <math>p_s</math> of the unknown source signals <math>s</math>. Using '''ML ICA''', the objective is to find an unmixing matrix that yields extracted signals <math>y = \mathbf{W}x</math> with a joint pdf as similar as possible to the joint pdf <math>p_s</math> of the unknown source signals <math>s</math>.

'''MLE''' is thus based on the assumption that if the model pdf <math>p_s</math> and the model parameters <math>\mathbf{A}</math> are correct, then a high probability should be obtained for the data <math>x</math> that were actually observed. Conversely, if <math>\mathbf{A}</math> is far from the correct parameter values, then a low probability of the observed data would be expected.

Using '''MLE''', we call the probability of the observed data for a given set of model parameter values (e.g., a pdf <math>p_s</math> and a matrix <math>\mathbf{A}</math>) the ''likelihood'' of the model parameter values given the observed data.

We define a ''likelihood'' function <math>\mathbf{L(W)}</math> of <math>\mathbf{W}</math>:

:<math>\mathbf{ L(W)} = p_s (\mathbf{W}x)|\mathbf{W}|. </math>

Thus, if we wish to find a <math>\mathbf{W}</math> that is most likely to have generated the observed mixtures <math>x</math> from the unknown source signals <math>s</math> with pdf <math>p_s</math>, then we need only find that <math>\mathbf{W}</math> which maximizes the ''likelihood'' <math>\mathbf{L(W)}</math>. The unmixing matrix that maximizes this function is known as the '''MLE''' of the optimal unmixing matrix.

It is common practice to use the log ''likelihood'', because this is easier to evaluate. As the logarithm is a monotonic function, the <math>\mathbf{W}</math> that maximizes the function <math>\mathbf{L(W)}</math> also maximizes its logarithm <math>\ln \mathbf{L(W)}</math>. This allows us to take the logarithm of the equation above, which yields the log ''likelihood'' function

:<math>\ln \mathbf{L(W)} =\sum_{i}\sum_{t} \ln p_s(w^T_ix_t) + N\ln|\mathbf{W}|</math>

If we substitute a commonly used high-[[kurtosis]] model pdf for the source signals <math>p_s = (1-\tanh(s)^2)</math>, then we have

:<math>\ln \mathbf{L(W)} ={1 \over N}\sum_{i}^{M} \sum_{t}^{N}\ln(1-\tanh(w^T_i x_t )^2) + \ln |\mathbf{W}|</math>

The matrix <math>\mathbf{W}</math> that maximizes this function is the '''''[[maximum likelihood]] estimate'''''.

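Since this objective coincides with the infomax function <math>h(\mathbf{Y})</math> above, the same gradient ascent can be used. The short sketch below (NumPy; data and shapes are illustrative) simply evaluates the log likelihood for a candidate unmixing matrix, which is useful for monitoring convergence:

<syntaxhighlight lang="python">
import numpy as np

def log_likelihood(W, X):
    """Log likelihood (up to additive constants) of unmixing matrix W for
    mixtures X of shape (M, N), under the source model p_s(s) = 1 - tanh(s)^2."""
    Y = W @ X
    return np.mean(np.sum(np.log(1.0 - np.tanh(Y) ** 2), axis=0)) \
        + np.log(np.abs(np.linalg.det(W)))

rng = np.random.default_rng(7)
S = rng.laplace(size=(2, 10000))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = A @ S
# Typically the true unmixing matrix scores higher than the identity (no unmixing).
print(log_likelihood(np.linalg.inv(A), X))
print(log_likelihood(np.eye(2), X))
</syntaxhighlight>
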
== History and background ==

The general framework for independent component analysis was introduced by Jeanny Herault and Christian Jutten in 1986 and was most clearly stated by Pierre Comon in 1994. In 1995, Tony Bell and [[Terry Sejnowski]] introduced a fast and efficient ICA algorithm based on infomax, a principle introduced by Ralph Linsker in 1992.

There are many algorithms available in the literature which perform ICA. A widely used one, including in industrial applications, is the FastICA algorithm, developed by Aapo Hyvärinen and Erkki Oja, which uses the [[kurtosis]] as cost function. Other examples are rather related to [[blind source separation]], where a more general approach is used. For example, one can drop the independence assumption and separate mutually correlated, thus statistically "dependent", signals. Sepp Hochreiter and [[Jürgen Schmidhuber]] showed how to obtain non-linear ICA or [[source separation]] as a by-product of [[regularization]] (1999). Their method does not require a priori knowledge about the number of independent sources.

== Applications ==

ICA can be extended to analyze non-physical signals. For instance, ICA has been applied to discover discussion topics on a bag of news list archives.

Some ICA applications are listed below:<ref name="Stone 2004"/>
* Optical imaging of neurons<ref>{{cite journal|last=Brown|first=GD|coauthors=Yamada, S, & Sejnowski, TJ|title=Independent components analysis at the neural cocktail party|journal=Trends in Neurosciences|year=2001|volume=24|pages=54–63}}</ref>
* Neuronal spike sorting<ref>{{cite journal|last=Lewicki|first=MS|title=A review of methods for spike sorting: detection and classification of neural action potentials|journal=Network: Computation in Neural Systems|year=1998|volume=9|pages=53–78}}</ref>
* Face recognition<ref>{{cite book|last=Barlett|first=MS|title=Face image analysis by unsupervised learning|year=2001|publisher=Kluwer International Series on Engineering and Computer Science|location=Boston}}</ref>
* Modeling receptive fields of primary visual neurons<ref>{{cite journal|last=Bell|first=AJ|coauthors=Sejnowski, TJ|title=The independent components of natural scenes are edge filters|journal=Vision Research|year=1997|volume=37|pages=3327–3338}}</ref>
* Predicting stock market prices<ref>{{cite journal|last=Back|first=AD|coauthors=Weigend, AS|title=A first application of independent component analysis to extracting structure from stock returns|journal=International Journal of Neural Systems|year=1997|volume=8|pages=473–484}}</ref>
* Mobile phone communications<ref>{{cite book|last=Hyvärinen|first=A|author2=Karhunen, J|author3=Oja, E|title=Independent component analysis|year=2001a|publisher=John Wiley and Sons|location=New York}}</ref>
* Color-based detection of the ripeness of tomatoes<ref>{{cite journal|last=Polder|first=G|coauthors=van der Heijen, FWAM|title=Estimation of compound distribution in spectral images of tomatoes using independent component analysis|journal=Austrian Computer Society|year=2003|pages=57–64}}</ref>

== See also ==

* [[Blind deconvolution]]
* [[Factor analysis]]
* [[Hilbert spectrum]]
* [[Image processing]]
* [[Multilinear principal component analysis|Multilinear PCA]]
* [[Multilinear subspace learning]]
* [[Non-negative matrix factorization|Non-negative matrix factorization (NMF)]]
* [[Nonlinear dimensionality reduction]]
* [[Projection pursuit]]
* [[Varimax rotation]]

==Notes==
{{Reflist}}

==References==

* Comon, Pierre (1994): [http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA.pdf "Independent Component Analysis: a new concept?"], ''Signal Processing'', 36(3):287–314 (The original paper describing the concept of ICA.)
* Hyvärinen, A.; Karhunen, J.; Oja, E. (2001): ''[http://www.cis.hut.fi/projects/ica/book/ Independent Component Analysis]'', New York: Wiley, ISBN 978-0-471-40540-5 ([http://www.cis.hut.fi/projects/ica/book/intro.pdf Introductory chapter])
* Hyvärinen, A.; Oja, E. (2000): [http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf "Independent Component Analysis: Algorithms and Applications"], ''Neural Networks'', 13(4–5):411–430 (Technical but pedagogical introduction.)
* Comon, P.; Jutten, C. (2010): ''Handbook of Blind Source Separation, Independent Component Analysis and Applications'', Academic Press, Oxford UK. ISBN 978-0-12-374726-6
* Lee, T.-W. (1998): ''Independent component analysis: Theory and applications'', Boston, Mass: Kluwer Academic Publishers, ISBN 0-7923-8261-7
* Acharyya, Ranjan (2008): ''A New Approach for Blind Source Separation of Convolutive Sources - Wavelet Based Separation Using Shrinkage Function'' ISBN 3-639-07797-0 ISBN 978-3639077971 (This book focuses on unsupervised learning with blind source separation.)

== External links ==
* [http://www.cs.helsinki.fi/u/ahyvarin/whatisica.shtml What is independent component analysis?] by Aapo Hyvärinen
* [http://www.cis.hut.fi/aapo/papers/IJCNN99_tutorialweb/IJCNN99_tutorial3.html Independent Component Analysis: A Tutorial] by Aapo Hyvärinen
* [http://www.cis.hut.fi/projects/ica/fastica/ FastICA as a package for Matlab, in R language, C++]
* [http://www.bsp.brain.riken.go.jp/ICALAB/ ICALAB Toolboxes] for Matlab, developed at [[RIKEN]]
* [http://nic.uoregon.edu/projects/hipersat/index.php High Performance Signal Analysis Toolkit] provides C++ implementations of FastICA and Infomax
* [http://isp.imm.dtu.dk/toolbox/ ICA toolbox] Matlab tools for ICA with Bell-Sejnowski, Molgedey-Schuster and mean field ICA, developed at DTU
* [http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi Demonstration of the cocktail party problem]
* [http://sccn.ucsd.edu/eeglab/ EEGLAB Toolbox] ICA of [[electroencephalogram|EEG]] for Matlab, developed at UCSD
* [http://sccn.ucsd.edu/fmrlab/ FMRLAB Toolbox] ICA of [[fMRI]] for Matlab, developed at UCSD
* [http://brandon-merkl.blogspot.com/2005/12/independent-component-analysis.html Discussion of ICA used in a biomedical shape-representation context]
* [http://mdp-toolkit.sourceforge.net/ FastICA, CuBICA, JADE and TDSEP algorithms for Python and more]
* [http://icatb.sourceforge.net/ Group ICA Toolbox and Fusion ICA Toolbox]
* [http://www.nbtwiki.net/doku.php?id=tutorial:compute_independent_component_analysis Tutorial: Using ICA for cleaning EEG signals]

[[Category:Signal processing]]
[[Category:Data analysis]]
[[Category:Time series analysis]]
[[Category:Statistical models]]
[[Category:Multivariate statistics]]