| {{distinguish|mixed model}}
| {{Merge from|Mixture distribution|date=September 2010}}
| |
| {{See also|Mixture distribution}}
| |
| In [[statistics]], a '''mixture model''' is a [[probabilistic model]] for representing the presence of [[subpopulation]]s within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the [[mixture distribution]] that represents the [[probability distribution]] of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make [[statistical inference]]s about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information.
| |
| | |
| Some ways of implementing mixture models involve steps that attribute postulated sub-population identities to individual observations (or weights towards such sub-populations), in which case these can be regarded as types of [[unsupervised learning]] or [[cluster analysis|clustering]] procedures. However, not all inference procedures involve such steps.
| |
| | |
| Mixture models should not be confused with models for [[compositional data]], i.e., data whose components are constrained to sum to a constant value (1, 100%, etc.).
| |
| | |
| ==Structure of a mixture model==
| |
| | |
| ===General mixture model===
| |
| A typical finite-dimensional mixture model is a [[hierarchical Bayes model|hierarchical model]] consisting of the following components:
| |
| | |
| *''N'' random variables corresponding to observations, each assumed to be distributed according to a mixture of ''K'' components, with each component belonging to the same [[parametric family]] of distributions (e.g., all Normal, all Zipfian, etc.) but with different parameters
| |
| *''N'' corresponding random [[latent variable]]s specifying the identity of the mixture component of each observation, each distributed according to a ''K''-dimensional [[categorical distribution]]
| |
| *A set of ''K'' mixture weights, each of which is a probability (a real number between 0 and 1 inclusive), all of which sum to 1
| |
| *A set of ''K'' parameters, each specifying the parameter of the corresponding mixture component. In many cases, each "parameter" is actually a set of parameters. For example, observations distributed according to a mixture of one-dimensional [[Gaussian distribution]]s will have a [[mean]] and [[variance]] for each component. Observations distributed according to a mixture of ''V''-dimensional [[categorical distribution]]s (e.g., when each observation is a word from a vocabulary of size ''V'') will have a vector of ''V'' probabilities, collectively summing to 1.
| |
| | |
| In addition, in a [[Bayesian inference|Bayesian setting]], the mixture weights and parameters will themselves be random variables, and [[prior distribution]]s will be placed over the variables. In such a case, the weights are typically viewed as a ''K''-dimensional random vector drawn from a [[Dirichlet distribution]] (the [[conjugate prior]] of the categorical distribution), and the parameters will be distributed according to their respective conjugate priors.
| |
| | |
| Mathematically, a basic parametric mixture model can be described as follows:
| |
| | |
| :<math>
| |
| \begin{array}{lcl}
| |
| K &=& \text{number of mixture components} \\
| |
| N &=& \text{number of observations} \\
| |
| \theta_{i=1 \dots K} &=& \text{parameter of distribution of observation associated with component } i \\
| |
| \phi_{i=1 \dots K} &=& \text{mixture weight, i.e., prior probability of a particular component } i \\
| |
| \boldsymbol\phi &=& K\text{-dimensional vector composed of all the individual } \phi_{1 \dots K} \text{; must sum to 1} \\
| |
| z_{i=1 \dots N} &=& \text{component of observation } i \\
| |
| x_{i=1 \dots N} &=& \text{observation } i \\
| |
| F(x|\theta) &=& \text{probability distribution of an observation, parametrized on } \theta \\
| |
| z_{i=1 \dots N} &\sim& \operatorname{Categorical}(\boldsymbol\phi) \\
| |
| x_{i=1 \dots N} &\sim& F(\theta_{z_i})
| |
| \end{array}
| |
| </math>
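The generative process described by the last two lines of this specification can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name <code>sample_mixture</code>, the <code>draw</code> callback standing in for ''F'', and the choice of exponential components in the example are all assumptions made for the illustration.

```python
import random


def sample_mixture(weights, params, draw, n, seed=0):
    """Draw n observations from a finite mixture by ancestral sampling.

    weights : mixture weights phi_1..phi_K (non-negative, summing to 1)
    params  : component parameters theta_1..theta_K
    draw    : hypothetical callback (rng, theta) -> one observation from F(theta)
    """
    rng = random.Random(seed)
    components = range(len(weights))
    samples = []
    for _ in range(n):
        # z_i ~ Categorical(phi): pick the component of this observation.
        z = rng.choices(components, weights=weights, k=1)[0]
        # x_i ~ F(theta_{z_i}): draw the observation from that component.
        x = draw(rng, params[z])
        samples.append((z, x))
    return samples


# Example (assumed for illustration): two exponential components,
# where each theta is a rate parameter.
draws = sample_mixture(
    weights=[0.7, 0.3],
    params=[1.0, 10.0],
    draw=lambda rng, rate: rng.expovariate(rate),
    n=1000,
)
```

In an inference setting only the ''x'' values would be observed; the sketch returns the latent ''z'' values as well so that the two stages of the process remain visible.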
| |
| | |
| In a Bayesian setting, all parameters are associated with random variables, as follows:
| |
| | |
| :<math>
| |
| \begin{array}{lcl}
| |
| K,N &=& \text{as above} \\
| |
| \theta_{i=1 \dots K}, \phi_{i=1 \dots K}, \boldsymbol\phi &=& \text{as above} \\
| |
| z_{i=1 \dots N}, x_{i=1 \dots N}, F(x|\theta) &=& \text{as above} \\
| |
| \alpha &=& \text{shared hyperparameter for component parameters} \\
| |
| \beta &=& \text{shared hyperparameter for mixture weights} \\
| |
| H(\theta|\alpha) &=& \text{prior probability distribution of component parameters, parametrized on } \alpha \\
| |
| \theta_{i=1 \dots K} &\sim& H(\alpha) \\
| |
| \boldsymbol\phi &\sim& \operatorname{Symmetric-Dirichlet}_K(\beta) \\
| |
| z_{i=1 \dots N} &\sim& \operatorname{Categorical}(\boldsymbol\phi) \\
| |
| x_{i=1 \dots N} &\sim& F(\theta_{z_i})
| |
| \end{array}
| |
| </math>
| |
| | |
| This characterization uses ''F'' and ''H'' to describe arbitrary distributions over observations and parameters, respectively. Typically ''H'' will be the [[conjugate prior]] of ''F''. The two most common choices of ''F'' are [[Gaussian distribution|Gaussian]] aka "[[normal distribution|normal]]" (for real-valued observations) and [[categorical distribution|categorical]] (for discrete observations). Other common possibilities for the distribution of the mixture components are:
| |
| *[[Binomial distribution]], for the number of "positive occurrences" (e.g., successes, yes votes, etc.) given a fixed number of total occurrences
| |
| *[[Multinomial distribution]], similar to the binomial distribution, but for counts of multi-way occurrences (e.g., yes/no/maybe in a survey)
| |
| *[[Negative binomial distribution]], for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs
| |
| *[[Poisson distribution]], for the number of occurrences of an event in a given period of time, for an event that is characterized by a fixed rate of occurrence
| |
| *[[Exponential distribution]], for the time before the next event occurs, for an event that is characterized by a fixed rate of occurrence
| |
| *[[Log-normal distribution]], for positive real numbers that are assumed to grow exponentially, such as incomes or prices
| |
| *[[Multivariate normal distribution]] (aka [[multivariate Gaussian distribution]]), for vectors of correlated outcomes that are individually Gaussian-distributed
| |
| *A vector of [[Bernoulli distribution|Bernoulli]]-distributed values, corresponding, e.g., to a black-and-white image, with each value representing a pixel; see the handwriting-recognition example below
| |
| | |
| ===Specific examples===
| |
| | |
| ====Gaussian mixture model====
| |
| [[File:nonbayesian-gaussian-mixture.svg|right|250px|thumb|Non-Bayesian Gaussian mixture model using [[plate notation]]. Smaller squares indicate fixed parameters; larger circles indicate random variables. Filled-in shapes indicate known values. The indication [K] means a vector of size ''K''.]]
| |
| | |
| A typical non-Bayesian [[Gaussian distribution|Gaussian]] mixture model looks like this:
| |
| | |
| :<math>
| |
| \begin{array}{lcl}
| |
| K,N &=& \text{as above} \\
| |
| \phi_{i=1 \dots K}, \boldsymbol\phi &=& \text{as above} \\
| |
| z_{i=1 \dots N}, x_{i=1 \dots N} &=& \text{as above} \\
| |
| \mu_{i=1 \dots K} &=& \text{mean of component } i \\
| |
| \sigma^2_{i=1 \dots K} &=& \text{variance of component } i \\
| |
| z_{i=1 \dots N} &\sim& \operatorname{Categorical}(\boldsymbol\phi) \\
| |
| x_{i=1 \dots N} &\sim& \mathcal{N}(\mu_{z_i}, \sigma^2_{z_i})
| |
| \end{array}
| |
| </math>
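Summing out the latent component <math>z_i</math> in the model above gives the marginal density of an observation as a weighted sum of normal densities, <math>p(x) = \sum_{i=1}^K \phi_i \, \mathcal{N}(x \mid \mu_i, \sigma^2_i)</math>. The following Python sketch evaluates that sum; the function names and the parameter values are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x; var is the variance, not the standard deviation."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, phi, mu, var):
    """Marginal density of x after summing out z: sum_i phi_i * N(x | mu_i, var_i)."""
    return sum(p * normal_pdf(x, m, v) for p, m, v in zip(phi, mu, var))


# Illustrative parameters: two equally weighted components centred at -2 and 2.
phi = [0.5, 0.5]
mu = [-2.0, 2.0]
var = [1.0, 1.0]
density_at_zero = gmm_pdf(0.0, phi, mu, var)
```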
| |
| | |
| <br style="clear:both" />
| |
| [[File:bayesian-gaussian-mixture.svg|right|300px|thumb|Bayesian Gaussian mixture model using [[plate notation]]. Smaller squares indicate fixed parameters; larger circles indicate random variables. Filled-in shapes indicate known values. The indication [K] means a vector of size ''K''.]]
| |
| | |
| A Bayesian version of a [[Gaussian distribution|Gaussian]] mixture model is as follows:
| |
| | |
| :<math>
| |
| \begin{array}{lcl}
| |
| K,N &=& \text{as above} \\
| |
| \phi_{i=1 \dots K}, \boldsymbol\phi &=& \text{as above} \\
| |
| z_{i=1 \dots N}, x_{i=1 \dots N} &=& \text{as above} \\
| |
| \mu_{i=1 \dots K} &=& \text{mean of component } i \\
| |
| \sigma^2_{i=1 \dots K} &=& \text{variance of component } i \\
| |
| \mu_0, \lambda, \nu, \sigma_0^2 &=& \text{shared hyperparameters} \\
| |
| \mu_{i=1 \dots K} &\sim& \mathcal{N}(\mu_0, \lambda\sigma_i^2) \\
| |
| \sigma_{i=1 \dots K}^2 &\sim& \operatorname{Inverse-Gamma}(\nu, \sigma_0^2) \\
| |
| \boldsymbol\phi &\sim& \operatorname{Symmetric-Dirichlet}_K(\beta) \\
| |
| z_{i=1 \dots N} &\sim& \operatorname{Categorical}(\boldsymbol\phi) \\
| |
| x_{i=1 \dots N} &\sim& \mathcal{N}(\mu_{z_i}, \sigma^2_{z_i})
| |
| \end{array}
| |
| </math>
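One way to read the Bayesian specification is as a recipe for simulating data: draw the variances, means and weights from their priors, then draw the observations. The Python sketch below follows that order using only the standard library. It assumes that <math>\operatorname{Inverse-Gamma}(\nu, \sigma_0^2)</math> is parametrized by shape <math>\nu</math> and scale <math>\sigma_0^2</math>, which the specification does not state; the function name and the hyperparameter values in the example are likewise illustrative.

```python
import math
import random


def sample_bayesian_gmm(K, N, mu0, lam, nu, sigma0_sq, beta, seed=0):
    """Simulate one data set from the Bayesian Gaussian mixture model."""
    rng = random.Random(seed)
    # sigma_i^2 ~ Inverse-Gamma(nu, sigma0_sq): reciprocal of a Gamma draw with
    # shape nu and scale 1/sigma0_sq (shape/scale convention is an assumption).
    var = [1.0 / rng.gammavariate(nu, 1.0 / sigma0_sq) for _ in range(K)]
    # mu_i ~ N(mu0, lam * sigma_i^2); random.gauss expects a standard deviation.
    mu = [rng.gauss(mu0, math.sqrt(lam * v)) for v in var]
    # phi ~ Symmetric-Dirichlet_K(beta): normalized independent Gamma(beta, 1) draws.
    g = [rng.gammavariate(beta, 1.0) for _ in range(K)]
    total = sum(g)
    phi = [gi / total for gi in g]
    # z_i ~ Categorical(phi), then x_i ~ N(mu_{z_i}, sigma^2_{z_i}).
    z = rng.choices(range(K), weights=phi, k=N)
    x = [rng.gauss(mu[zi], math.sqrt(var[zi])) for zi in z]
    return phi, mu, var, z, x


phi, mu, var, z, x = sample_bayesian_gmm(
    K=3, N=500, mu0=0.0, lam=4.0, nu=3.0, sigma0_sq=2.0, beta=2.0
)
```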
| |
| | |
| ====Multivariate Gaussian mixture model====
| |
| A Bayesian Gaussian mixture model is commonly extended to fit a vector of unknown parameters (denoted in bold), using multivariate normal distributions. In the multivariate case (i.e., when modelling a vector <math>\boldsymbol{x}</math> of ''N'' random variables), one may model a vector of parameters (such as several observations of a signal or patches within an image) by placing a Gaussian mixture model prior distribution on the vector of estimates, given by
| |
| :<math>
| |
| p(\boldsymbol{\theta}) = \sum_{i=1}^K\phi_i \mathcal{N}(\boldsymbol{\mu_i,\Sigma_i})
| |
| </math>
| |
| where the ''i''<sup>th</sup> component is a normal distribution with weight <math>\phi_i</math>, mean <math>\boldsymbol{\mu_i}</math> and covariance matrix <math>\boldsymbol{\Sigma_i}</math>. To incorporate this prior into a Bayesian estimation, the prior is multiplied by the known distribution <math>p(\boldsymbol{x | \theta})</math> of the data <math>\boldsymbol{x}</math> conditioned on the parameters <math>\boldsymbol{\theta}</math> to be estimated. With this formulation, the [[Posterior_probability|posterior distribution]] <math>p(\boldsymbol{\theta | x})</math> is ''also'' a Gaussian mixture model of the form
| |
| :<math>
| |
| p(\boldsymbol{\theta | x}) = \sum_{i=1}^K\tilde{\phi_i} \mathcal{N}(\boldsymbol{\tilde{\mu_i},\tilde{\Sigma_i}})
| |
| </math>
| |
| with new parameters <math>\tilde{\phi_i}, \boldsymbol{\tilde{\mu_i}}</math> and <math>\boldsymbol{\tilde{\Sigma_i}}</math> that are updated using the [[Expectation-maximization algorithm|EM algorithm]].
| |
| <ref>
| |
| {{cite journal
| |
| |last=Yu |first=Guoshen
| |
| |title=Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity
| |
| |journal=IEEE Transactions on Image Processing
| |
| |volume=21 | year=2012|pages=2481-2499 |issue=5
| |
| }}
| |
| </ref> Although EM-based parameter updates are well established, providing the initial estimates for these parameters is currently an area of active research. Note that this formulation yields a closed-form solution to the complete posterior distribution. Estimates of the random variable <math>\boldsymbol{\theta}</math> may be obtained via one of several estimators, such as the mean or maximum of the posterior distribution.
| |
| | |
| Such distributions are useful for modelling patch-wise shapes of images and clusters, for example. In the case of image representation, each Gaussian may be tilted, expanded, and warped according to the covariance matrices <math>\boldsymbol{\Sigma_i}</math>. One Gaussian distribution of the set is fit to each patch (usually of size 8x8 pixels) in the image. Notably, any distribution of points around a cluster (see [[K-means clustering|''k''-means]]) may be modelled accurately given enough Gaussian components, but more than ''K''=20 components are scarcely ever needed to accurately model a given image distribution or cluster of data.
| |
| | |
| ====Categorical mixture model====
| |
| [[File:nonbayesian-categorical-mixture.svg|right|250px|thumb|Non-Bayesian categorical mixture model using [[plate notation]]. Smaller squares indicate fixed parameters; larger circles indicate random variables. Filled-in shapes indicate known values. The indication [K] means a vector of size ''K''; likewise for [V].]]
| |
| | |
| A typical non-Bayesian mixture model with [[categorical distribution|categorical]] observations looks like this:
| |
| | |
| *<math>K,N:</math> as above
| |
| *<math>\phi_{i=1 \dots K}, \boldsymbol\phi:</math> as above
| |
| *<math>z_{i=1 \dots N}, x_{i=1 \dots N}:</math> as above
| |
| *<math>V:</math> dimension of categorical observations, e.g., size of word vocabulary
| |
| *<math>\theta_{i=1 \dots K, j=1 \dots V}:</math> probability for component <math>i</math> of observing item <math>j</math>
| |
| *<math>\boldsymbol\theta_{i=1 \dots K}:</math> vector of dimension <math>V,</math> composed of <math>\theta_{i,1 \dots V};</math> must sum to 1
| |
| | |
| The random variables:
| |
| :<math>
| |
| \begin{array}{lcl}
| |
| z_{i=1 \dots N} &\sim& \operatorname{Categorical}(\boldsymbol\phi) \\
| |
| x_{i=1 \dots N} &\sim& \text{Categorical}(\boldsymbol\theta_{z_i})
| |
| \end{array}
| |
| </math>
| |
| | |
| |
| | |
| <br style="clear:both" />
| |
| [[File:bayesian-categorical-mixture.svg|right|300px|thumb|Bayesian categorical mixture model using [[plate notation]]. Smaller squares indicate fixed parameters; larger circles indicate random variables. Filled-in shapes indicate known values. The indication [K] means a vector of size ''K''; likewise for [V].]]
| |
| | |
| A typical Bayesian mixture model with [[categorical distribution|categorical]] observations looks like this:
| |
| | |
| *<math>K,N:</math> as above
| |
| *<math>\phi_{i=1 \dots K}, \boldsymbol\phi:</math> as above
| |
| *<math>z_{i=1 \dots N}, x_{i=1 \dots N}:</math> as above
| |
| *<math>V:</math> dimension of categorical observations, e.g., size of word vocabulary
| |
| *<math>\theta_{i=1 \dots K, j=1 \dots V}:</math> probability for component <math>i</math> of observing item <math>j</math>
| |
| *<math>\boldsymbol\theta_{i=1 \dots K}:</math> vector of dimension <math>V,</math> composed of <math>\theta_{i,1 \dots V};</math> must sum to 1
| |
| *<math>\alpha:</math> shared concentration hyperparameter of <math>\boldsymbol\theta</math> for each component
| |
| *<math>\beta:</math> concentration hyperparameter of <math>\boldsymbol\phi</math>
| |
| | |
| The random variables:
| |
| :<math>
| |
| \begin{array}{lcl}
| |
| \boldsymbol\phi &\sim& \operatorname{Symmetric-Dirichlet}_K(\beta) \\
| |
| \boldsymbol\theta_{i=1 \dots K} &\sim& \text{Symmetric-Dirichlet}_V(\alpha) \\
| |
| z_{i=1 \dots N} &\sim& \operatorname{Categorical}(\boldsymbol\phi) \\
| |
| x_{i=1 \dots N} &\sim& \text{Categorical}(\boldsymbol\theta_{z_i})
| |
| \end{array}
| |
| </math>
| |
| | |
| |
| | |
| ==Examples==
| |
| | |
| ===A financial model===
| |
| [[Image:Normal distribution pdf.png|thumb|right|250px|The [[normal distribution]] is plotted using different means and variances]]
| |
| | |
| Financial returns often behave differently in normal situations and during crisis times. A mixture model <ref>Dinov, ID. "[http://repositories.cdlib.org/socr/EM_MM/ Expectation Maximization and Mixture Modeling Tutorial]". ''[http://repositories.cdlib.org/escholarship California Digital Library]'', Statistics Online Computational Resource, Paper EM_MM, http://repositories.cdlib.org/socr/EM_MM, December 9, 2008</ref> for return data seems reasonable. Sometimes the model used is a [[jump-diffusion model]], and sometimes a mixture of two normal distributions.
| |
| | |
| ===House prices===
| |
| Assume that we observe the prices of ''N'' different houses. Different types of houses in different neighborhoods will have vastly different prices, but the price of a particular type of house in a particular neighborhood (e.g., three-bedroom house in moderately upscale neighborhood) will tend to cluster fairly closely around the mean. One possible model of such prices would be to assume that the prices are accurately described by a mixture model with ''K'' different components, each distributed as a [[normal distribution]] with unknown mean and variance, with each component specifying a particular combination of house type/neighborhood. Fitting this model to observed prices, e.g., using the [[expectation-maximization algorithm]], would tend to cluster the prices according to house type/neighborhood and reveal the spread of prices in each type/neighborhood. (Note that for values such as prices or incomes that are guaranteed to be positive and which tend to grow [[exponential growth|exponentially]], a [[log-normal distribution]] might actually be a better model than a normal distribution.)
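As a rough illustration of the fitting step, the following Python sketch runs expectation-maximization on a one-dimensional Gaussian mixture. It is a minimal sketch under several assumptions that are not part of the example above: the prices are synthetic (in thousands, drawn from two made-up house types), the means are initialized at evenly spaced quantiles of the data, the number of iterations is fixed, and a small variance floor guards against degenerate components.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x; var is the variance."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, K, iters=100):
    """Fit a K-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    w = [1.0 / K] * K
    # Initialization heuristic (an assumption): means at evenly spaced quantiles.
    mu = [xs_sorted[(2 * s + 1) * n // (2 * K)] for s in range(K)]
    var = [var_all] * K
    for _ in range(iters):
        # E-step: responsibility h[t][s] of component s for observation t.
        h = []
        for x in xs:
            joint = [w[s] * normal_pdf(x, mu[s], var[s]) for s in range(K)]
            total = sum(joint)
            h.append([j / total for j in joint])
        # M-step: re-estimate weight, mean and variance of each component.
        for s in range(K):
            hs = sum(h[t][s] for t in range(n))
            w[s] = hs / n
            mu[s] = sum(h[t][s] * xs[t] for t in range(n)) / hs
            var[s] = max(sum(h[t][s] * (xs[t] - mu[s]) ** 2 for t in range(n)) / hs, 1e-6)
    return w, mu, var


# Synthetic prices (in thousands) for two hypothetical house types.
rng = random.Random(1)
prices = [rng.gauss(150.0, 10.0) for _ in range(600)]
prices += [rng.gauss(400.0, 30.0) for _ in range(400)]
weights, means, variances = em_gmm_1d(prices, K=2)
```

A more careful implementation would compute the responsibilities in log space to avoid underflow when an observation lies far from every component, and would stop when the log-likelihood no longer improves rather than after a fixed number of iterations.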
| |
| | |
| ===Topics in a document===
| |
| Assume that a document is composed of ''N'' different words from a total vocabulary of size ''V'', where each word corresponds to one of ''K'' possible topics. The distribution of such words could be modelled as a mixture of ''K'' different ''V''-dimensional [[categorical distribution]]s. A model of this sort is commonly termed a [[topic model]]. Note that [[expectation maximization]] applied to such a model will typically fail to produce realistic results, due (among other things) to the [[overfitting|excessive number of parameters]]. Some sorts of additional assumptions are typically necessary to get good results. Typically two sorts of additional components are added to the model:
| |
| #A [[prior distribution]] is placed over the parameters describing the topic distributions, using a [[Dirichlet distribution]] with a [[concentration parameter]] that is set significantly below 1, so as to encourage sparse distributions (where only a small number of words have significantly non-zero probabilities).
| |
| #Some sort of additional constraint is placed over the topic identities of words, to take advantage of natural clustering.
| |
| :*For example, a [[Markov chain]] could be placed on the topic identities (i.e., the latent variables specifying the mixture component of each observation), corresponding to the fact that nearby words belong to similar topics. (This results in a [[hidden Markov model]], specifically one where a [[prior distribution]] is placed over state transitions that favors transitions that stay in the same state.)
| |
| :*Another possibility is the [[latent Dirichlet allocation]] model, which divides up the words into ''D'' different documents and assumes that in each document only a small number of topics occur with any frequency.
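The effect of the concentration parameter in the first of these additions can be seen by drawing from a symmetric Dirichlet distribution with a small and a large concentration. The Python sketch below uses the standard construction of a Dirichlet draw as normalized independent Gamma draws; the vocabulary size, the two concentration values and the 1% threshold are arbitrary choices made for illustration.

```python
import random


def sample_symmetric_dirichlet(dim, concentration, rng):
    """Draw from Symmetric-Dirichlet_dim(concentration) via normalized Gamma draws."""
    g = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(g)
    return [gi / total for gi in g]


rng = random.Random(0)
V = 50  # illustrative vocabulary size
sparse = sample_symmetric_dirichlet(V, 0.05, rng)  # concentration well below 1
dense = sample_symmetric_dirichlet(V, 10.0, rng)   # concentration well above 1

# Count the words that carry more than 1% of the probability mass in each draw.
n_sparse = sum(p > 0.01 for p in sparse)
n_dense = sum(p > 0.01 for p in dense)
```

With a concentration well below 1, a draw typically puts most of its mass on a handful of words, whereas a concentration well above 1 gives draws close to the uniform distribution.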
| |
| | |
| ===Handwriting recognition===
| |
| The following example is based on an example in [[Christopher M. Bishop]], ''Pattern Recognition and Machine Learning''.<ref>{{cite book | last = Bishop | first = Christopher | title = Pattern recognition and machine learning | publisher = Springer | location = New York | year = 2006 | isbn = 978-0-387-31073-2 }}</ref>
| |
| | |
| Imagine that we are given an ''N''×''N'' black-and-white image that is known to be a scan of a hand-written digit between 0 and 9, but we don't know which digit is written. We can create a mixture model with <math>K=10</math> different components, where each component is a vector of size <math>N^2</math> of [[Bernoulli distribution]]s (one per pixel). Such a model can be trained with the [[expectation-maximization algorithm]] on an unlabeled set of hand-written digits, and will effectively cluster the images according to the digit being written. The same model could then be used to recognize the digit of another image simply by holding the parameters constant, computing the probability of the new image for each possible digit (a trivial calculation), and returning the digit that generated the highest probability.
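The recognition step can be sketched as follows in Python. The sketch assumes that the component parameters have already been estimated and that each pixel probability lies strictly between 0 and 1, so that the logarithms are defined. The toy 2×2 "images", the two hand-picked components and the function names are hypothetical. The score of each component is its log weight plus the log-probability of the image, which reduces to comparing the image probabilities when the weights are equal.

```python
import math


def log_likelihood(image, probs):
    """Log-probability of a binary image under one component (independent Bernoulli pixels)."""
    return sum(
        math.log(p) if pixel else math.log(1.0 - p)
        for pixel, p in zip(image, probs)
    )


def classify(image, weights, components):
    """Return the index of the component with the highest score for the image."""
    scores = [
        math.log(w) + log_likelihood(image, probs)
        for w, probs in zip(weights, components)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Two hypothetical components over flattened 2x2 images (pixel "on" probabilities).
components = [
    [0.9, 0.1, 0.1, 0.9],  # favours the main diagonal
    [0.1, 0.9, 0.9, 0.1],  # favours the anti-diagonal
]
weights = [0.5, 0.5]
label_a = classify([1, 0, 0, 1], weights, components)
label_b = classify([0, 1, 1, 0], weights, components)
```

Working with log-probabilities avoids the numerical underflow that a direct product over <math>N^2</math> pixels would cause for realistic image sizes.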
| |
| | |
| ===Direct and indirect applications===
| |
| | |
| The financial example above is one direct application of the mixture model, a situation in which we assume an underlying mechanism so that each observation belongs to one of some number of different sources or categories. This underlying mechanism may or may not, however, be observable. In this form of mixture, each of the sources is described by a component probability density function, and its mixture weight is the probability that an observation comes from this component.
| |
| | |
| In an indirect application of the mixture model we do not assume such a mechanism. The mixture model is simply used for its mathematical flexibility. For example, a mixture of two [[normal distribution]]s with different means may result in a density with two [[Mode (statistics)|modes]], which is not modeled by standard parametric distributions. Another example is the ability of mixture distributions to model fatter tails than the basic Gaussian ones, making them candidates for modeling more extreme events. When combined with [[dynamical consistency]], this approach has been applied to [[financial derivatives]] valuation in the presence of the [[volatility smile]] in the context of [[local volatility]] models.
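The fatter tails can be quantified through the kurtosis. For a mixture of zero-mean normal distributions with weights <math>\phi_i</math> and variances <math>\sigma_i^2</math>, the second moment is <math>\sum_i \phi_i \sigma_i^2</math> and the fourth moment is <math>3 \sum_i \phi_i \sigma_i^4</math>, so the kurtosis is their ratio with the second moment squared. The Python sketch below evaluates this for a single Gaussian and for an illustrative two-component mixture; the weights and variances are made-up values.

```python
def mixture_kurtosis(weights, variances):
    """Kurtosis of a mixture of zero-mean normals with the given weights and variances."""
    second = sum(w * v for w, v in zip(weights, variances))            # E[x^2]
    fourth = sum(3.0 * w * v * v for w, v in zip(weights, variances))  # E[x^4]
    return fourth / second ** 2


# A single Gaussian has kurtosis 3.
single = mixture_kurtosis([1.0], [1.0])
# A calm regime plus a rare high-variance regime (illustrative values).
mixed = mixture_kurtosis([0.9, 0.1], [1.0, 9.0])
```

For these values the mixture has variance 1.8 and fourth moment 27, giving a kurtosis of 27/3.24 ≈ 8.33, well above the Gaussian value of 3.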
| |
| | |
| ===Fuzzy image segmentation===
| |
| In image processing and computer vision, traditional [[image segmentation]] models often assign to one [[pixel]] only one exclusive pattern. In fuzzy or soft segmentation, any pattern can have certain "ownership" over any single pixel. If the patterns are Gaussian, fuzzy segmentation naturally results in Gaussian mixtures. Combined with other analytic or geometric tools (e.g., phase transitions over diffusive boundaries), such spatially regularized mixture models could lead to more realistic and computationally efficient segmentation methods.<ref>
| |
| {{cite journal
| |
| | last = Shen | first = Jianhong (Jackie)
| |
| | title = A stochastic-variational model for soft Mumford-Shah segmentation
| |
| | year = 2006
| |
| | volume=2006
| |
| | pages=2-16
| |
| | journal=Int'l J. Biomedical Imaging
| |
| | doi=10.1155/IJBI/2006/92329
| |
| | url=http://www.hindawi.com/journals/ijbi/2006/092329/abs/
| |
| }}</ref>
| |
| | |
| == Identifiability ==
| |
| | |
| Identifiability refers to the existence of a unique characterization for any one of the models in the class (family) being considered. Estimation procedures may not be well defined, and asymptotic theory may not hold, if a model is not identifiable.
| |
| | |
| === Example ===
| |
| Let ''J'' be the class of all binomial distributions with {{nowrap|''n'' {{=}} 2}}. Then a mixture of two members of ''J'' would have
| |
| | |
| :<math>p_0=\pi(1-\theta_1)^2+(1-\pi)(1-\theta_2)^2</math>
| |
| :<math>p_1=2\pi\theta_1(1-\theta_1)+2(1-\pi)\theta_2(1-\theta_2)</math>
| |
| | |
| and {{nowrap|''p''<sub>2</sub> {{=}} 1 − ''p''<sub>0</sub> − ''p''<sub>1</sub>}}. Clearly, given ''p''<sub>0</sub> and ''p''<sub>1</sub>, it is not possible to determine the above mixture model uniquely, as there are three parameters (''π'', ''θ''<sub>1</sub>, ''θ''<sub>2</sub>) to be determined.
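The non-uniqueness can be checked numerically. In the Python sketch below, two different parameter triples (chosen for illustration so that the mixing distributions share the same first two moments of ''θ'') produce the same probabilities ''p''<sub>0</sub>, ''p''<sub>1</sub> and ''p''<sub>2</sub>; the function name is an assumption.

```python
import math


def binomial2_mixture_pmf(pi, theta1, theta2):
    """Probabilities (p0, p1, p2) of a two-component mixture of Binomial(2, theta)."""
    p0 = pi * (1 - theta1) ** 2 + (1 - pi) * (1 - theta2) ** 2
    p1 = 2 * pi * theta1 * (1 - theta1) + 2 * (1 - pi) * theta2 * (1 - theta2)
    return p0, p1, 1.0 - p0 - p1


# Two distinct parameter settings (illustrative values).
pmf_a = binomial2_mixture_pmf(0.5, 0.2, 0.8)
pmf_b = binomial2_mixture_pmf(0.36, 0.1, 0.725)

# Both settings give (0.34, 0.32, 0.34) up to rounding error.
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(pmf_a, pmf_b))
```

Because the observed distribution depends on the parameters only through the mean and second moment of ''θ'' under the mixing distribution, any two settings that agree on those two quantities are indistinguishable from data.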
| |
| | |
| === Definition ===
| |
| Consider a mixture of parametric distributions of the same class. Let
| |
| | |
| :<math>J=\{f(\cdot ; \theta):\theta\in\Omega\}</math>
| |
| | |
| be the class of all component distributions. Then the [[convex hull]] ''K'' of ''J'' defines the class of all finite mixtures of distributions in ''J'':
| |
| | |
| :<math>K=\left\{p(\cdot):p(\cdot)=\sum_{i=1}^n a_i f_i(\cdot ; \theta_i), a_i>0, \sum_{i=1}^n a_i=1, f_i(\cdot ; \theta_i)\in J\ \forall i,n\right\}</math>
| |
| | |
| ''K'' is said to be identifiable if all its members are unique, that is, given two members ''p'' and {{nowrap|''p′''}} in ''K'', being mixtures of ''k'' distributions and {{nowrap|''k′''}} distributions respectively in ''J'', we have {{nowrap|''p {{=}} p′''}} if and only if, first of all, {{nowrap|''k {{=}} k′''}} and secondly we can reorder the summations such that {{nowrap|''a<sub>i</sub> {{=}} a<sub>i</sub>''′}} and {{nowrap|''ƒ<sub>i</sub> {{=}} ƒ<sub>i</sub>''′}} for all ''i''.
| |
| | |
| == Parameter estimation and system identification ==
| |
| | |
| Parametric mixture models are often used when we know the distribution ''Y'' of the components and can sample from the mixture ''X'', but would like to determine the ''a<sub>i</sub>'' and ''θ<sub>i</sub>'' values. Such situations can arise in studies in which we sample from a population that is composed of several distinct subpopulations.
| |
| | |
| It is common to think of probability mixture modeling as a missing data problem. One way to understand this is to assume that the data points under consideration have "membership" in one of the distributions we are using to model the data. When we start, this membership is unknown, or missing. The job of estimation is to devise appropriate parameters for the model functions we choose, with the connection to the data points being represented as their membership in the individual model distributions.
| |
| | |
| A variety of approaches to the problem of mixture decomposition have been proposed, many of which focus on maximum likelihood methods such as [[expectation maximization]] (EM) or maximum ''a posteriori'' estimation (MAP). Generally these methods consider separately the question of parameter estimation and system identification, that is to say a distinction is made between the determination of the number and functional form of components within a mixture and the estimation of the corresponding parameter values. Some notable departures are the graphical methods as outlined in Tarter and Lock <ref name=tart>{{citation
| |
| |title=Model Free Curve Estimation
| |
| |author=Michael E. Tarter
| |
| |year=1993
| |
| |publisher=Chapman and Hall
| |
| }}</ref> and more recently [[minimum message length]] (MML) techniques such as Figueiredo and Jain <ref name=Jain>{{cite journal |first1=M.A.T. |last1=Figueiredo |first2=A.K. |last2=Jain |title=Unsupervised Learning of Finite Mixture Models |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=24 |issue=3 |pages=381–396 |date=March 2002 |doi=10.1109/34.990138 |url=http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=990138
| |
| }}</ref> and to some extent the moment matching pattern analysis routines suggested by McWilliam and Loh (2009).<ref name=mcwilli>
| |
| {{citation
| |
| |title=Incorporating Multidimensional Tail-Dependencies in the Valuation of Credit Derivatives (Working Paper)
| |
| |author=N. McWilliam, K. Loh
| |
| |year=2008
| |
| }} [http://www.misys.com/cds-portlets/digitalAssets/4/2797_CDsAndTailDep_forPublication_final1.pdf]</ref>
| |
| | |
| === Expectation maximization (EM) === <!-- Linked from [[Expectation-maximization algorithm]] -->
[[Expectation-maximization algorithm|Expectation maximization]] (EM) is seemingly the most popular technique used to determine the parameters of a mixture with an ''a priori'' given number of components. This is a particular way of implementing [[maximum likelihood]] estimation for this problem. EM is of particular appeal for finite normal mixtures, where closed-form expressions are possible, such as in the following iterative algorithm by Dempster ''et al.'' (1977):
:<math> w_s^{(j+1)} = \frac{1}{N} \sum_{t =1}^N h_s^{(j)}(t) </math>
:<math> \mu_s^{(j+1)} = \frac{\sum_{t =1}^N h_s^{(j)}(t) x^{(t)}}{\sum_{t =1}^N h_s^{(j)}(t)} </math>
:<math> \Sigma_s^{(j+1)} = \frac{\sum_{t =1}^N h_s^{(j)}(t) [x^{(t)}-\mu_s^{(j+1)}][x^{(t)}-\mu_s^{(j+1)}]^{\top}}{\sum_{t =1}^N h_s^{(j)}(t)} </math>
with the posterior probabilities
:<math> h_s^{(j)}(t) = \frac{w_s^{(j)} p_s(x^{(t)}; \mu_s^{(j)},\Sigma_s^{(j)}) }{ \sum_{i = 1}^n w_i^{(j)} p_i(x^{(t)}; \mu_i^{(j)}, \Sigma_i^{(j)})}. </math>
Thus, on the basis of the current estimate for the parameters, the conditional probability for a given observation ''x''<sup>(''t'')</sup> being generated from state ''s'' is determined for each {{nowrap|''t'' {{=}} 1, …, ''N''}}, ''N'' being the sample size. The parameters are then updated such that the new component weights correspond to the average conditional probability, and each component mean and covariance is the component-specific weighted average of the mean and covariance of the entire sample.

Dempster ''et al.''<ref name=dempster1977>{{cite journal |first1=A.P. |last1=Dempster |first2=N.M. |last2=Laird |first3=D.B. |last3=Rubin |title=Maximum Likelihood from Incomplete Data via the EM Algorithm |journal=Journal of the Royal Statistical Society, Series B |volume=39 |issue=1 |pages=1–38 |year=1977 |jstor=2984875 |url=http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.163.7580&rep=rep1&type=pdf}}</ref> also showed that each successive EM iteration will not decrease the likelihood, a property not shared by other gradient based maximization techniques. Moreover, EM naturally embeds within it constraints on the probability vector and, for sufficiently large sample sizes, positive definiteness of the covariance iterates. This is a key advantage since explicitly constrained methods incur extra computational costs to check and maintain appropriate values. Theoretically, EM is a first-order algorithm and as such converges slowly to a fixed-point solution. Redner and Walker (1984){{full|date=November 2012}} make this point, arguing in favour of superlinear and second order Newton and quasi-Newton methods and reporting slow convergence in EM on the basis of their empirical tests. They do concede that convergence in likelihood was rapid even if convergence in the parameter values themselves was not. The relative merits of EM and other algorithms vis-à-vis convergence have been discussed in other literature.<ref name=XuJordam>{{cite journal |first1=L. |last1=Xu |first2=M.I. |last2=Jordan |title=On Convergence Properties of the EM Algorithm for Gaussian Mixtures |journal=Neural Computation |volume=8 |issue=1 |pages=129–151 |date=January 1996 |doi=10.1162/neco.1996.8.1.129 |url=http://www.mitpressjournals.org/doi/abs/10.1162/neco.1996.8.1.129}}</ref>

Other common objections to the use of EM are that it has a propensity to spuriously identify local maximisers,<ref name="McLaughlan_2" /> as well as displaying sensitivity to initial values.{{citation needed|date=December 2010}} One may address these problems by evaluating EM at several initial points in the parameter space, but this is computationally costly, and other approaches, such as the annealing EM method of Ueda and Nakano (1998) (in which the initial components are essentially forced to overlap, providing a less heterogeneous basis for initial guesses), may be preferable.

Figueiredo and Jain<ref name="Jain" /> note that convergence to 'meaningless' parameter values obtained at the boundary (where regularity conditions break down, e.g., Ghosh and Sen (1985)) is frequently observed when the number of model components exceeds the optimal/true one. On this basis they suggest a unified approach to estimation and identification in which the initial ''n'' is chosen to greatly exceed the expected optimal value. Their optimization routine is constructed via a minimum message length (MML) criterion that effectively eliminates a candidate component if there is insufficient information to support it. In this way it is possible to systematize reductions in ''n'' and consider estimation and identification jointly.

The [[Expectation-maximization algorithm]] can be used to compute the parameters of a parametric mixture model distribution (the ''a<sub>i</sub>'' and ''θ<sub>i</sub>''). It is an [[iterative algorithm]] with two steps: an ''expectation step'' and a ''maximization step''. [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_2D_PointSegmentation_EM_Mixture Practical examples of EM and Mixture Modeling] are included in the [[SOCR]] demonstrations.
| ==== The expectation step ====
With initial guesses for the parameters of our mixture model, the "partial membership" of each data point in each constituent distribution is computed by calculating [[expectation value]]s for the membership variables of each data point. That is, for each data point ''x<sub>j</sub>'' and distribution ''Y<sub>i</sub>'', the membership value ''y''<sub>''i'', ''j''</sub> is:
:<math> y_{i,j} = \frac{a_i f_Y(x_j;\theta_i)}{f_{X}(x_j)}.</math>
| ==== The maximization step ====
With expectation values in hand for group membership, [[plug-in estimates]] are recomputed for the distribution parameters.

The mixing coefficients ''a<sub>i</sub>'' are the [[arithmetic mean|mean]]s of the membership values over the ''N'' data points:
:<math> a_i = \frac{1}{N}\sum_{j=1}^N y_{i,j}</math>
The component model parameters ''θ<sub>i</sub>'' are likewise re-estimated from the data points ''x<sub>j</sub>'', each weighted by its membership value. For example, if ''θ'' is a mean ''μ'':
:<math> \mu_{i} = \frac{\sum_{j} y_{i,j}x_{j}}{\sum_{j} y_{i,j}}.</math>
With new estimates for ''a<sub>i</sub>'' and the ''θ<sub>i</sub>'''s, the expectation step is repeated to recompute new membership values. The entire procedure is repeated until model parameters converge.
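The expectation and maximization steps described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming univariate Gaussian components, a fixed number of iterations in place of a convergence test, and synthetic data; the function names (<code>em_step</code>, <code>fit_mixture</code>) and all numeric values are hypothetical and not taken from any particular library.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(xs, a, mu, sigma):
    """Log-likelihood of the data under the current mixture parameters."""
    return sum(
        math.log(sum(a[i] * normal_pdf(x, mu[i], sigma[i]) for i in range(len(a))))
        for x in xs
    )


def em_step(xs, a, mu, sigma):
    """One EM iteration for a univariate Gaussian mixture."""
    n, N = len(a), len(xs)
    # Expectation step: y[i][j] = a_i f(x_j; theta_i) / f_X(x_j)
    y = [[0.0] * N for _ in range(n)]
    for j, x in enumerate(xs):
        dens = [a[i] * normal_pdf(x, mu[i], sigma[i]) for i in range(n)]
        total = sum(dens)
        for i in range(n):
            y[i][j] = dens[i] / total
    # Maximization step: membership-weighted plug-in estimates
    new_a, new_mu, new_sigma = [], [], []
    for i in range(n):
        w = sum(y[i])
        m = sum(y[i][j] * xs[j] for j in range(N)) / w
        v = sum(y[i][j] * (xs[j] - m) ** 2 for j in range(N)) / w
        new_a.append(w / N)
        new_mu.append(m)
        new_sigma.append(math.sqrt(v))
    return new_a, new_mu, new_sigma


def fit_mixture(xs, a, mu, sigma, iterations=50):
    """Run a fixed number of EM iterations, recording the log-likelihood."""
    history = [log_likelihood(xs, a, mu, sigma)]
    for _ in range(iterations):
        a, mu, sigma = em_step(xs, a, mu, sigma)
        history.append(log_likelihood(xs, a, mu, sigma))
    return a, mu, sigma, history


# Synthetic data: 40% from N(0, 1) and 60% from N(5, 1) (illustrative values).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(400)] + [rng.gauss(5.0, 1.0) for _ in range(600)]
a, mu, sigma, history = fit_mixture(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
```

The recorded <code>history</code> makes it possible to check the property noted earlier, that the likelihood does not decrease from one iteration to the next. In practice one would usually stop when the change in log-likelihood falls below a tolerance rather than after a fixed number of iterations.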
| === Markov chain Monte Carlo ===
As an alternative to the EM algorithm, the mixture model parameters can be inferred using [[posterior sampling]] as indicated by [[Bayes' theorem]]. This is still regarded as an incomplete data problem in which the membership of data points is the missing data. A two-step iterative procedure known as [[Gibbs sampling]] can be used.

The previous example of a mixture of two [[Gaussian distribution]]s can demonstrate how the method works. As before, initial guesses of the parameters for the mixture model are made. Instead of computing partial memberships for each elemental distribution, a membership value for each data point is drawn from a [[Bernoulli distribution]] (that is, it will be assigned to either the first or the second Gaussian). The Bernoulli parameter ''θ'' is determined for each data point as the probability, under the current parameter estimates, that the point was generated by the first Gaussian rather than the second. Draws from the distribution generate membership associations for each data point. Plug-in estimators can then be used as in the M step of EM to generate a new set of mixture model parameters, and the Bernoulli draw step is repeated.
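The procedure for two Gaussians can be sketched as follows. This is a simplified illustration under stated assumptions: it draws hard memberships and then recomputes plug-in estimates, as the text describes, whereas a full Gibbs sampler would also draw the parameters themselves from their conditional posterior distributions under chosen priors. The function name <code>sampling_sweep</code>, the burn-in length and the synthetic data are hypothetical choices made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def sampling_sweep(xs, a, mu, sigma, rng):
    """One sweep: draw hard memberships, then recompute plug-in estimates."""
    groups = ([], [])
    for x in xs:
        p0 = a * normal_pdf(x, mu[0], sigma[0])
        p1 = (1.0 - a) * normal_pdf(x, mu[1], sigma[1])
        theta = p0 / (p0 + p1)  # Bernoulli parameter: P(point belongs to component 0)
        groups[0 if rng.random() < theta else 1].append(x)
    if min(len(g) for g in groups) < 2:
        return a, mu, sigma  # keep the old parameters if a component is nearly empty
    new_a = len(groups[0]) / len(xs)
    new_mu = [sum(g) / len(g) for g in groups]
    new_sigma = [
        math.sqrt(sum((x - m) ** 2 for x in g) / len(g))
        for g, m in zip(groups, new_mu)
    ]
    return new_a, new_mu, new_sigma


# Synthetic data: 30% from N(0, 1) and 70% from N(6, 1) (illustrative values).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(700)]
a, mu, sigma = 0.5, [1.0, 5.0], [1.0, 1.0]
trace = []
for sweep in range(200):
    a, mu, sigma = sampling_sweep(data, a, mu, sigma, rng)
    trace.append((a, mu[0], mu[1]))

# Average the draws after discarding the first half as burn-in.
kept = trace[100:]
a_hat = sum(t[0] for t in kept) / len(kept)
mu_hat = [sum(t[1] for t in kept) / len(kept), sum(t[2] for t in kept) / len(kept)]
```

Unlike EM, successive sweeps do not settle on a single value; the parameters fluctuate from draw to draw, which is why the sketch summarizes them by averaging over the retained sweeps.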
| === Moment matching ===
The method of moment matching is one of the oldest techniques for determining the mixture parameters, dating back to Karl Pearson’s seminal work of 1894. In this approach the parameters of the mixture are determined such that the composite distribution has moments matching some given values. In many instances extraction of solutions to the moment equations may present non-trivial algebraic or computational problems. Moreover, numerical analysis by Day<ref name=day>{{cite jstor|2334652}}</ref> has indicated that such methods may be inefficient compared to EM. Nonetheless there has been renewed interest in this method, e.g., Craigmile and Titterington (1998) and Wang.<ref name=wang>{{citation
|title=Generating daily changes in market variables using a multivariate mixture of normal distributions
|author=J. Wang
|year=2001
|journal = Proceedings of the 33rd winter conference on simulation, IEEE Computer Society
|pages =283–289
}}</ref>

McWilliam and Loh (2009) consider the characterisation of a hyper-cuboid normal mixture [[copula (statistics)|copula]] in large dimensional systems for which EM would be computationally prohibitive. Here a pattern analysis routine is used to generate multivariate tail-dependencies consistent with a set of univariate and (in some sense) bivariate moments. The performance of this method is then evaluated using equity log-return data, with [[Kolmogorov–Smirnov]] test statistics suggesting a good descriptive fit.
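The moment equations themselves are straightforward to write down for a normal mixture, since each raw moment of the mixture is the weighted sum of the corresponding component moments. The sketch below computes only this forward map, together with the empirical moments it would be matched against; it does not solve the resulting equations for the parameters, which is the difficult part. The function names and the example parameter values are hypothetical.

```python
def normal_raw_moments(mu, sigma):
    """First three raw moments E[X], E[X^2], E[X^3] of a normal distribution."""
    return (mu, mu ** 2 + sigma ** 2, mu ** 3 + 3.0 * mu * sigma ** 2)


def mixture_raw_moments(weights, mus, sigmas):
    """Raw moments of a normal mixture: weighted sums of the component moments."""
    moments = [0.0, 0.0, 0.0]
    for a, mu, sigma in zip(weights, mus, sigmas):
        for k, m in enumerate(normal_raw_moments(mu, sigma)):
            moments[k] += a * m
    return tuple(moments)


def empirical_raw_moments(xs):
    """Sample counterparts of the first three raw moments."""
    n = len(xs)
    return tuple(sum(x ** k for x in xs) / n for k in (1, 2, 3))


# Hypothetical two-component mixture used only for illustration:
# weights 0.4 and 0.6, means 0 and 5, unit standard deviations.
model = mixture_raw_moments([0.4, 0.6], [0.0, 5.0], [1.0, 1.0])
```

A moment matching procedure would adjust the weights, means and standard deviations until <code>mixture_raw_moments</code> agrees with <code>empirical_raw_moments</code> of the data for as many moments as there are free parameters.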
=== Spectral method ===
Some problems in mixture model estimation can be solved using [[spectral method]]s. They are particularly useful when the data points ''x<sub>i</sub>'' are points in high-dimensional [[real coordinate space|real space]] and the hidden distributions are known to be [[Logarithmically concave function|log-concave]] (such as the [[Gaussian distribution]] or the [[Exponential distribution]]).

Spectral methods of learning mixture models are based on the [[Singular Value Decomposition|singular value decomposition]] of a matrix that contains the data points. The idea is to consider the top ''k'' singular vectors, where ''k'' is the number of distributions to be learned. The projection of each data point onto the [[linear subspace]] spanned by those vectors groups points originating from the same distribution very close together, while points from different distributions stay far apart.

One distinctive feature of the spectral method is that it allows us to [[Mathematical proof|prove]] that if the distributions satisfy a certain separation condition (e.g., they are not too close), then the estimated mixture will be very close to the true one with high probability.
| === Graphical Methods ===
Tarter and Lock<ref name="tart" /> describe a graphical approach to mixture identification in which a kernel function is applied to an empirical frequency plot so as to reduce intra-component variance. In this way one may more readily identify components having differing means. While this ''λ''-method does not require prior knowledge of the number or functional form of the components, its success does rely on the choice of the kernel parameters, which to some extent implicitly embeds assumptions about the component structure.
| === Other methods ===
Some other methods can even provably learn mixtures of [[heavy-tailed distribution]]s, including those with infinite [[variance]] (see [[#Recent Papers|links to papers]] below). In this setting, EM-based methods would not work, since the expectation step would diverge due to the presence of [[outlier]]s.
| === A simulation ===
To simulate a sample of size ''N'' from a mixture of distributions ''F''<sub>''i''</sub>, ''i'' = 1 to ''n'', with probabilities ''p''<sub>''i''</sub> (sum of ''p''<sub>''i''</sub> = 1):
# Generate ''N'' random numbers from a [[categorical distribution]] of size ''n'' and probabilities ''p''<sub>''i''</sub> for ''i'' = 1 to ''n''. These tell you which of the ''F''<sub>''i''</sub> each of the ''N'' values will come from. Denote by ''m<sub>i</sub>'' the quantity of random numbers assigned to the ''i''<sup>th</sup> category.
# For each ''i'', generate ''m<sub>i</sub>'' random numbers from the ''F''<sub>''i''</sub> distribution.
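The two steps above can be sketched in Python using only the standard library. The function name <code>simulate_mixture</code>, the choice of a normal and an exponential component, and the weights 0.3 and 0.7 are assumptions made for the example.

```python
import random


def simulate_mixture(N, probs, samplers, rng):
    """Draw N values from a mixture with weights probs and component samplers."""
    n = len(probs)
    # Step 1: N categorical draws decide which component each value comes from.
    labels = rng.choices(range(n), weights=probs, k=N)
    counts = [labels.count(i) for i in range(n)]  # the quantities m_i
    # Step 2: for each i, generate m_i values from F_i.
    sample = []
    for i in range(n):
        sample.extend(samplers[i](rng) for _ in range(counts[i]))
    rng.shuffle(sample)  # optional: remove the grouping by component
    return sample, counts


rng = random.Random(42)
# Illustrative components: F_1 = N(0, 1) and F_2 = exponential with rate 0.5.
samplers = [lambda r: r.gauss(0.0, 1.0), lambda r: r.expovariate(0.5)]
sample, counts = simulate_mixture(10000, [0.3, 0.7], samplers, rng)
```

The final shuffle is not part of the procedure as stated; it only removes the ordering by component that step 2 would otherwise leave in the output.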
| == Extensions ==
In a [[Bayesian inference|Bayesian setting]], additional levels can be added to the [[graphical model]] defining the mixture model. For example, in the common [[latent Dirichlet allocation]] [[topic model]], the observations are sets of words drawn from ''D'' different documents and the ''K'' mixture components represent topics that are shared across documents. Each document has a different set of mixture weights, which specify the topics prevalent in that document. All sets of mixture weights share common [[hyperparameter]]s.

A very common extension is to connect the [[latent variable]]s defining the mixture component identities into a [[Markov chain]], instead of assuming that they are [[independent identically distributed]] random variables. The resulting model is termed a [[hidden Markov model]] and is one of the most common sequential hierarchical models. Numerous extensions of hidden Markov models have been developed; see the resulting article for more information.
| == History ==
Mixture distributions and the problem of mixture decomposition, that is the identification of its constituent components and the parameters thereof, have been cited in the literature as far back as 1846 (Quetelet in McLachlan,<ref name=McLaughlan_2>{{citation
|title=Finite Mixture Models
|first=G.J. |last=McLachlan
|publisher=Wiley
|year=2000
}}</ref> 2000), although common reference is made to the work of [[Karl Pearson]] (1894){{citation needed|date=December 2010}} as the first author to explicitly address the decomposition problem in characterising non-normal attributes of forehead to body length ratios in female shore crab populations. The motivation for this work was provided by the zoologist [[Walter Frank Raphael Weldon]], who had speculated in 1893 (in Tarter and Lock<ref name="tart" />) that asymmetry in the histogram of these ratios could signal evolutionary divergence. Pearson’s approach was to fit a univariate mixture of two normals to the data by choosing the five parameters of the mixture such that the empirical moments matched those of the model.

While his work was successful in identifying two potentially distinct sub-populations and in demonstrating the flexibility of mixtures as a moment matching tool, the formulation required the solution of a 9th degree (nonic) polynomial, which at the time posed a significant computational challenge.

Subsequent works focused on addressing these problems, but it was not until the advent of the modern computer and the popularisation of [[Maximum Likelihood]] (ML) parameterisation techniques that research really took off.<ref name=McLaughlan_1>{{citation
|title=Mixture Models: inference and applications to clustering
|first=G.J. |last=McLachlan
|publisher=Dekker
|year=1988
}}</ref> Since that time there has been a vast body of research on the subject spanning areas such as fisheries research, agriculture, botany, economics, medicine, genetics, psychology, palaeontology, electrophoresis, finance, sedimentology/geology and zoology.<ref name=titter_1>{{harvnb|Titterington|Smith|Makov|1985}}</ref>
| == See also ==
=== Mixture ===
* [[Mixture density]]
* [[Mixture (probability)]]

=== Hierarchical models ===
* [[Graphical model]]
* [[Hierarchical Bayes model]]

=== Outlier detection ===
* [[RANSAC]]

{{More footnotes|date=November 2010}}
| == References ==
| {{Reflist}}
| == Further reading ==
| === Books on mixture models ===
| *{{cite book |last1=Everitt |first1=B.S. |last2=Hand |first2=D.J. |title=Finite mixture distributions |publisher=Chapman & Hall |year=1981 |isbn=0-412-22420-8 }}
| * Lindsay B.G. (1995) "Mixture Models: Theory, Geometry, and Applications". ''NSF-CBMS Regional Conference Series in Probability and Statistics'', Vol. 5, Institute of Mathematical Statistics, Hayward.
| *{{cite book |last1=Marin |first1=J.M. |last2=Mengersen |first2=K. |last3=Robert |first3=C.P. |chapter=Bayesian modelling and inference on mixtures of distributions |chapterurl=http://www.ceremade.dauphine.fr/%7Exian/mixo.pdf |editor1-first=D. |editor1-last=Dey |editor2-first=C.R. |editor2-last=Rao |title=Essential Bayesian models |publisher=Elsevier |year=2011 |isbn=9780444537324 |pages= |url= |series=Handbook of statistics: Bayesian thinking - modeling and computation |volume=25}}
| *{{cite book |last1=McLachlan |first1=G.J. |last2=Peel |first2=D. |title=Finite Mixture Models |publisher=Wiley |year=2000 |isbn=0-471-00626-2 }}
| *{{Cite book | last1=Press | first1=WH | last2=Teukolsky | first2=SA | last3=Vetterling | first3=WT | last4=Flannery | first4=BP | year=2007 | title=Numerical Recipes: The Art of Scientific Computing | edition=3rd | publisher=Cambridge University Press | publication-place=New York | isbn=978-0-521-88068-8 | chapter=Section 16.1. Gaussian Mixture Models and k-Means Clustering | chapter-url=http://apps.nrbook.com/empanel/index.html#pg=842}}
| *{{cite book |last1=Titterington |first1=D. |first2=A. |last2=Smith |first3=U. |last3=Makov |title=Statistical Analysis of Finite Mixture Distributions |publisher=Wiley |year=1985 |isbn=0-471-90763-4 |ref=harv}}
| ===Application of Gaussian mixture models===
| #{{cite journal |first1=D.A. |last1=Reynolds |first2=R.C. |last2=Rose |title=Robust text-independent speaker identification using Gaussian mixture speaker models |journal=IEEE Transactions on Speech and Audio Processing |date=January 1995 | url=http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=365379 |volume=3 |issue=1 |pages=72–83 |doi=10.1109/89.365379 }}
| #{{cite conference |first1=H. |last1=Permuter |first2=J. |last2=Francos |first3=I.H. |last3=Jermyn |title=Gaussian mixture models of texture and colour for image database retrieval| booktitle= IEEE [[International Conference on Acoustics, Speech, and Signal Processing]], 2003. Proceedings (ICASSP '03)| year=2003 | url=http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1199538}}[http://www.sciencedirect.com/science/article/pii/S0031320305004334 The journal version]
| #{{cite book|author=Wolfgang Lemke|year=2005|title=Term Structure Modeling and Estimation in a State Space Framework|publisher=Springer Verlag|isbn=978-3-540-28342-3}}
| #{{cite conference | author=[[Damiano Brigo]] and [[Fabio Mercurio]]|title=Displaced and Mixture Diffusions for Analytically-Tractable Smile Models| booktitle= Mathematical Finance — Bachelier Congress 2000. Proceedings | year=2001|publisher=Springer Verlag }}
| #{{cite journal |first1=Damiano |last1=Brigo |first2=Fabio |last2=Mercurio |title=Lognormal-mixture dynamics and calibration to market volatility smiles |journal=International Journal of Theoretical and Applied Finance |volume=5 |issue=4 |page=427 |date=June 2002 |doi=10.1142/S0219024902001511 |url=http://www.worldscientific.com/doi/abs/10.1142/S0219024902001511}}
| #{{cite journal |first1=Carol |last1=Alexander |title=Normal mixture diffusion with uncertain volatility: Modelling short- and long-term smile effects |journal=Journal of Banking & Finance |volume=28 |issue=12 |pages=2957–80 |date=December 2004 |doi=10.1016/j.jbankfin.2003.10.017 |url=http://www.carolalexander.org/publish/download/JournalArticles/PDFs/JBF2004.pdf |format=PDF}}
| #{{cite conference | author=Yannis Stylianou, Yannis Pantazis, Felipe Calderero, Pedro Larroy, Francois Severin, Sascha Schimke, Rolando Bonal, Federico Matta, and Athanasios Valsamakis |title=GMM-Based Multimodal Biometric Verification| year=2005 |url=http://www.enterface.net/enterface05/docs/results/reports/project5.pdf}}
| == External links ==
| *{{cite arxiv |first1=Frank |last1=Nielsen |title=$k$-MLE: A fast algorithm for learning statistical mixture models |date=23 March 2012 |eprint=1203.5181}}
| * The [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_2D_PointSegmentation_EM_Mixture SOCR demonstrations of EM and Mixture Modeling]
| *[http://www.csse.monash.edu.au/~dld/mixturemodel.html Mixture modelling page] (and the [http://www.csse.monash.edu.au/~dld/Snob.html Snob] program for [[Minimum Message Length]] ([[Minimum Message Length|MML]]) applied to finite mixture models), maintained by D.L. Dowe.
| *[http://www.pymix.org PyMix] — Python Mixture Package, algorithms and data structures for a broad variety of mixture model based data mining applications in Python
| *[http://scikit-learn.org/stable/modules/mixture.html scikit-learn.mixture.GMM ] — A Python package for learning Gaussian Mixture Models (and sampling from them), previously packaged with [[SciPy]] and now packaged as a [http://scikits.appspot.com/ SciKit]
| *[http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=18785&objectType=FILE GMM.m] Matlab code for GMM Implementation
| *[http://stat.duke.edu/gpustatsci/software.html GPUmix] C++ implementation of Bayesian Mixture Models using EM and MCMC with 100x speed acceleration using GPGPU.
| *[http://www.cs.ru.nl/~ali/index_files/EM.m] Matlab code for GMM Implementation using EM algorithm
| *[http://vincentfpgarcia.github.com/jMEF/] jMEF: A Java open source library for learning and processing mixtures of exponential families (using duality with Bregman divergences). Includes a Matlab wrapper.
| * Very Fast and clean C implementation of the [https://github.com/juandavm/em4gmm Expectation Maximization] (EM) algorithm for estimating [https://github.com/juandavm/em4gmm Gaussian Mixture Models] (GMMs).
{{DEFAULTSORT:Mixture Model}}
[[Category:Statistical models]]
[[Category:Cluster analysis]]
[[Category:Latent variable models]]
[[Category:Probabilistic models]]
[[Category:Machine learning]]