# Pólya urn model

In statistics, a Pólya urn model (also known as a Pólya urn scheme or simply Pólya's urn), named after George Pólya, is a type of statistical model used as an idealized framework for thought experiments, one that unifies many treatments.

In an urn model, objects of real interest (such as atoms, people, or cars) are represented as colored balls in an urn or other container. In the basic Pólya urn model, the urn contains x white and y black balls. One ball is drawn at random from the urn and its color is observed; the ball is then returned to the urn together with an additional ball of the same color, and the selection process is repeated. Questions of interest are the evolution of the urn population and the sequence of colors of the balls drawn.
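The drawing procedure described above can be sketched as a short simulation. This is a minimal illustration, not a reference implementation: the function name `polya_urn`, its parameters, and the `seed` argument are hypothetical names chosen for the example.

```python
import random


def polya_urn(x, y, n_draws, seed=None):
    """Simulate the basic Pólya urn (hypothetical helper for illustration).

    Starts with x white and y black balls. Each drawn ball is returned
    together with one extra ball of the same color.
    Returns (list of drawn colors, final white count, final black count).
    """
    rng = random.Random(seed)
    white, black = x, y
    draws = []
    for _ in range(n_draws):
        # Each ball is equally likely, so P(white) = white / (white + black).
        if rng.random() < white / (white + black):
            draws.append("white")
            white += 1  # return the ball plus one more white ball
        else:
            draws.append("black")
            black += 1  # return the ball plus one more black ball
    return draws, white, black


if __name__ == "__main__":
    draws, white, black = polya_urn(x=3, y=2, n_draws=20, seed=0)
    print(draws)
    print("final urn:", white, "white,", black, "black")
```

Every draw increases the urn population by exactly one ball, so after n draws the urn holds x + y + n balls regardless of the colors observed.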

Adding a ball of the drawn color endows the urn with a self-reinforcing property, sometimes expressed as "the rich get richer".

In some sense, the Pólya urn model is the "opposite" of the model of sampling without replacement. When sampling without replacement, every time a particular value is observed, it becomes less likely to be observed again, whereas in a Pólya urn model, an observed value becomes more likely to be observed again. In both of these models, the act of measurement has an effect on the outcome of future measurements. (For comparison, when sampling with replacement, observing a particular value has no effect on how likely that value is to be observed again.) In a Pólya urn model, successive acts of measurement have less and less effect on future measurements, whereas in sampling without replacement the opposite is true: after a certain number of measurements of a particular value, that value will never be seen again.
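The contrast between the three sampling schemes can be made concrete with exact fractions. The sketch below assumes an urn of 3 white and 2 black balls (numbers chosen only for illustration) and computes the probability that the second draw is white given that the first draw was white. The function name `p_white_again` and the scheme labels are hypothetical.

```python
from fractions import Fraction


def p_white_again(white, black, scheme):
    """P(second draw is white | first draw was white), as an exact fraction.

    `scheme` is one of the illustrative labels
    "without_replacement", "with_replacement", or "polya".
    """
    if scheme == "without_replacement":
        # The drawn white ball is gone: one fewer white, one fewer ball.
        return Fraction(white - 1, white + black - 1)
    if scheme == "with_replacement":
        # The urn is unchanged by the first draw.
        return Fraction(white, white + black)
    if scheme == "polya":
        # The drawn ball is returned with a duplicate: one more white ball.
        return Fraction(white + 1, white + black + 1)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # Before any draw, P(white) = 3/5 under all three schemes.
    for scheme in ("without_replacement", "with_replacement", "polya"):
        print(scheme, p_white_again(3, 2, scheme))
    # without_replacement 1/2  (less likely than 3/5)
    # with_replacement 3/5     (unchanged)
    # polya 2/3                (more likely than 3/5)
```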

One reason for interest in this rather elaborate urn model (in which each drawn ball is returned together with a duplicate) is that, although the urn content (initially x white and y black balls) is not concealed, the model approximates the correct updating of subjective probabilities in a different case, in which the original urn content is concealed and ordinary sampling with replacement is conducted (without the Pólya ball-duplication). Because of the simple "sampling with replacement" scheme in this second case, the urn content is static, but this greater simplicity is offset by the assumption that the urn content is unknown to the observer. A Bayesian analysis of the observer's uncertainty about the urn's initial content can be made using a particular choice of (conjugate) prior distribution.

Specifically, suppose that an observer knows that the urn contains only identical balls, each coloured either black or white, but does not know the absolute number of balls present, nor the proportion of each colour. Suppose that the observer holds prior beliefs about these unknowns: the probability distribution of the urn content is well approximated by some prior distribution for the total number of balls in the urn, together with a beta prior distribution with parameters (x, y) for the initial proportion of balls that are white, this proportion being considered approximately independent of the total number. Then the sequence of outcomes of successive draws from the urn (with replacement but without the duplication) has approximately the same probability law as the Pólya scheme above, in which the actual urn content is not hidden.

The approximation error arises because an urn containing a known finite number m of balls cannot have an exactly beta-distributed unknown proportion of white balls: the possible values of that proportion are confined to multiples of $1/m$, rather than ranging over the continuous unit interval as an exactly beta-distributed proportion would. This slightly informal account is provided for motivation, and can be made more mathematically precise.
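In the idealized case where the unknown proportion is exactly beta-distributed on the continuous unit interval (rather than confined to multiples of $1/m$), the correspondence can be checked exactly for small cases. The sketch below takes x as the white count, as in the definition of the basic model, and assumes integer x and y so that the beta function reduces to factorials. All function names are hypothetical. It compares the probability of each colour sequence under the Pólya scheme with its probability under draws with replacement from an urn whose white proportion has a Beta(x, y) prior.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def polya_sequence_prob(x, y, seq):
    """Probability of the colour sequence `seq` ("W"/"B") under the Pólya urn."""
    white, black = x, y
    prob = Fraction(1)
    for colour in seq:
        total = white + black
        if colour == "W":
            prob *= Fraction(white, total)
            white += 1  # duplicate the drawn white ball
        else:
            prob *= Fraction(black, total)
            black += 1  # duplicate the drawn black ball
    return prob


def beta_mixture_sequence_prob(x, y, seq):
    """Probability of `seq` when draws are independent given the white
    proportion p, and p has a Beta(x, y) prior: the integral of
    p^k (1-p)^(n-k) against the prior equals B(x+k, y+n-k) / B(x, y)."""
    k = seq.count("W")
    n = len(seq)
    return beta_fn(x + k, y + n - k) / beta_fn(x, y)


if __name__ == "__main__":
    x, y, n = 2, 3, 4  # illustrative values
    for seq in product("WB", repeat=n):
        assert polya_sequence_prob(x, y, seq) == beta_mixture_sequence_prob(x, y, seq)
    print("All", 2 ** n, "sequences have matching probabilities.")
```

The mixture probability depends on a sequence only through its number of white draws, so the check also illustrates that the order of colours in a Pólya sequence does not affect its probability.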

This basic Pólya urn model has been enriched and generalized in many ways.