In statistics and data mining, affinity propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points. Unlike clustering algorithms such as [[K-means clustering|Template:Mvar-means]] or [[K-medoids|Template:Mvar-medoids]], AP does not require the number of clusters to be determined or estimated before running the algorithm. Like Template:Mvar-medoids, AP finds "exemplars", members of the input set that are representative of clusters.
Let x1 through Template:Mvar be a set of data points, with no assumptions made about their internal structure, and let Template:Mvar be a function that quantifies the similarity between any two points, such that s(xi, xj) > s(xi, xk) iff Template:Mvar is more similar to Template:Mvar than to Template:Mvar.
The algorithm proceeds by alternating two message passing steps, to update two matrices:
- The "responsibility" matrix R has values r(i, k) that quantify how well-suited Template:Mvar is to serve as the exemplar for Template:Mvar, relative to other candidate exemplars for Template:Mvar.
- The "availability" matrix A contains values a(i, k) represents how "appropriate" it would be for Template:Mvar to pick Template:Mvar as its exemplar, taking into account other points' preference for Template:Mvar as an exemplar.
Both matrices are initialized to all zeroes, and can be viewed as log-probability tables. The algorithm then performs the following updates iteratively:
The inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks, e.g. clustering of pictures of human faces and identifying regulated transcripts, than Template:Mvar-means, even when Template:Mvar-means was allowed many random restarts and initialized using PCA. A study comparing AP and Markov clustering on protein interaction graph partitioning found Markov clustering to work better for that problem.
- A Java implementation is included in the ELKI data mining framework.
- A Julia implementation of affinity propagation is contained in Julia Statistics's Clustering.jl package.
- A Python version is part of the scikit-learn library.
- An R implemenation is available in the "apcluster" package.