Affinity propagation

From formulasearchengine
Jump to navigation Jump to search

In statistics and data mining, affinity propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points.[1] Unlike clustering algorithms such as [[K-means clustering|Template:Mvar-means]] or [[K-medoids|Template:Mvar-medoids]], AP does not require the number of clusters to be determined or estimated before running the algorithm. Like Template:Mvar-medoids, AP finds "exemplars", members of the input set that are representative of clusters.[1]

Algorithm

Let x1 through Template:Mvar be a set of data points, with no assumptions made about their internal structure, and let Template:Mvar be a function that quantifies the similarity between any two points, such that s(xi, xj) > s(xi, xk) iff Template:Mvar is more similar to Template:Mvar than to Template:Mvar.

The algorithm proceeds by alternating two message passing steps, to update two matrices:[1]

  • The "responsibility" matrix R has values r(i, k) that quantify how well-suited Template:Mvar is to serve as the exemplar for Template:Mvar, relative to other candidate exemplars for Template:Mvar.
  • The "availability" matrix A contains values a(i, k) represents how "appropriate" it would be for Template:Mvar to pick Template:Mvar as its exemplar, taking into account other points' preference for Template:Mvar as an exemplar.

Both matrices are initialized to all zeroes, and can be viewed as log-probability tables. The algorithm then performs the following updates iteratively:

for and
.

Applications

The inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks, e.g. clustering of pictures of human faces and identifying regulated transcripts, than Template:Mvar-means,[1] even when Template:Mvar-means was allowed many random restarts and initialized using PCA.[2] A study comparing AP and Markov clustering on protein interaction graph partitioning found Markov clustering to work better for that problem.[3]

Software

  • A Java implementation is included in the ELKI data mining framework.
  • A Julia implementation of affinity propagation is contained in Julia Statistics's Clustering.jl package.[4]
  • A Python version is part of the scikit-learn library.[5]
  • An R implemenation is available in the "apcluster" package.[6]

References

  1. 1.0 1.1 1.2 1.3 {{#invoke:Citation/CS1|citation |CitationClass=journal }}
  2. {{#invoke:citation/CS1|citation |CitationClass=conference }}
  3. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
  4. Clustering.jl www.github.com
  5. Template:Cite web
  6. apcluster cran.r-project.org>

Template:Asbox