Chemostat: Difference between revisions

The '''Rand index'''<ref name=rand71>{{Cite journal
| author = W. M. Rand
| title = Objective criteria for the evaluation of clustering methods
| journal = [[Journal of the American Statistical Association]]
| volume = 66
| pages = 846–850
| year = 1971
| doi = 10.2307/2284239
| issue = 336
| publisher = American Statistical Association
| jstor = 2284239
}}</ref> or '''Rand measure''' (named after William M. Rand) in [[statistics]], and in particular in [[data clustering]], is a measure of the similarity between two clusterings of the same data. A form of the Rand index that is adjusted for the chance grouping of elements is called the '''adjusted Rand index'''. Mathematically, the Rand index is related to [[Accuracy and precision#In binary classification|accuracy]], but it is applicable even when class labels are not used.
 
==Rand index==
===Definition===
Given a [[Set (mathematics)|set]] of <math>n</math> [[element (mathematics)|elements]] <math>S = \{o_1, \ldots, o_n\}</math> and two [[Partition of a set|partitions]] of <math>S</math> to compare, <math>X = \{X_1, \ldots, X_r\}</math>, a partition of ''S'' into ''r'' subsets, and <math>Y = \{Y_1, \ldots, Y_s\}</math>, a partition of ''S'' into ''s'' subsets, define the following:
* <math>a</math>, the number of pairs of elements in <math>S</math> that are in the same set in <math>X</math> and in the same set in <math>Y</math>
 
* <math>b</math>, the number of pairs of elements in <math>S</math> that are in different sets in <math>X</math> and in different sets in <math>Y</math>
 
* <math>c</math>, the number of pairs of elements in <math>S</math> that are in the same set in <math>X</math> and in different sets in <math>Y</math>
 
* <math>d</math>, the number of pairs of elements in <math>S</math> that are in different sets in <math>X</math> and in the same set in <math>Y</math>
 
The Rand index, <math>R</math>, is:<ref name=rand71/><ref name=hb85>{{Cite journal
| doi = 10.1007/BF01908075
| author = Lawrence Hubert and Phipps Arabie
| title = Comparing partitions
| journal = Journal of Classification
| volume = 2
|issue=1
| pages = 193–218
| year = 1985
}}</ref>
:<math> R = \frac{a+b}{a+b+c+d} = \frac{a+b}{{n \choose 2 }}</math>
Intuitively, <math>a + b</math> can be considered as the number of agreements between <math>X</math> and <math>Y</math>, and <math>c + d</math> as the number of disagreements between <math>X</math> and <math>Y</math>.
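
The pair counts above can be sketched in code. The following is a minimal illustration, assuming each partition is given as a list of cluster labels (one label per element, in the same order); the function names <code>pair_counts</code> and <code>rand_index</code> and the sample labelings are hypothetical and not taken from the cited sources.

```python
from itertools import combinations


def pair_counts(labels_x, labels_y):
    """Return (a, b, c, d) for two clusterings given as label sequences.

    Assumption: labels_x[i] and labels_y[i] are the cluster labels of
    element o_i in partitions X and Y respectively.
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both labelings must cover the same elements")
    a = b = c = d = 0
    # Visit every unordered pair of elements exactly once.
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # together in X, together in Y
        elif not same_x and not same_y:
            b += 1  # apart in X, apart in Y
        elif same_x:
            c += 1  # together in X, apart in Y
        else:
            d += 1  # apart in X, together in Y
    return a, b, c, d


def rand_index(labels_x, labels_y):
    """Rand index R = (a + b) / (a + b + c + d)."""
    a, b, c, d = pair_counts(labels_x, labels_y)
    total = a + b + c + d  # equals n choose 2
    if total == 0:
        raise ValueError("at least two elements are required")
    return (a + b) / total


# Hypothetical example: six elements, X has two clusters, Y has three.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 1, 1, 2, 2]
print(pair_counts(x, y))  # (2, 8, 4, 1)
print(rand_index(x, y))   # 10 / 15
```

Enumerating all pairs takes time quadratic in <math>n</math>, which is adequate for illustration; the contingency-table formulation is usually preferred for larger inputs.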
 
===Properties===
The Rand index has a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of points and 1 indicating that the clusterings are exactly the same.
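
As a small worked illustration of the lower bound, consider a three-element set where <math>X</math> places all elements in one subset and <math>Y</math> places each element in its own subset. The set and both partitions are chosen here for the example and do not come from the cited sources. Every pair is together in <math>X</math> and apart in <math>Y</math>, so only <math>c</math> is non-zero:

```latex
\begin{align}
S &= \{o_1, o_2, o_3\}, \quad X = \{\{o_1, o_2, o_3\}\}, \quad Y = \{\{o_1\}, \{o_2\}, \{o_3\}\} \\
a &= 0, \quad b = 0, \quad c = 3, \quad d = 0 \\
R &= \frac{a + b}{a + b + c + d} = \frac{0}{3} = 0
\end{align}
```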
 
In mathematical terms, <math>a</math>, <math>b</math>, <math>c</math> and <math>d</math> are defined as follows:
 
*<math>a = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i}, o_{j} \in X_{k}, o_{i}, o_{j} \in Y_{l}\}</math>
 
*<math>b = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i} \in X_{k_{1}}, o_{j} \in X_{k_{2}}, o_{i} \in Y_{l_{1}}, o_{j} \in Y_{l_{2}}\}</math>
 
*<math>c = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i}, o_{j} \in X_{k}, o_{i} \in Y_{l_{1}}, o_{j} \in Y_{l_{2}}\}</math>
 
*<math>d = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i} \in X_{k_{1}}, o_{j} \in X_{k_{2}}, o_{i}, o_{j} \in Y_{l}\}</math>
 
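where <math>1 \leq i < j \leq n</math>, so that each unordered pair is counted once, and the membership conditions hold for some cluster indices <math>1 \leq k, k_{1}, k_{2} \leq r</math> with <math>k_{1} \neq k_{2}</math> and <math>1 \leq l, l_{1}, l_{2} \leq s</math> with <math>l_{1} \neq l_{2}</math>.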
 
==Adjusted Rand index==
The adjusted Rand index is the corrected-for-chance version of the Rand index.<ref name=rand71/><ref name=hb85/><ref>{{Cite conference
| author = Nguyen Xuan Vinh, Julien Epps and James Bailey
| title = Information Theoretic Measures for Clustering Comparison: Is a Correction for Chance Necessary?
| booktitle = ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
| year = 2009
| pages = 1073–1080
| url = http://www.ima.umn.edu/~iwen/REU/10.pdf
| publisher = ACM
}}
</ref> Although the Rand index can only take values between 0 and 1, the adjusted Rand index can be negative when the index is less than its expected value.<ref>http://i11www.iti.uni-karlsruhe.de/extra/publications/ww-cco-06.pdf</ref>
 
===The contingency table===
Given a set <math>S</math> of <math>n</math> elements, and two groupings (''e.g.'' clusterings) of these points, namely <math>X = \{ X_1, X_2, \ldots , X_r \}</math> and <math>Y = \{ Y_1, Y_2, \ldots , Y_s \}</math>, the overlap between <math>X</math> and <math>Y</math> can be summarized in a contingency table <math>\left[n_{ij}\right]</math> where each entry <math>n_{ij}</math> denotes the number of objects in common between <math>X_i</math> and <math>Y_j</math>: <math>n_{ij}=|X_i \cap Y_j|</math>. The row sums <math>a_i = \sum_j n_{ij}</math> and column sums <math>b_j = \sum_i n_{ij}</math> are the sizes of <math>X_i</math> and <math>Y_j</math>.
{| border="0" cellpadding="2" align="center"
|-align="center"  valign="center"
! style="border-bottom:1px solid black;border-right:1px solid black;" | X\Y
! style="border-bottom:1px solid black;" | <math>Y_1</math>
! style="border-bottom:1px solid black;" | <math>Y_2</math>
! style="border-bottom:1px solid black;" | <math>\ldots</math>
! style="border-bottom:1px solid black;" | <math>Y_s</math>
! style="border-bottom:1px solid black;border-left:1px solid black;" | Sums
|-align="center"  valign="center"
! style="border-right:1px solid black;" | <math>X_1</math>
| <math>n_{11}</math>
| <math>n_{12}</math>
| <math>\ldots</math>
| <math>n_{1s}</math>
| style="border-left:1px solid black;" | <math>a_1</math>
|-align="center"  valign="center"
! style="border-right:1px solid black;" | <math>X_2</math>
| <math>n_{21}</math>
| <math>n_{22}</math>
| <math>\ldots</math>
| <math>n_{2s}</math>
| style="border-left:1px solid black;" | <math>a_2</math>
|-align="center"  valign="center"
| style="border-right:1px solid black;" | <math>\vdots</math>
| <math>\vdots</math>
| <math>\vdots</math>
| <math>\ddots</math>
| <math>\vdots</math>
| style="border-left:1px solid black;" | <math>\vdots</math>
|-align="center"  valign="center"
! style="border-right:1px solid black;" | <math>X_r</math>
| <math>n_{r1}</math>
| <math>n_{r2}</math>
| <math>\ldots</math>
| <math>n_{rs}</math>
| style="border-left:1px solid black;" | <math>a_r</math>
|-align="center"  valign="center"
! style="border-right:1px solid black;border-top:1px solid black;" | Sums
| style="border-top:1px solid black;" | <math>b_1</math>
| style="border-top:1px solid black;" | <math>b_2</math>
| style="border-top:1px solid black;" | <math>\ldots</math>
| style="border-top:1px solid black;" | <math>b_s</math>
| style="border-left:1px solid black;border-top:1px solid black;" |
|-
|}
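
The table can be built directly from two label sequences. The sketch below assumes the same label-list representation as a convenience; the helper name <code>contingency_table</code> and the sample labelings are hypothetical.

```python
from collections import Counter


def contingency_table(labels_x, labels_y):
    """Return (table, row_sums, col_sums) for two labelings.

    table[i][j] is n_ij = |X_i ∩ Y_j|; rows and columns follow the
    sorted order of the distinct labels (an arbitrary but fixed choice).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both labelings must cover the same elements")
    rows = sorted(set(labels_x))
    cols = sorted(set(labels_y))
    # Count how many elements carry each (X label, Y label) combination.
    counts = Counter(zip(labels_x, labels_y))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    row_sums = [sum(row) for row in table]        # a_i
    col_sums = [sum(col) for col in zip(*table)]  # b_j
    return table, row_sums, col_sums


# Hypothetical example: six elements, X has two clusters, Y has three.
table, a_sums, b_sums = contingency_table([0, 0, 0, 1, 1, 1],
                                          [0, 0, 1, 1, 2, 2])
print(table)   # [[2, 1, 0], [0, 1, 2]]
print(a_sums)  # [3, 3]
print(b_sums)  # [2, 2, 2]
```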
 
===Definition===
The adjusted Rand index has the general form <math>\text{AdjustedIndex} = \frac{\text{Index} - \text{ExpectedIndex}}{\text{MaxIndex} - \text{ExpectedIndex}}</math>; more specifically,<br />
<math>ARI = \frac{ \sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2} }{ \frac{1}{2} \left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2} }</math>
<br />where <math>n_{ij}</math> are the entries of the contingency table and <math>a_i</math>, <math>b_j</math> are its row and column sums.
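
The formula can be evaluated term by term, as in the following minimal sketch. It assumes label-list input and Python 3.8 or later (for <code>math.comb</code>); the function name <code>adjusted_rand_index</code> is hypothetical, and returning 1.0 when the denominator is zero is a convention assumed here, not something stated in the sources.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_x, labels_y):
    """Adjusted Rand index computed from contingency-table counts."""
    n = len(labels_x)
    if n != len(labels_y):
        raise ValueError("both labelings must cover the same elements")
    if n < 2:
        raise ValueError("at least two elements are required")

    n_ij = Counter(zip(labels_x, labels_y))  # table entries
    a_i = Counter(labels_x)                  # row sums
    b_j = Counter(labels_y)                  # column sums

    index = sum(comb(v, 2) for v in n_ij.values())
    sum_a = sum(comb(v, 2) for v in a_i.values())
    sum_b = sum(comb(v, 2) for v in b_j.values())

    expected = sum_a * sum_b / comb(n, 2)    # ExpectedIndex
    max_index = 0.5 * (sum_a + sum_b)        # MaxIndex
    denominator = max_index - expected
    if denominator == 0:
        # Degenerate case (e.g. both partitions consist of a single
        # cluster); treating it as perfect agreement is an assumption.
        return 1.0
    return (index - expected) / denominator


# Identical partitions (up to label names) give 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# Agreement below the chance level gives a negative value.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # approximately -0.5
```

In the second call every contingency-table entry is 1, so the index is 0 while its expected value is positive, which is what drives the result below zero.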
 
==References==
{{Reflist}}
 
== External links ==
* [https://github.com/bjoern-andres/partition-comparison C++ implementation with MATLAB mex files]
 
[[Category:Machine learning]]
[[Category:Summary statistics for contingency tables]]
[[Category:Clustering criteria]]

Latest revision as of 14:50, 27 October 2014
