Chemostat: Difference between revisions

Latest revision as of 14:50, 27 October 2014

Nice to satisfy you, my name is Figures Held though I don't truly like being called like that. Her family members life in Minnesota. The preferred pastime for my kids and me is to play baseball but I haven't made a dime with it. My day job is a meter reader.

My web site ... ninfeta.tv

@@ Line 1: / Line 1: @@
-The '''Rand index'''<ref name=rand71>{{Cite journal
+Nice to satisfy you, my name is Figures Held though I don't truly like being called like that. Her family members life in Minnesota. The preferred pastime for my kids and me is to play baseball but I haven't made a dime with it. My day job is a meter reader.<br><br>My web site ... [http://www.ninfeta.tv/blog/99493 ninfeta.tv]
- | author = W. M. Rand
- | title = Objective criteria for the evaluation of clustering methods
- | journal = [[Journal of the American Statistical Association]]
- | volume = 66
- | pages = 846–850
- | year = 1971
- | doi = 10.2307/2284239
- | issue = 336
- | publisher = American Statistical Association
- | jstor = 2284239
-}}</ref> or '''Rand measure''' (named after William M. Rand) in [[statistics]], and in particular in [[data clustering]], is a measure of the similarity between two [[data clustering]]s. A form of the Rand index that is adjusted for the chance grouping of elements is called the '''adjusted Rand index'''. From a mathematical standpoint, the Rand index is related to the [[Accuracy and precision#In binary classification|accuracy]], but is applicable even when class labels are not used.
-==Rand index==
-===Definition===
-Given a [[Set (mathematics)|set]] of <math>n</math> [[element (mathematics)|elements]] <math>S = \{o_1, \ldots, o_n\}</math> and two [[Partition of a set|partitions]] of <math>S</math> to compare, <math>X = \{X_1, \ldots, X_r\}</math>, a partition of ''S'' into ''r'' subsets, and <math>Y = \{Y_1, \ldots, Y_s\}</math>, a partition of ''S'' into ''s'' subsets, define the following:
-* <math>a</math>, the number of pairs of elements in <math>S</math> that are in the same set in <math>X</math> and in the same set in <math>Y</math>
-* <math>b</math>, the number of pairs of elements in <math>S</math> that are in different sets in <math>X</math> and in different sets in <math>Y</math>
-* <math>c</math>, the number of pairs of elements in <math>S</math> that are in the same set in <math>X</math> and in different sets in <math>Y</math>
-* <math>d</math>, the number of pairs of elements in <math>S</math> that are in different sets in <math>X</math> and in the same set in <math>Y</math>
-The Rand index, <math>R</math>, is:<ref name=rand71/><ref name=hb85>{{Cite journal
- | doi = 10.1007/BF01908075
- | author = Lawrence Hubert and Phipps Arabie
- | title = Comparing partitions
- | journal = Journal of Classification
- | volume = 2
- |issue=1
- | pages = 193–218
- | year = 1985
-}}</ref>
-:<math> R = \frac{a+b}{a+b+c+d} = \frac{a+b}{{n \choose 2 }}</math>
-Intuitively,  <math>a + b</math> can be considered as the number of agreements between <math>X</math> and <math>Y</math> and <math>c + d</math> as the number of disagreements between <math>X</math> and <math>Y</math>.
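The pair counts defined above can be computed directly by enumerating all pairs of elements. The following Python code is a minimal sketch, assuming each partition is given as a list of cluster labels, one label per element; the function name <code>rand_index</code> and this label-list representation are illustrative assumptions, not part of the cited sources.

```python
from itertools import combinations


def rand_index(labels_x, labels_y):
    """Rand index of two partitions given as per-element cluster labels.

    labels_x[i] and labels_y[i] are the (hypothetical) cluster labels of
    element o_i in partitions X and Y respectively.
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both partitions must label the same elements")

    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # same set in X, same set in Y
        elif not same_x and not same_y:
            b += 1  # different sets in X, different sets in Y
        elif same_x:
            c += 1  # same set in X, different sets in Y
        else:
            d += 1  # different sets in X, same set in Y

    total = a + b + c + d  # equals n choose 2
    if total == 0:
        raise ValueError("at least two elements are required")
    return (a + b) / total
```

For example, <code>rand_index([0, 0, 1, 1], [0, 1, 0, 1])</code> gives a = 0, b = 2, c = 2, d = 2 and therefore R = 2/6. The enumeration is quadratic in the number of elements, which is acceptable for illustration but slow for large data sets.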
-===Properties===
-The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.
-In mathematical terms, a, b, c, d are defined as follows:
-*<math>a = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i}, o_{j} \in X_{k}, o_{i}, o_{j} \in Y_{l}\}</math>
-*<math>b = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i} \in X_{k_{1}}, o_{j} \in X_{k_{2}}, o_{i} \in Y_{l_{1}}, o_{j} \in Y_{l_{2}}\}</math>
-*<math>c = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i}, o_{j} \in X_{k}, o_{i} \in Y_{l_{1}}, o_{j} \in Y_{l_{2}}\}</math>
-*<math>d = |S^{*}|</math>, where <math>S^{*} = \{ (o_{i}, o_{j}) | o_{i} \in X_{k_{1}}, o_{j} \in X_{k_{2}}, o_{i}, o_{j} \in Y_{l}\}</math>
-for some <math>1 \leq i,j \leq n, i \neq j, 1 \leq k, k_{1}, k_{2} \leq r, k_{1} \neq k_{2}, 1 \leq l, l_{1},l_{2} \leq s, l_{1} \neq l_{2}</math>
-==Adjusted Rand index==
-The adjusted Rand index is the corrected-for-chance version of the Rand index.<ref name=rand71/><ref name=hb85/><ref>{{Cite conference
- | author = Nguyen Xuan Vinh, Julien Epps and James Bailey
- | title = Information Theoretic Measures for Clustering Comparison: Is a Correction for Chance Necessary?
- | booktitle = ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
- | year = 2009
- | pages = 1073–1080
- | URL=http://www.ima.umn.edu/~iwen/REU/10.pdf
- | publisher = ACM
-}}[http://www.ima.umn.edu/~iwen/REU/10.pdf PDF].
-</ref> Though the Rand Index may only yield a value between 0 and +1, the Adjusted Rand Index can yield negative values if the index is less than the expected index.<ref>http://i11www.iti.uni-karlsruhe.de/extra/publications/ww-cco-06.pdf</ref>
-===The contingency table===
-Given a set <math>S</math> of <math>n</math> elements, and two groupings (''e.g.'' clusterings) of these points, namely <math>X = \{ X_1, X_2, \ldots , X_r \}</math> and <math>Y = \{ Y_1, Y_2, \ldots , Y_s \}</math>, the overlap between <math>X</math> and <math>Y</math> can be summarized in a contingency table <math>\left[n_{ij}\right]</math> where each entry <math>n_{ij}</math> denotes the number of objects in common between <math>X_i</math> and <math>Y_j</math> : <math>n_{ij}=|X_i \cap Y_j|</math>.
-{| border="0" cellpadding="2" align="center"
-|-align="center"  valign="center"
-! style="border-bottom:1px solid black;border-right:1px solid black;" | X\Y
-! style="border-bottom:1px solid black;" | <math>Y_1</math>
-! style="border-bottom:1px solid black;" | <math>Y_2</math>
-! style="border-bottom:1px solid black;" | <math>\ldots</math>
-! style="border-bottom:1px solid black;" | <math>Y_s</math>
-! style="border-bottom:1px solid black;border-left:1px solid black;" | Sums
-|-align="center"  valign="center"
-! style="border-right:1px solid black;" | <math>X_1</math>
-| <math>n_{11}</math>
-| <math>n_{12}</math>
-| <math>\ldots</math>
-| <math>n_{1s}</math>
-| style="border-left:1px solid black;" | <math>a_1</math>
-|-align="center"  valign="center"
-! style="border-right:1px solid black;" | <math>X_2</math>
-| <math>n_{21}</math>
-| <math>n_{22}</math>
-| <math>\ldots</math>
-| <math>n_{2s}</math>
-| style="border-left:1px solid black;" | <math>a_2</math>
-|-align="center"  valign="center"
-| style="border-right:1px solid black;" | <math>\vdots</math>
-| <math>\vdots</math>
-| <math>\vdots</math>
-| <math>\ddots</math>
-| <math>\vdots</math>
-| style="border-left:1px solid black;" | <math>\vdots</math>
-|-align="center"  valign="center"
-! style="border-right:1px solid black;" | <math>X_r</math>
-| <math>n_{r1}</math>
-| <math>n_{r2}</math>
-| <math>\ldots</math>
-| <math>n_{rs}</math>
-| style="border-left:1px solid black;" | <math>a_r</math>
-|-align="center"  valign="center"
-! style="border-right:1px solid black;border-top:1px solid black;" | Sums
-| style="border-top:1px solid black;" | <math>b_1</math>
-| style="border-top:1px solid black;" | <math>b_2</math>
-| style="border-top:1px solid black;" | <math>\ldots</math>
-| style="border-top:1px solid black;" | <math>b_s</math>
-| style="border-left:1px solid black;border-top:1px solid black;" |
-|-
-|}
-===Definition===
-The adjusted Rand index has the general form <math>\text{AdjustedIndex} = \frac{\text{Index} - \text{ExpectedIndex}}{\text{MaxIndex} - \text{ExpectedIndex}}</math>; more specifically,<br />
-<math>ARI = \frac{ \sum_{ij} \binom{n_{ij}}{2} - [\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}] / \binom{n}{2} }{ \frac{1}{2} [\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}] - [\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}] / \binom{n}{2} }</math>
-<br />where <math>n_{ij}, a_i, b_j</math> are values from the contingency table.
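The formula above can be evaluated from the contingency table counts alone. The following Python code is a minimal sketch, assuming each partition is given as a list of cluster labels, one per element; the function name <code>adjusted_rand_index</code> is illustrative, and returning 1.0 when the denominator is zero is a convention chosen here, not something stated in the cited sources.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_x, labels_y):
    """Adjusted Rand index from per-element cluster labels (a sketch)."""
    n = len(labels_x)
    if n != len(labels_y):
        raise ValueError("both partitions must label the same elements")
    if n < 2:
        raise ValueError("at least two elements are required")

    # Contingency table entries n_ij and its row sums a_i and column sums b_j.
    n_ij = Counter(zip(labels_x, labels_y))
    a_i = Counter(labels_x)
    b_j = Counter(labels_y)

    index = sum(comb(v, 2) for v in n_ij.values())
    sum_a = sum(comb(v, 2) for v in a_i.values())
    sum_b = sum(comb(v, 2) for v in b_j.values())

    expected_index = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2

    if max_index == expected_index:
        # Assumed convention: this happens only when both partitions are
        # all singletons or both are one single cluster, so they are identical.
        return 1.0
    return (index - expected_index) / (max_index - expected_index)
```

For example, <code>adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])</code> has Index = 0, ExpectedIndex = 2/3 and MaxIndex = 2, giving a value of −0.5, which illustrates that the adjusted index can be negative.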
-==References==
-{{Reflist}}
-== External links ==
-* [https://github.com/bjoern-andres/partition-comparison C++ implementation with MATLAB mex files]
-[[Category:Machine learning]]
-[[Category:Summary statistics for contingency tables]]
-[[Category:Clustering criteria]]
