'''Stress majorization''' is an [[optimization (mathematics)|optimization strategy]] used in [[multidimensional scaling]] (MDS) where, for a set of ''n'' ''m''-dimensional data items, a configuration ''X'' of ''n'' points in ''r''-dimensional space (with ''r'' ≪ ''m'') is sought that minimizes the so-called ''stress'' function <math>\sigma(X)</math>. Usually ''r'' is 2 or 3, i.e. the (''r'' × ''n'') matrix ''X'' lists points in 2- or 3-dimensional [[Euclidean space]], so that the result may be visualised (i.e. an MDS plot). The function <math>\sigma</math> is a cost or [[loss function]] that measures the squared differences between ideal (<math>m</math>-dimensional) distances and actual distances in ''r''-dimensional space. It is defined as:
: <math>\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2</math>

where <math>w_{ij}\ge 0</math> is a weight for the measurement between a pair of points <math>(i,j)</math>, <math>d_{ij}(X)</math> is the [[Euclidean distance]] between points <math>i</math> and <math>j</math>, and <math>\delta_{ij}</math> is the ideal distance between the points (their separation) in the <math>m</math>-dimensional data space. Note that <math>w_{ij}</math> can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).

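Concretely, the stress of a candidate configuration can be evaluated directly from this definition. The following is a minimal NumPy sketch (the function and variable names are illustrative, not part of any standard library):

```python
import numpy as np

def stress(X, delta, w):
    """sigma(X) = sum over i < j of w_ij * (d_ij(X) - delta_ij)^2.

    X     : (n, r) array of point coordinates
    delta : (n, n) symmetric array of ideal distances
    w     : (n, n) symmetric array of non-negative weights
    """
    # Pairwise Euclidean distances d_ij(X)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Sum over unordered pairs i < j only
    i, j = np.triu_indices(len(X), k=1)
    return float((w[i, j] * (d[i, j] - delta[i, j]) ** 2).sum())
```

Setting every weight to 1 gives raw stress; zeroing an entry of `w` drops that pair from the sum, as described above.
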
A configuration <math>X</math> which minimizes <math>\sigma(X)</math> gives a plot in which points that are close together correspond to points that are also close together in the original <math>m</math>-dimensional data space.

There are many ways that <math>\sigma(X)</math> could be minimized. For example, Kruskal<ref>{{citation|last=Kruskal|first=J. B.|authorlink=Joseph Kruskal|title=Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis|journal=Psychometrika|volume=29|issue=1|pages=1–27|year=1964|doi=10.1007/BF02289565}}.</ref> recommended an iterative [[steepest descent]] approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by [[Jan de Leeuw]].<ref name="de Leeuw">{{citation|last=de Leeuw|first=J.|contribution=Applications of convex analysis to multidimensional scaling|editor1-first=J. R.|editor1-last=Barra|editor2-first=F.|editor2-last=Brodeau|editor3-first=G.|editor3-last=Romie|editor4-first=B.|editor4-last=van Cutsem|title=Recent developments in statistics|pages=133–145|year=1977}}.</ref> De Leeuw's ''iterative majorization'' method at each step minimizes a simple convex function which both bounds <math>\sigma</math> from above and touches the surface of <math>\sigma</math> at a point <math>Z</math>, called the ''supporting point''. In [[convex analysis]] such a function is called a ''majorizing'' function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by majorizing a complicated function").

== The SMACOF algorithm ==

The stress function <math>\sigma</math> can be expanded as follows:

: <math>
\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2
=\sum_{i<j}w_{ij}\delta_{ij}^2 + \sum_{i<j}w_{ij}d_{ij}^2(X)-2\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)
</math>

Note that the first term is a constant <math>C</math> and the second term is quadratic in <math>X</math> (i.e. for the [[Hessian matrix]] <math>V</math> the second term is equivalent to [[Matrix trace|tr]]<math>X'VX</math>) and therefore relatively easily solved. The third term is bounded by:

: <math>
\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)=\,\operatorname{tr}\, X'B(X)X \ge \,\operatorname{tr}\, X'B(Z)Z
</math>

where <math>B(Z)</math> has:

: <math>b_{ij}=-\frac{w_{ij}\delta_{ij}}{d_{ij}(Z)}</math> for <math>d_{ij}(Z)\ne 0, i \ne j</math>

and <math>b_{ij}=0</math> for <math>d_{ij}(Z)=0, i\ne j</math>

and <math>b_{ii}=-\sum_{j=1,j\ne i}^n b_{ij}</math>.

Proof of this inequality is by the [[Cauchy–Schwarz inequality]]; see Borg<ref name="borg">{{citation|last1=Borg|first1=I.|last2=Groenen|first2=P.|title=Modern Multidimensional Scaling: theory and applications|publisher=Springer-Verlag|location=New York|year=1997}}.</ref> (pp. 152–153).

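The bound can be checked numerically on arbitrary configurations, since it holds for every <math>Z</math> with nonzero pairwise distances. A small NumPy sketch (helper names are illustrative):

```python
import numpy as np

def pairwise_dist(X):
    """Pairwise Euclidean distances d_ij(X)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def b_matrix(Z, delta, w):
    """B(Z): off-diagonal -w_ij * delta_ij / d_ij(Z) (0 where d_ij(Z) = 0),
    diagonal b_ii = minus the row sum of the off-diagonal entries."""
    d = pairwise_dist(Z)
    with np.errstate(divide="ignore", invalid="ignore"):
        B = np.where(d > 0, -w * delta / d, 0.0)
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    return B

# Random symmetric ideal distances and two random configurations
rng = np.random.default_rng(0)
n, r = 6, 2
delta = rng.uniform(0.5, 2.0, (n, n))
delta = (delta + delta.T) / 2
np.fill_diagonal(delta, 0.0)
w = np.ones((n, n))
X = rng.standard_normal((n, r))
Z = rng.standard_normal((n, r))
# tr X'B(X)X  >=  tr X'B(Z)Z  (the Cauchy-Schwarz bound above)
lhs = np.trace(X.T @ b_matrix(X, delta, w) @ X)
rhs = np.trace(X.T @ b_matrix(Z, delta, w) @ Z)
```
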
Thus, we have a simple quadratic function <math>\tau(X,Z)</math> that majorizes stress:

: <math>\sigma(X)=C+\,\operatorname{tr}\, X'VX - 2 \,\operatorname{tr}\, X'B(X)X\le C+\,\operatorname{tr}\, X' V X - 2 \,\operatorname{tr}\, X'B(Z)Z = \tau(X,Z)</math>

The iterative minimization procedure is then:

* at the ''k''<sup>th</sup> step we set <math>Z\leftarrow X^{k-1}</math>
* <math>X^k\leftarrow \arg\min_X \tau(X,Z)</math>
* stop if <math>\sigma(X^{k-1})-\sigma(X^{k})<\epsilon</math>, otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw<ref name="de Leeuw"/>).

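Since the gradient of <math>\tau(X,Z)</math> vanishes where <math>VX=B(Z)Z</math>, each step amounts to a linear solve; <math>V</math> is singular, so a Moore–Penrose pseudoinverse can be used. A minimal dense-matrix NumPy sketch of the whole iteration (illustrative, not an optimized implementation):

```python
import numpy as np

def pairwise_dist(X):
    """Pairwise Euclidean distances d_ij(X)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def b_matrix(Z, delta, w):
    """B(Z) as defined above."""
    d = pairwise_dist(Z)
    with np.errstate(divide="ignore", invalid="ignore"):
        B = np.where(d > 0, -w * delta / d, 0.0)
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))
    return B

def smacof(delta, w, X0, eps=1e-6, max_iter=300):
    """Iterative majorization: X^k minimizes tau(X, Z) with Z = X^{k-1}."""
    n = len(delta)
    # V: off-diagonal -w_ij, diagonal sum of the row's weights (so tr X'VX
    # is the quadratic term of the expanded stress)
    V = -w.copy()
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, -V.sum(axis=1))
    Vp = np.linalg.pinv(V)  # V is singular: use the pseudoinverse

    def sigma(X):
        d = pairwise_dist(X)
        i, j = np.triu_indices(n, k=1)
        return float((w[i, j] * (d[i, j] - delta[i, j]) ** 2).sum())

    X = X0.copy()
    prev = sigma(X)
    for _ in range(max_iter):
        X = Vp @ b_matrix(X, delta, w) @ X  # solve V X = B(Z) Z
        cur = sigma(X)
        if prev - cur < eps:  # stress decrease below tolerance: stop
            break
        prev = cur
    return X
```

For example, with three points whose ideal distances are all 1, the iteration drives the configuration toward an equilateral triangle.
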
== Use in graph drawing ==

Stress majorization and algorithms similar to SMACOF also have application in the field of [[graph drawing]].<ref>{{citation|last1=Michailidis|first1=G.|last2=de Leeuw|first2=J.|title=Data visualization through graph drawing|journal=Computational Statistics|year=2001|volume=16|issue=3|pages=435–450|doi=10.1007/s001800100077}}.</ref><ref>{{citation|first1=E.|last1=Gansner|first2=Y.|last2=Koren|first3=S.|last3=North|contribution=Graph Drawing by Stress Majorization|title=[[International Symposium on Graph Drawing|Proceedings of 12th Int. Symp. Graph Drawing (GD'04)]]|series=Lecture Notes in Computer Science|volume=3383|publisher=Springer-Verlag|pages=239–250|year=2004}}.</ref> That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the <math>\delta_{ij}</math> are usually set to the graph-theoretic distances between nodes ''i'' and ''j'' and the weights <math>w_{ij}</math> are taken to be <math>\delta_{ij}^{-\alpha}</math>. Here, <math>\alpha</math> is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for <math>\alpha=2</math>.<ref>{{citation|last=Cohen|first=J.|title=Drawing graphs to convey proximity: an incremental arrangement method|journal=ACM Transactions on Computer-Human Interaction|volume=4|issue=3|year=1997|pages=197–229|doi=10.1145/264645.264657}}.</ref>

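Setting up these inputs for a small graph can be sketched as follows in NumPy: shortest-path hop counts give <math>\delta_{ij}</math>, and <math>w_{ij}=\delta_{ij}^{-2}</math> gives the <math>\alpha=2</math> weighting (the Floyd–Warshall helper here is illustrative; any shortest-path routine would do):

```python
import numpy as np

def graph_distances(adj):
    """All-pairs shortest-path hop counts via Floyd-Warshall (unit edges)."""
    n = len(adj)
    d = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(d, 0.0)
    for k in range(n):
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

# Path graph 0-1-2-3: ideal distances delta_ij are hop counts,
# weights w_ij = delta_ij^-2 (alpha = 2), zero on the diagonal.
adj = np.zeros((4, 4))
for a, b in [(0, 1), (1, 2), (2, 3)]:
    adj[a, b] = adj[b, a] = 1
delta = graph_distances(adj)
w = np.zeros_like(delta)
mask = ~np.eye(4, dtype=bool)
w[mask] = delta[mask] ** -2.0
```

The resulting `delta` and `w` can then be fed to any stress-majorization routine to produce node positions.
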
== References ==
{{reflist}}

[[Category:Graph drawing]]
[[Category:Multivariate statistics]]
[[Category:Mathematical optimization]]
[[Category:Mathematical analysis]]