De Bruijn index: Difference between revisions

From formulasearchengine
{{distinguish|Median (geometry)}}
 
The '''geometric median''' of a discrete set of sample points in a [[Euclidean space]] is the point minimizing the sum of distances to the sample points. This generalizes the [[median]], which has the property of minimizing the sum of distances for one-dimensional data, and provides a [[central tendency]] in higher dimensions. It is also known as the '''1-median'''.<ref>The more general [[k-median problem|''k''-median problem]] asks for the location of ''k'' cluster centers minimizing the sum of distances from each sample point to its nearest center.</ref>
 
The geometric median is an important [[estimator]] of [[location parameter|location]] in statistics. It is also a standard problem in [[facility location]], where it models the problem of locating a facility to minimize the cost of transportation.
 
The special case of the problem for three points in the plane (that is, ''m'' = 3 and ''n'' = 2 in the definition below) is sometimes also known as Fermat's problem; it arises in the construction of minimal [[Steiner tree]]s, and was originally posed as a problem by [[Pierre de Fermat]] to [[Evangelista Torricelli]], who solved it. Its solution is now known as the [[Fermat point]] of the triangle formed by the three sample points. The geometric median may in turn be generalized to the problem of minimizing the sum of ''weighted'' distances, known as the [[Weber problem]] after [[Alfred Weber]]'s discussion of the problem in his 1909 book on facility location. Some sources instead call Weber's problem the Fermat–Weber problem, but others use this name for the unweighted geometric median problem.
 
Wesolowsky (1993) provides a survey of the geometric median problem. See Fekete, Mitchell, and Beurer (2003) for generalizations of the problem to non-discrete point sets.
 
==Definition==
 
Formally, for a given set of ''m'' points <math>x_1, x_2, \dots, x_m\,</math> with each <math>x_i \in \mathbb{R}^n</math>, the geometric median is defined as
 
:Geometric Median <math>=\underset{y \in \mathbb{R}^n}{\operatorname{arg\,min}} \sum_{i=1}^m \left \| x_i-y \right \|_2</math>
 
Here ''argmin'' denotes the value of the argument <math>y</math> that minimizes the sum; that is, the geometric median is the point <math>y</math> for which the sum of the [[Euclidean distance]]s to the points <math>x_i</math> is smallest.
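As a concrete reading of the definition, the following Python sketch evaluates the objective and takes the minimum over a finite grid of candidate points. It assumes planar data given as (x, y) tuples, and the helper names <code>sum_of_distances</code> and <code>grid_argmin</code> are illustrative. A grid search only approximates the argmin to the grid spacing, so it is not meant as a practical algorithm.

```python
import math

def sum_of_distances(y, points):
    """Objective from the definition: sum of Euclidean distances from y to each x_i."""
    return sum(math.dist(x, y) for x in points)

def grid_argmin(points, lo, hi, steps):
    """Brute-force argmin over a (steps + 1) x (steps + 1) grid on [lo, hi]^2.

    Only approximates the geometric median, to within the grid spacing."""
    h = (hi - lo) / steps
    candidates = [(lo + i * h, lo + j * h)
                  for i in range(steps + 1) for j in range(steps + 1)]
    return min(candidates, key=lambda y: sum_of_distances(y, points))

square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(grid_argmin(square, 0.0, 2.0, 4))  # (1.0, 1.0), the centre of the square
```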
 
==Properties==
* For the 1-dimensional case, the geometric median coincides with the [[median]]. This is because the [[univariate]] median also minimizes the sum of distances from the points.
* The geometric median is '''unique''' whenever the points are not [[Line (geometry)#Collinear_points|collinear]].
* The geometric median is [[equivariant]] for Euclidean [[Similarity (geometry)|similarity transformations]], including [[translation (geometry)|translation]] and [[rotation (mathematics)|rotation]]. This means that one gets the same result either by transforming the geometric median, or by applying the same transformation to the sample data and finding the geometric median of the transformed data. This property follows from the fact that the geometric median is defined only from pairwise distances, and does not depend on the system of orthogonal [[Cartesian coordinates]] by which the sample data is represented. In contrast, the component-wise median for a multivariate data set is not in general rotation equivariant, nor is it independent of the choice of coordinates.
* The geometric median has a [[breakdown point]] of 0.5.<ref>Lopuhaä and Rousseeuw (1991).</ref> That is, up to half of the sample data may be arbitrarily corrupted, and the median of the samples will still provide a [[robust estimator]] for the location of the uncorrupted data.
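The contrast with the component-wise median can be checked numerically. In the following sketch (planar points, a 45° rotation about the origin, illustrative helper names), the sum of distances is unchanged when the candidate point and the data are rotated together, which is why its minimizer rotates with the data; rotating the component-wise median, by contrast, does not give the component-wise median of the rotated data.

```python
import math
import statistics

def rotate(p, theta):
    """Rotate the planar point p about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def coordinate_median(points):
    """Component-wise median: the one-dimensional median of each coordinate."""
    return (statistics.median(p[0] for p in points),
            statistics.median(p[1] for p in points))

def sum_of_distances(y, points):
    return sum(math.dist(x, y) for x in points)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
theta = math.pi / 4
rotated = [rotate(p, theta) for p in points]

# The objective depends only on distances, so rotating y together with
# the data leaves its value unchanged (up to rounding).
y = (0.3, 0.2)
before = sum_of_distances(y, points)
after = sum_of_distances(rotate(y, theta), rotated)

# The component-wise median does not commute with the rotation.
rotated_median = rotate(coordinate_median(points), theta)  # (0.0, 0.0)
median_of_rotated = coordinate_median(rotated)             # approximately (0.0, 0.7071)
print(before, after, rotated_median, median_of_rotated)
```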
 
==Special cases==
*'''For 3 (non-[[collinear]]) points,''' if any angle of the triangle formed by those points is 120° or more, then the geometric median is the vertex at that angle. If all the angles are less than 120°, the geometric median is the point inside the triangle that subtends an angle of 120° to each of the three pairs of triangle vertices. This is also known as the [[Fermat point]] of the triangle formed by the three vertices. (If the three points are collinear, then the geometric median is the point lying between the other two, as is the case with a one-dimensional median.)
*'''For 4 [[coplanar]] points,''' if one of the four points is inside the triangle formed by the other three points, then the geometric median is that point. Otherwise, the four points form a convex [[quadrilateral]] and the geometric median is the crossing point of the diagonals of the quadrilateral. The geometric median of four coplanar points is the same as the unique [[Radon point]] of the four points.
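The rules above can be sketched in Python for the cases with a direct answer: the three-point case with an angle of at least 120°, and the four-point case. This is an illustration under stated assumptions (points are (x, y) tuples, no three of the four points are collinear, and all helper names are hypothetical); it does not construct the Fermat point when all three angles are below 120°.

```python
import math

def cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign gives the orientation of o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def vertex_with_wide_angle(a, b, c):
    """Return the vertex of triangle abc whose angle is at least 120 degrees, or None.

    The exact 120-degree boundary is sensitive to floating-point rounding."""
    for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
        u = (q[0] - p[0], q[1] - p[1])
        v = (r[0] - p[0], r[1] - p[1])
        cos_angle = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
        if cos_angle <= -0.5:  # cos(120 degrees) = -1/2
            return p
    return None

def in_triangle(p, a, b, c):
    """True if p lies inside triangle abc or on its boundary."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 cross at a single interior point."""
    return (cross(p1, p2, q1) * cross(p1, p2, q2) < 0
            and cross(q1, q2, p1) * cross(q1, q2, p2) < 0)

def line_intersection(p1, p2, q1, q2):
    """Intersection of the lines p1p2 and q1q2 (assumed not parallel)."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx
    t = ((q1[0] - p1[0]) * sy - (q1[1] - p1[1]) * sx) / denom
    return (p1[0] + t * rx, p1[1] + t * ry)

def geometric_median_4(pts):
    """Geometric median of four coplanar points, no three collinear."""
    for i in range(4):
        others = [pts[j] for j in range(4) if j != i]
        if in_triangle(pts[i], *others):
            return pts[i]  # one point lies inside the triangle of the other three
    # Otherwise the points are in convex position; the crossing pair is the diagonals.
    for (a, b), (c, d) in (((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))):
        if segments_cross(pts[a], pts[b], pts[c], pts[d]):
            return line_intersection(pts[a], pts[b], pts[c], pts[d])
    raise ValueError("degenerate configuration (collinear points?)")

print(geometric_median_4([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]))  # (0.5, 0.5)
```

Testing the diagonal pairings by segment crossing avoids having to sort the four points into convex order first.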
 
==Computation==
Although the geometric median is an easy concept to understand, computing it poses a challenge. The [[centroid]] or [[center of mass]], defined similarly to the geometric median as minimizing the sum of the ''squares'' of the distances to each point, can be found by a simple formula (its coordinates are the averages of the coordinates of the points), but no such formula is known for the geometric median. In fact, it has been shown that neither an [[Closed-form expression|explicit formula]] nor an exact algorithm involving only arithmetic operations and ''k''th roots can exist in general. Therefore only numerical or symbolic approximations to the solution of this problem are possible under this [[model of computation]].<ref>Bajaj (1986), Bajaj (1988). Earlier, Cockayne and Melzak (1969) proved that the Steiner point for 5 points in the plane cannot be constructed with [[ruler and compass]].</ref>
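The difference between the two objectives shows up even on a tiny example. In this sketch (illustrative helper names, planar points), the centroid beats the point (0, 0) on the sum of squared distances but loses to it on the plain sum of distances, so the centroid formula cannot be reused for the geometric median.

```python
import math

def centroid(points):
    """Coordinate-wise average: the closed-form minimizer of the sum of squared distances."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sum_of_distances(y, points):
    return sum(math.dist(x, y) for x in points)

def sum_of_squared_distances(y, points):
    return sum(math.dist(x, y) ** 2 for x in points)

pts = [(0.0, 0.0), (0.0, 0.0), (3.0, 0.0)]
c = centroid(pts)  # (1.0, 0.0)
# The centroid wins on squared distances ...
print(sum_of_squared_distances(c, pts), sum_of_squared_distances((0.0, 0.0), pts))  # 6.0 9.0
# ... but not on plain distances, where the repeated point (0, 0) does better.
print(sum_of_distances(c, pts), sum_of_distances((0.0, 0.0), pts))  # 4.0 3.0
```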
 
However, it is straightforward to calculate an approximation to the geometric median using an iterative procedure in which each step produces a more accurate approximation. Procedures of this type rely on the fact that the sum of distances to the sample points is a [[convex function]], since the distance to each sample point is convex and the sum of convex functions remains convex. Because a convex function has no local minima other than its global minimum, procedures that decrease the sum of distances at each step cannot get trapped in a [[local optimum]].
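The convexity claim can be spot-checked numerically through the midpoint inequality <math>f\left(\tfrac{a+b}{2}\right) \le \tfrac{1}{2}\left(f(a) + f(b)\right)</math>. The sketch below samples random pairs of points in a box; the sampling range, trial count, and function names are arbitrary illustrative choices, and a passing check is a sanity test, not a proof.

```python
import math
import random

def sum_of_distances(y, points):
    """The objective: sum of Euclidean distances from y to the sample points."""
    return sum(math.dist(x, y) for x in points)

def midpoint_convexity_holds(points, trials=1000, seed=0):
    """Spot-check f((a + b) / 2) <= (f(a) + f(b)) / 2 at random pairs a, b.

    A numerical sanity check of convexity, not a proof."""
    rng = random.Random(seed)
    dim = len(points[0])
    for _ in range(trials):
        a = [rng.uniform(-10.0, 10.0) for _ in range(dim)]
        b = [rng.uniform(-10.0, 10.0) for _ in range(dim)]
        mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
        lhs = sum_of_distances(mid, points)
        rhs = (sum_of_distances(a, points) + sum_of_distances(b, points)) / 2
        if lhs > rhs + 1e-9:  # small slack for rounding error
            return False
    return True

print(midpoint_convexity_holds([(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]))  # True
```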
 
One common approach of this type, called '''Weiszfeld's algorithm''' after the work of [[Endre Weiszfeld]],<ref>Weiszfeld (1937); Kuhn (1973); Chandrasekaran and Tamir (1989).</ref> is a form of [[iteratively re-weighted least squares]]. This algorithm defines a set of weights that are inversely proportional to the distances from the current estimate to the samples, and creates a new estimate that is the weighted average of the samples according to these weights. That is,
:<math>\left. y_{i+1}=\left( \sum_{j=1}^m \frac{x_j}{\| x_j - y_i \|} \right) \right/ \left( \sum_{j=1}^m \frac{1}{\| x_j - y_i \|} \right).</math>
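A direct transcription of this update rule is sketched below in Python. It starts from the centroid (an assumed choice of starting point) and stops when an update moves less than a tolerance. It does not implement any remedy for an iterate that lands exactly on a sample point, where the plain update divides by zero; the sketch simply returns that point, which need not be the geometric median. The usage lines illustrate the robustness property from the Properties section: two gross outliers drag the mean far away, while the geometric median stays with the bulk of the data.

```python
import math

def weiszfeld(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median with the plain Weiszfeld iteration.

    Simplification: if an iterate coincides with a sample point, the plain
    update is undefined and this sketch just returns that point."""
    dim = len(points[0])
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]  # start at the centroid
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for x in points:
            d = math.dist(x, y)
            if d == 0.0:
                return tuple(y)
            w = 1.0 / d  # weight inversely proportional to the distance
            den += w
            for k in range(dim):
                num[k] += w * x[k]
        new_y = [v / den for v in num]  # weighted average of the samples
        if math.dist(new_y, y) < tol:
            return tuple(new_y)
        y = new_y
    return tuple(y)

cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
data = cluster + [(100.0, 100.0), (100.0, 100.0)]  # two gross outliers
mean = tuple(sum(c) / len(data) for c in zip(*data))
median = weiszfeld(data)
print(mean)    # pulled far from the cluster (about 28.9 in each coordinate)
print(median)  # stays inside the unit square
```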
 
Bose et al. (2003) describe more sophisticated geometric optimization procedures for finding approximately optimal solutions to this problem. As {{harvtxt|Nie|Parrilo|Sturmfels|2008}} show, the problem can also be represented as a [[semidefinite programming|semidefinite program]].
 
==Characterization of the geometric median==
If ''y'' is distinct from all the given points ''x''<sub>''j''</sub>, then ''y'' is the geometric median if and only if it satisfies:
:<math>0 = \sum_{j=1}^m \frac {x_j - y} {\left \| x_j - y \right \|}.</math>
 
This is equivalent to:
:<math>\left. y = \left( \sum_{j=1}^m \frac{x_j}{\| x_j - y \|} \right) \right/ \left( \sum_{j=1}^m \frac{1}{\| x_j - y \|} \right),</math>
 
which is closely related to Weiszfeld's algorithm: it states that the geometric median is a fixed point of the Weiszfeld update.
 
In general,  ''y'' is the geometric median if and only if there are vectors ''u''<sub>''j''</sub> such that:
:<math>0 =  \sum_{j=1}^m u_j </math>
where for ''x''<sub>''j''</sub> ≠ ''y'',
:<math>u_j = \frac {x_j - y} {\left \| x_j - y \right \|}</math>
and for ''x''<sub>''j''</sub> = ''y'',
:<math>\| u_j \| \leq 1 .</math>
An equivalent formulation of this condition is
:<math>\left\| \sum_{1\le j\le m,\, x_j\ne y} \frac{x_j - y}{\left\| x_j - y \right\|} \right\| \le \left|\{\, j \mid 1\le j\le m,\ x_j = y \,\}\right|.</math>
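The general condition translates directly into a test of whether a given point is the geometric median. The sketch below is illustrative, with a hypothetical function name; it decides which terms go on the right-hand side by exact coincidence <code>d == 0.0</code>, whereas with floating-point data a small tolerance would usually be needed there as well.

```python
import math

def is_geometric_median(y, points, tol=1e-9):
    """Check ||sum of unit vectors from y to the x_j != y|| <= #{j : x_j = y}."""
    dim = len(y)
    total = [0.0] * dim
    coincident = 0
    for x in points:
        d = math.dist(x, y)
        if d == 0.0:
            coincident += 1  # x_j = y: counted on the right-hand side
            continue
        for k in range(dim):
            total[k] += (x[k] - y[k]) / d  # unit vector from y towards x_j
    return math.hypot(*total) <= coincident + tol

square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(is_geometric_median((1.0, 1.0), square))  # True: the unit vectors cancel
print(is_geometric_median((0.0, 0.0), square))  # False: a corner is not the median
```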
 
== Generalizations ==
The geometric median can be generalized from Euclidean spaces to general [[Riemannian manifold]]s (and even [[metric space]]s) using the same idea which is used to define the [[Fréchet mean]] on a Riemannian manifold. Let <math>M</math> be a Riemannian manifold with corresponding distance function <math>d(\cdot, \cdot)</math>, let <math>w_1, \ldots, w_n</math> be <math>n</math> weights summing to 1, and let <math>x_1, \ldots, x_n</math>
be <math>n</math> observations from <math>M</math>.  Then we define the  weighted geometric median <math>m</math> (or weighted Fréchet median) of the data points as
: <math> m = \underset{x \in M}{\operatorname{arg\,min}} \sum_{i=1}^n w_i d(x,x_i) </math>.
If all the weights are equal, we say simply that <math>m</math> is the geometric median.
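When ''M'' is finite, the argmin in this definition can be computed by direct enumeration. The Python sketch below applies that idea approximately: it takes the unit circle (a one-dimensional Riemannian manifold whose geodesic distance is arc length), restricts the candidates to a finite grid of angles, and therefore only finds the weighted median to within the grid spacing. The function names and the grid size are illustrative assumptions.

```python
import math

TWO_PI = 2.0 * math.pi

def circle_dist(a, b):
    """Geodesic (arc-length) distance between angles a and b on the unit circle."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)

def weighted_median_over(candidates, data, weights, dist):
    """argmin over a finite candidate set of sum_i w_i * dist(x, x_i)."""
    return min(candidates,
               key=lambda x: sum(w * dist(x, xi) for w, xi in zip(weights, data)))

N = 3600  # grid resolution: an illustrative choice
grid = [TWO_PI * k / N for k in range(N)]
data = [0.1, 0.2, 0.3]  # observations as angles in radians

m_equal = weighted_median_over(grid, data, [1 / 3] * 3, circle_dist)
m_heavy = weighted_median_over(grid, data, [0.6, 0.2, 0.2], circle_dist)
print(m_equal, m_heavy)  # approximately 0.2 and 0.1, up to the grid spacing
```

With a weight above one half on a single observation, the weighted median moves onto that observation, as in the one-dimensional weighted median.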
 
== Notes ==
<references/>
 
== References ==
*{{cite journal
| author = Bajaj, C.
| title = Proving geometric algorithms nonsolvability: An application of factoring polynomials
| journal = [[Journal of Symbolic Computation]]
| year = 1986
| volume = 2
| pages = 99–102
| doi =  10.1016/S0747-7171(86)80015-3
| ref = harv 
}}
*{{cite journal
| author = Bajaj, C.
| title = The algebraic degree of geometric optimization problems
| journal = [[Discrete and Computational Geometry]]
| year = 1988
| volume = 3
| pages = 177–191
| doi = 10.1007/BF02187906
| ref = harv}}
*{{cite journal
| title = Fast approximations for sums of distances, clustering and the Fermat–Weber problem
| author = Bose, Prosenjit; Maheshwari, Anil; Morin, Pat
| journal = Computational Geometry: Theory and Applications
| volume = 24
| issue = 3
| pages = 135–146
| year = 2003
| doi = 10.1016/S0925-7721(02)00102-5
| url = http://www.scs.carleton.ca/~jit/publications/papers/bmm01.ps
| ref = harv}}
*{{cite journal
| author = Chandrasekaran, R.; Tamir, A.
| title = Open questions concerning Weiszfeld's algorithm for the Fermat-Weber location problem
| journal = Mathematical Programming, Series A
| volume = 44
| year = 1989
| pages = 293–295
| doi = 10.1007/BF01587094
| ref = harv}}
*{{cite journal
| doi = 10.2307/2688541
| author = Cockayne, E. J.; Melzak, Z. A.
| title = Euclidean constructability in graph minimization problems
| jstor = 2688541
| journal = Mathematics Magazine
| volume = 42
| issue = 4
| pages = 206–208
| year = 1969
| ref = harv}}
*{{cite arxiv
| author = Fekete, Sándor P.; Mitchell, Joseph S. B.; Beurer, Karin
| title = On the continuous Fermat-Weber problem
| year = 2003
| ref = harv
| eprint=cs.CG/0310027
| class = cs.CG}}
*{{cite journal
| first1 = P. Thomas | last1 = Fletcher | first2 = Suresh | last2 = Venkatasubramanian | first3 = Sarang | last3 = Joshi
| title  = The Geometric Median on Riemannian Manifolds with Application to Robust Atlas Estimation
| journal = Neuroimage
| volume = 45
| year = 2009
| pages = s143–s152
| doi = 10.1016/j.neuroimage.2008.10.052
| pmid = 19056498
| issue = 1 Suppl
| pmc = 2735114
| ref = harv}}
*{{cite journal
| author = Kuhn, Harold W.
| title = A note on Fermat's problem
| journal = Mathematical Programming
| year = 1973
| volume = 4
| issue = 1
| pages = 98–107
| doi = 10.1007/BF01584648
| ref = harv}}
*{{cite journal
| author = Lopuhaä, Hendrik P.; [[Peter Rousseeuw|Rousseeuw, Peter J.]]
| title = Breakdown points of affine equivariant estimators of multivariate location and covariance matrices
| year = 1991
| journal = Annals of Statistics
| volume = 19
| pages = 229–248
| issue = 1
| doi = 10.1214/aos/1176347978
| ref = harv
| jstor=2241852}}
*{{Cite book
| first1 = Jiawang | last1 = Nie | first2 = Pablo A. |last2 = Parrilo | first3 = Bernd | last3 = Sturmfels | author3-link = Bernd Sturmfels
| contribution = Semidefinite representation of the ''k''-ellipse
| series = IMA Volumes in Mathematics and its Applications
| volume = 146
| editor1-first = A. | editor1-last = Dickenstein
| editor2-first = F.-O. | editor2-last = Schreyer
| editor3-first = A.J. | editor3-last = Sommese
| publisher = Springer-Verlag | pages = 117–132 | year = 2008 | arxiv = math/0702005
| title = Algorithms in Algebraic Geometry
| ref = harv}}
*{{cite journal
| author = Ostresh, L.
| title = Convergence of a Class of Iterative Methods for Solving Weber Location Problem
| year = 1978
| journal = Operations Research
| volume = 26
| pages = 597–609
| doi = 10.1287/opre.26.4.597
| ref = harv
| issue = 4}}
*{{cite book
| author = Weber, Alfred
| title = Über den Standort der Industrien, Erster Teil: Reine Theorie des Standortes
| location = Tübingen
| publisher = Mohr
| year = 1909}}
*{{cite journal
| author = Wesolowsky, G.
| title = The Weber problem: History and perspective
| journal = Location Science
| volume = 1
| pages = 5–23
| year = 1993
| ref = harv}}
*{{cite journal
| last = Weiszfeld | first = E. | authorlink = Andrew Vázsonyi
| title = Sur le point pour lequel la somme des distances de ''n'' points donnés est minimum
| journal = [[Tohoku Mathematical Journal]]
| volume = 43
| year = 1937
| pages = 355–386
| ref = harv}}
 
[[Category:Means]]
[[Category:Multivariate statistics]]
[[Category:Non-parametric statistics]]
[[Category:Mathematical optimization]]
[[Category:Operations research]]
[[Category:Geometric algorithms]]

Latest revision as of 17:41, 7 May 2014
