Isolation lemma: Difference between revisions
en>Ylloh m references |
en>Shreevatsa m add ref asked for |
||
| Line 1: | Line 1: | ||
In [[anomaly detection]], the '''local outlier factor''' ('''LOF''') is an algorithm proposed by Markus M. Breunig, [[Hans-Peter Kriegel]], Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours.<ref>{{cite doi|10.1145/335191.335388}}</ref> | |||
LOF shares some concepts with [[DBSCAN]] and [[OPTICS algorithm|OPTICS]] such as the concepts of "core distance" and "reachability distance", which are used for local density estimation.<ref>{{cite doi | 10.1007/978-3-540-48247-5_28}}</ref> | |||
== Basic idea == | |||
[[File:LOF-idea.svg|thumb|right|250px|Basic idea of LOF: comparing the local density of a point with the densities of its neighbors. A has a much lower density than its neighbors.]] | |||
As indicated by the title, the local outlier factor is based on a concept of a local density, where locality is given by <math>k</math> nearest neighbors, whose distance is used to estimate the density. By comparing the local density of an object to the local densities of its neighbors, one can identify regions of similar density, and points that have a substantially lower density than their neighbors. These are considered to be [[outlier]]s. | |||
The local density is estimated by the typical distance at which a point can be "reached" from its neighbors. The definition of "reachability distance" used in LOF is an additional measure to produce more stable results within clusters. | |||
== Formal == | |||
Let <math>\mbox{k-distance}(A)</math> be the distance of the object <math>A</math> to the ''k'' nearest neighbor. Note that the set of the ''k'' nearest neighbors includes all objects at this distance, which can in the case of a "tie" be more than ''k'' objects. We denote the set of ''k'' nearest neighbors as <math>N_k(A)</math>. | |||
[[File:Reachability-distance.svg|thumb|right|250px|Illustration of the reachability distance. Objects ''B'' and ''C'' have the same reachability distance (k=3), while ''D'' is not a ''k'' nearest neighbor]] | |||
This distance is used to define what is called ''reachability distance'': | |||
<math>\mbox{reachability-distance}_k(A,B)=\max\{\mbox{k-distance}(B), d(A,B)\}</math> | |||
In words, the ''reachability distance'' of an object <math>A</math> ''from'' <math>B</math> is the true distance of the two objects, but at least the <math>\mbox{k-distance}</math> of <math>B</math>. Objects that belong to the ''k'' nearest neighbors of <math>B</math> (the "core" of <math>B</math>, see [[DBSCAN|DBSCAN cluster analysis]]) are considered to be equally distant. The reason for this distance is to get more stable results. Note that this is not a [[distance]] in the mathematical definition, since it is not symmetric. (While it is a common mistake<ref name="generalized" /> to always use the <math>\mbox{k-distance}</math>, this yields a slightly different method, referred to as Simplified-LOF<ref name="generalized" />) | |||
The ''local reachability density'' of an object <math>A</math> is defined by | |||
<math>\mbox{lrd}(A):=1/\left(\frac{\sum_{B\in N_k(A)}\mbox{reachability-distance}_k(A, B)}{|N_k(A)|}\right)</math> | |||
Which is the quotient of the average reachability distance of the object <math>A</math> ''from'' its neighbors. Note that it is not the average reachability of the neighbors from <math>A</math> (which by definition would be the <math>\mbox{k-distance}(A)</math>), but the distance at which it can be "reached" ''from'' its neighbors. With duplicate points, this value can become infinite. | |||
The local reachability densities are then compared with those of the neighbors using | |||
<math> | |||
\mbox{LOF}_k(A):=\frac{\sum_{B\in N_k(A)}\frac{\mbox{lrd}(B)}{\mbox{lrd}(A)}}{|N_k(A)|} | |||
= \frac{\sum_{B\in N_k(A)}\mbox{lrd}(B)}{|N_k(A)|} / \mbox{lrd}(A) | |||
</math> | |||
Which is the ''average local reachability density of the neighbors'' divided by the objects own local reachability density. A value of approximately <math>1</math> indicates that the object is comparable to its neighbors (and thus not an outlier). A value below <math>1</math> indicates a denser region (which would be an inlier), while values significantly larger than <math>1</math> indicate outliers. | |||
== Advantages == | |||
[[File:LOF.svg|thumb|right|400px|LOF scores as visualized by [[Environment for DeveLoping KDD-Applications Supported by Index-Structures|ELKI]]. While the upper right cluster has a comparable density to the outliers close to the bottom left cluster, they are detected correctly.]] | |||
Due to the local approach, LOF is able to identify outliers in a data set that would not be outliers in another area of the data set. For example, a point at a "small" distance to a very dense cluster is an outlier, while a point within a sparse cluster might exhibit similar distances to its neighbors. | |||
While the geometric intuition of LOF is only applicable to low dimensional vector spaces, the algorithm can be applied in any context a dissimilarity function can be defined. It has experimentally been shown to work very well in numerous setups, often outperforming the competitors, for example in [[Network intrusion detection system|network intrusion detection]].<ref>{{cite journal | title=A comparative study of anomaly detection schemes in network intrusion detection | year=2003 | authors=Ar Lazarevic, Aysel Ozgur, Levent Ertoz, Jaideep Srivastava, Vipin Kumar | journal=Proc. 3rd SIAM International Conference on Data Mining | url=http://www.siam.org/proceedings/datamining/2003/dm03_03LazarevicA.pdf | pages=25–36}}</ref> | |||
The LOF family of methods can be easily generalized and then applied to various other problems, such as detecting outliers in geographic data, video streams or authorship networks.<ref name="generalized" /> | |||
== Disadvantages and Extensions == | |||
The resulting values are [[quotient]]-values and hard to interpret. A value of 1 or even less indicates a clear inlier, but there is no clear rule for when a point is an outlier. In one data set, a value of 1.1 may already be an outlier, in another dataset and parameterization (with strong local fluctuations) a value of 2 could still be an inlier. These differences can also occur within a dataset due to the locality of the method. There exist extensions of LOF that try to improve over LOF in these aspects: | |||
* ''Feature Bagging for Outlier Detection'' <ref>{{cite doi|10.1145/1081870.1081891}}</ref> runs LOF on multiple projections and combines the results for improved detection qualities in high dimensions. | |||
* ''Local Outlier Probability'' (LoOP)<ref>{{cite doi|10.1145/1645953.1646195}}</ref> is a method derived from LOF but using inexpensive local statistics to become less sensitive to the choice of the parameter ''k''. In addition, the resulting values are scaled to a value range of <math>[0:1]</math>. | |||
* ''Interpreting and Unifying Outlier Scores'' <ref>{{cite journal | title=Interpreting and Unifying Outlier Scores | year=2011 | authors=[[Hans-Peter Kriegel]], Peer Kröger, Erich Schubert, Arthur Zimek | journal=Proc. 11th SIAM International Conference on Data Mining | url=http://siam.omnibooksonline.com/2011datamining/data/papers/018.pdf }}</ref> proposes a normalization of the LOF outlier scores to the interval <math>[0:1]</math> using statistical scaling to increase [[usability]] and can be seen a improved version of the LoOP ideas. | |||
* ''On Evaluation of Outlier Rankings and Outlier Scores'' <ref>{{cite journal | title=On Evaluation of Outlier Rankings and Outlier Scores | year=2012 | authors=Erich Schubert, Remigius Wojdanowski, [[Hans-Peter Kriegel]], Arthur Zimek | journal=Proc. 12 SIAM International Conference on Data Mining | url=http://siam.omnibooksonline.com/2012datamining/data/papers/107.pdf }}</ref> proposes methods for measuring similarity and diversity of methods for building advanced outlier detection [[Ensemble learning|ensembles]] using LOF variants and other algorithms and improving on the Feature Bagging approach discussed above. | |||
* ''Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection''<ref name="generalized">{{cite doi | 10.1007/s10618-012-0300-z}}</ref> discusses the general pattern in various local outlier detection methods (including e.g. LOF, a simplified version of LOF and LoOP) and abstracts from this into a general framework. This framework is then applied e.g. to detecting outliers in geographic data, video streams and authorship networks. | |||
== References == | |||
<references /> | |||
[[Category:Statistical outliers]] | |||
[[Category:Data mining]] | |||
[[Category:Machine learning algorithms]] | |||
Revision as of 06:57, 21 August 2013
In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours.[1]
LOF shares some concepts with DBSCAN and OPTICS such as the concepts of "core distance" and "reachability distance", which are used for local density estimation.[2]
Basic idea
As indicated by the title, the local outlier factor is based on a concept of a local density, where locality is given by nearest neighbors, whose distance is used to estimate the density. By comparing the local density of an object to the local densities of its neighbors, one can identify regions of similar density, and points that have a substantially lower density than their neighbors. These are considered to be outliers.
The local density is estimated by the typical distance at which a point can be "reached" from its neighbors. The definition of "reachability distance" used in LOF is an additional measure to produce more stable results within clusters.
Formal
Let be the distance of the object to the k nearest neighbor. Note that the set of the k nearest neighbors includes all objects at this distance, which can in the case of a "tie" be more than k objects. We denote the set of k nearest neighbors as .
This distance is used to define what is called reachability distance:
In words, the reachability distance of an object from is the true distance of the two objects, but at least the of . Objects that belong to the k nearest neighbors of (the "core" of , see DBSCAN cluster analysis) are considered to be equally distant. The reason for this distance is to get more stable results. Note that this is not a distance in the mathematical definition, since it is not symmetric. (While it is a common mistake[3] to always use the , this yields a slightly different method, referred to as Simplified-LOF[3])
The local reachability density of an object is defined by
Which is the quotient of the average reachability distance of the object from its neighbors. Note that it is not the average reachability of the neighbors from (which by definition would be the ), but the distance at which it can be "reached" from its neighbors. With duplicate points, this value can become infinite.
The local reachability densities are then compared with those of the neighbors using
Which is the average local reachability density of the neighbors divided by the objects own local reachability density. A value of approximately indicates that the object is comparable to its neighbors (and thus not an outlier). A value below indicates a denser region (which would be an inlier), while values significantly larger than indicate outliers.
Advantages
Due to the local approach, LOF is able to identify outliers in a data set that would not be outliers in another area of the data set. For example, a point at a "small" distance to a very dense cluster is an outlier, while a point within a sparse cluster might exhibit similar distances to its neighbors.
While the geometric intuition of LOF is only applicable to low dimensional vector spaces, the algorithm can be applied in any context a dissimilarity function can be defined. It has experimentally been shown to work very well in numerous setups, often outperforming the competitors, for example in network intrusion detection.[4]
The LOF family of methods can be easily generalized and then applied to various other problems, such as detecting outliers in geographic data, video streams or authorship networks.[3]
Disadvantages and Extensions
The resulting values are quotient-values and hard to interpret. A value of 1 or even less indicates a clear inlier, but there is no clear rule for when a point is an outlier. In one data set, a value of 1.1 may already be an outlier, in another dataset and parameterization (with strong local fluctuations) a value of 2 could still be an inlier. These differences can also occur within a dataset due to the locality of the method. There exist extensions of LOF that try to improve over LOF in these aspects:
- Feature Bagging for Outlier Detection [5] runs LOF on multiple projections and combines the results for improved detection qualities in high dimensions.
- Local Outlier Probability (LoOP)[6] is a method derived from LOF but using inexpensive local statistics to become less sensitive to the choice of the parameter k. In addition, the resulting values are scaled to a value range of .
- Interpreting and Unifying Outlier Scores [7] proposes a normalization of the LOF outlier scores to the interval using statistical scaling to increase usability and can be seen a improved version of the LoOP ideas.
- On Evaluation of Outlier Rankings and Outlier Scores [8] proposes methods for measuring similarity and diversity of methods for building advanced outlier detection ensembles using LOF variants and other algorithms and improving on the Feature Bagging approach discussed above.
- Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection[3] discusses the general pattern in various local outlier detection methods (including e.g. LOF, a simplified version of LOF and LoOP) and abstracts from this into a general framework. This framework is then applied e.g. to detecting outliers in geographic data, video streams and authorship networks.
References
- ↑ Template:Cite doi
- ↑ Template:Cite doi
- ↑ 3.0 3.1 3.2 3.3 Template:Cite doi
- ↑ One of the biggest reasons investing in a Singapore new launch is an effective things is as a result of it is doable to be lent massive quantities of money at very low interest rates that you should utilize to purchase it. Then, if property values continue to go up, then you'll get a really high return on funding (ROI). Simply make sure you purchase one of the higher properties, reminiscent of the ones at Fernvale the Riverbank or any Singapore landed property Get Earnings by means of Renting
In its statement, the singapore property listing - website link, government claimed that the majority citizens buying their first residence won't be hurt by the new measures. Some concessions can even be prolonged to chose teams of consumers, similar to married couples with a minimum of one Singaporean partner who are purchasing their second property so long as they intend to promote their first residential property. Lower the LTV limit on housing loans granted by monetary establishments regulated by MAS from 70% to 60% for property purchasers who are individuals with a number of outstanding housing loans on the time of the brand new housing purchase. Singapore Property Measures - 30 August 2010 The most popular seek for the number of bedrooms in Singapore is 4, followed by 2 and three. Lush Acres EC @ Sengkang
Discover out more about real estate funding in the area, together with info on international funding incentives and property possession. Many Singaporeans have been investing in property across the causeway in recent years, attracted by comparatively low prices. However, those who need to exit their investments quickly are likely to face significant challenges when trying to sell their property – and could finally be stuck with a property they can't sell. Career improvement programmes, in-house valuation, auctions and administrative help, venture advertising and marketing, skilled talks and traisning are continuously planned for the sales associates to help them obtain better outcomes for his or her shoppers while at Knight Frank Singapore. No change Present Rules
Extending the tax exemption would help. The exemption, which may be as a lot as $2 million per family, covers individuals who negotiate a principal reduction on their existing mortgage, sell their house short (i.e., for lower than the excellent loans), or take part in a foreclosure course of. An extension of theexemption would seem like a common-sense means to assist stabilize the housing market, but the political turmoil around the fiscal-cliff negotiations means widespread sense could not win out. Home Minority Chief Nancy Pelosi (D-Calif.) believes that the mortgage relief provision will be on the table during the grand-cut price talks, in response to communications director Nadeam Elshami. Buying or promoting of blue mild bulbs is unlawful.
A vendor's stamp duty has been launched on industrial property for the primary time, at rates ranging from 5 per cent to 15 per cent. The Authorities might be trying to reassure the market that they aren't in opposition to foreigners and PRs investing in Singapore's property market. They imposed these measures because of extenuating components available in the market." The sale of new dual-key EC models will even be restricted to multi-generational households only. The models have two separate entrances, permitting grandparents, for example, to dwell separately. The vendor's stamp obligation takes effect right this moment and applies to industrial property and plots which might be offered inside three years of the date of buy. JLL named Best Performing Property Brand for second year running
The data offered is for normal info purposes only and isn't supposed to be personalised investment or monetary advice. Motley Fool Singapore contributor Stanley Lim would not personal shares in any corporations talked about. Singapore private home costs increased by 1.eight% within the fourth quarter of 2012, up from 0.6% within the earlier quarter. Resale prices of government-built HDB residences which are usually bought by Singaporeans, elevated by 2.5%, quarter on quarter, the quickest acquire in five quarters. And industrial property, prices are actually double the levels of three years ago. No withholding tax in the event you sell your property. All your local information regarding vital HDB policies, condominium launches, land growth, commercial property and more
There are various methods to go about discovering the precise property. Some local newspapers (together with the Straits Instances ) have categorised property sections and many local property brokers have websites. Now there are some specifics to consider when buying a 'new launch' rental. Intended use of the unit Every sale begins with 10 p.c low cost for finish of season sale; changes to 20 % discount storewide; follows by additional reduction of fiftyand ends with last discount of 70 % or extra. Typically there is even a warehouse sale or transferring out sale with huge mark-down of costs for stock clearance. Deborah Regulation from Expat Realtor shares her property market update, plus prime rental residences and houses at the moment available to lease Esparina EC @ Sengkang - ↑ Template:Cite doi
- ↑ Template:Cite doi
- ↑ One of the biggest reasons investing in a Singapore new launch is an effective things is as a result of it is doable to be lent massive quantities of money at very low interest rates that you should utilize to purchase it. Then, if property values continue to go up, then you'll get a really high return on funding (ROI). Simply make sure you purchase one of the higher properties, reminiscent of the ones at Fernvale the Riverbank or any Singapore landed property Get Earnings by means of Renting
In its statement, the singapore property listing - website link, government claimed that the majority citizens buying their first residence won't be hurt by the new measures. Some concessions can even be prolonged to chose teams of consumers, similar to married couples with a minimum of one Singaporean partner who are purchasing their second property so long as they intend to promote their first residential property. Lower the LTV limit on housing loans granted by monetary establishments regulated by MAS from 70% to 60% for property purchasers who are individuals with a number of outstanding housing loans on the time of the brand new housing purchase. Singapore Property Measures - 30 August 2010 The most popular seek for the number of bedrooms in Singapore is 4, followed by 2 and three. Lush Acres EC @ Sengkang
Discover out more about real estate funding in the area, together with info on international funding incentives and property possession. Many Singaporeans have been investing in property across the causeway in recent years, attracted by comparatively low prices. However, those who need to exit their investments quickly are likely to face significant challenges when trying to sell their property – and could finally be stuck with a property they can't sell. Career improvement programmes, in-house valuation, auctions and administrative help, venture advertising and marketing, skilled talks and traisning are continuously planned for the sales associates to help them obtain better outcomes for his or her shoppers while at Knight Frank Singapore. No change Present Rules
Extending the tax exemption would help. The exemption, which may be as a lot as $2 million per family, covers individuals who negotiate a principal reduction on their existing mortgage, sell their house short (i.e., for lower than the excellent loans), or take part in a foreclosure course of. An extension of theexemption would seem like a common-sense means to assist stabilize the housing market, but the political turmoil around the fiscal-cliff negotiations means widespread sense could not win out. Home Minority Chief Nancy Pelosi (D-Calif.) believes that the mortgage relief provision will be on the table during the grand-cut price talks, in response to communications director Nadeam Elshami. Buying or promoting of blue mild bulbs is unlawful.
A vendor's stamp duty has been launched on industrial property for the primary time, at rates ranging from 5 per cent to 15 per cent. The Authorities might be trying to reassure the market that they aren't in opposition to foreigners and PRs investing in Singapore's property market. They imposed these measures because of extenuating components available in the market." The sale of new dual-key EC models will even be restricted to multi-generational households only. The models have two separate entrances, permitting grandparents, for example, to dwell separately. The vendor's stamp obligation takes effect right this moment and applies to industrial property and plots which might be offered inside three years of the date of buy. JLL named Best Performing Property Brand for second year running
The data offered is for normal info purposes only and isn't supposed to be personalised investment or monetary advice. Motley Fool Singapore contributor Stanley Lim would not personal shares in any corporations talked about. Singapore private home costs increased by 1.eight% within the fourth quarter of 2012, up from 0.6% within the earlier quarter. Resale prices of government-built HDB residences which are usually bought by Singaporeans, elevated by 2.5%, quarter on quarter, the quickest acquire in five quarters. And industrial property, prices are actually double the levels of three years ago. No withholding tax in the event you sell your property. All your local information regarding vital HDB policies, condominium launches, land growth, commercial property and more
There are various methods to go about discovering the precise property. Some local newspapers (together with the Straits Instances ) have categorised property sections and many local property brokers have websites. Now there are some specifics to consider when buying a 'new launch' rental. Intended use of the unit Every sale begins with 10 p.c low cost for finish of season sale; changes to 20 % discount storewide; follows by additional reduction of fiftyand ends with last discount of 70 % or extra. Typically there is even a warehouse sale or transferring out sale with huge mark-down of costs for stock clearance. Deborah Regulation from Expat Realtor shares her property market update, plus prime rental residences and houses at the moment available to lease Esparina EC @ Sengkang - ↑ One of the biggest reasons investing in a Singapore new launch is an effective things is as a result of it is doable to be lent massive quantities of money at very low interest rates that you should utilize to purchase it. Then, if property values continue to go up, then you'll get a really high return on funding (ROI). Simply make sure you purchase one of the higher properties, reminiscent of the ones at Fernvale the Riverbank or any Singapore landed property Get Earnings by means of Renting
In its statement, the singapore property listing - website link, government claimed that the majority citizens buying their first residence won't be hurt by the new measures. Some concessions can even be prolonged to chose teams of consumers, similar to married couples with a minimum of one Singaporean partner who are purchasing their second property so long as they intend to promote their first residential property. Lower the LTV limit on housing loans granted by monetary establishments regulated by MAS from 70% to 60% for property purchasers who are individuals with a number of outstanding housing loans on the time of the brand new housing purchase. Singapore Property Measures - 30 August 2010 The most popular seek for the number of bedrooms in Singapore is 4, followed by 2 and three. Lush Acres EC @ Sengkang
Discover out more about real estate funding in the area, together with info on international funding incentives and property possession. Many Singaporeans have been investing in property across the causeway in recent years, attracted by comparatively low prices. However, those who need to exit their investments quickly are likely to face significant challenges when trying to sell their property – and could finally be stuck with a property they can't sell. Career improvement programmes, in-house valuation, auctions and administrative help, venture advertising and marketing, skilled talks and traisning are continuously planned for the sales associates to help them obtain better outcomes for his or her shoppers while at Knight Frank Singapore. No change Present Rules
Extending the tax exemption would help. The exemption, which may be as a lot as $2 million per family, covers individuals who negotiate a principal reduction on their existing mortgage, sell their house short (i.e., for lower than the excellent loans), or take part in a foreclosure course of. An extension of theexemption would seem like a common-sense means to assist stabilize the housing market, but the political turmoil around the fiscal-cliff negotiations means widespread sense could not win out. Home Minority Chief Nancy Pelosi (D-Calif.) believes that the mortgage relief provision will be on the table during the grand-cut price talks, in response to communications director Nadeam Elshami. Buying or promoting of blue mild bulbs is unlawful.
A vendor's stamp duty has been launched on industrial property for the primary time, at rates ranging from 5 per cent to 15 per cent. The Authorities might be trying to reassure the market that they aren't in opposition to foreigners and PRs investing in Singapore's property market. They imposed these measures because of extenuating components available in the market." The sale of new dual-key EC models will even be restricted to multi-generational households only. The models have two separate entrances, permitting grandparents, for example, to dwell separately. The vendor's stamp obligation takes effect right this moment and applies to industrial property and plots which might be offered inside three years of the date of buy. JLL named Best Performing Property Brand for second year running
The data offered is for normal info purposes only and isn't supposed to be personalised investment or monetary advice. Motley Fool Singapore contributor Stanley Lim would not personal shares in any corporations talked about. Singapore private home costs increased by 1.eight% within the fourth quarter of 2012, up from 0.6% within the earlier quarter. Resale prices of government-built HDB residences which are usually bought by Singaporeans, elevated by 2.5%, quarter on quarter, the quickest acquire in five quarters. And industrial property, prices are actually double the levels of three years ago. No withholding tax in the event you sell your property. All your local information regarding vital HDB policies, condominium launches, land growth, commercial property and more
There are various methods to go about discovering the precise property. Some local newspapers (together with the Straits Instances ) have categorised property sections and many local property brokers have websites. Now there are some specifics to consider when buying a 'new launch' rental. Intended use of the unit Every sale begins with 10 p.c low cost for finish of season sale; changes to 20 % discount storewide; follows by additional reduction of fiftyand ends with last discount of 70 % or extra. Typically there is even a warehouse sale or transferring out sale with huge mark-down of costs for stock clearance. Deborah Regulation from Expat Realtor shares her property market update, plus prime rental residences and houses at the moment available to lease Esparina EC @ Sengkang