Huber loss function


In statistical theory, the Huber loss function is a loss function used in robust estimation. It reduces the influence of outliers on an estimate while treating non-outliers in a more standard way.


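The Huber loss function describes the penalty incurred by an estimation procedure. Huber (1964[1]) defines the loss function piecewise by

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$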

This function is quadratic for small values of $a$ and linear for large values, with equal values and slopes of the two sections at the two points where $|a| = \delta$. In use, the variable $a$ often refers to the residuals, that is, to the difference between the observed and predicted values, i.e. $a = y - \hat{y}$.
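The piecewise definition can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard form of the loss ($a^2/2$ for $|a| \le \delta$ and $\delta(|a| - \delta/2)$ otherwise); the function names `huber_loss` and `huber_grad` are chosen here for illustration and do not come from the article.

```python
def huber_loss(a: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |a| <= delta, linear beyond that."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    abs_a = abs(a)
    if abs_a <= delta:
        return 0.5 * a * a
    return delta * (abs_a - 0.5 * delta)


def huber_grad(a: float, delta: float = 1.0) -> float:
    """Derivative of the Huber loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, a))


# Both branches give the same value at |a| = delta, so the loss is continuous there.
quadratic_at_delta = 0.5 * 2.0 * 2.0          # quadratic branch at a = delta = 2
linear_at_delta = 2.0 * (2.0 - 0.5 * 2.0)     # linear branch at a = delta = 2
```

The clipped derivative makes the role of $\delta$ visible: residuals larger than $\delta$ in magnitude all contribute the same bounded slope, which is what limits the pull of outliers.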


For estimating parameters, it is desirable for a loss function to have the following properties (for all values of $a$ in the parameter space):

  1. It is greater than or equal to the 0-1 loss function (which is defined as $L(a) = 0$ if $a = 0$ and $L(a) = 1$ otherwise).
  2. It is continuous (or lower semicontinuous).

Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. The absolute loss is differentiable everywhere except at one point, $a = 0$, where it is subdifferentiable with its convex subdifferential equal to the interval $[-1, +1]$. The absolute-value loss function results in a median-unbiased estimator, which can be evaluated for particular data sets by linear programming. The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^{n} L(a_i)$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
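A small numerical illustration of this sensitivity, using only the Python standard library. The data sets below are invented for the example; the point is only that the minimizer of the summed squared loss (the sample mean) moves a long way when a single value is replaced by an outlier, while the minimizer of the summed absolute loss (the sample median) does not.

```python
import statistics


def total_loss(center, data, loss):
    """Sum of loss(x - center) over the data set."""
    return sum(loss(x - center) for x in data)


def squared(a):
    return a * a


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = [1.0, 2.0, 3.0, 4.0, 500.0]  # one value replaced by an outlier

mean_clean = statistics.mean(clean)            # 3.0
mean_cont = statistics.mean(contaminated)      # 102.0
median_clean = statistics.median(clean)        # 3.0
median_cont = statistics.median(contaminated)  # 3.0

# The mean minimizes the summed squared loss; nearby centers do no better.
best_sq = total_loss(mean_cont, contaminated, squared)
worse_sq = min(total_loss(mean_cont + d, contaminated, squared) for d in (-1.0, 1.0))

# The median minimizes the summed absolute loss.
best_abs = total_loss(median_cont, contaminated, abs)
worse_abs = min(total_loss(median_cont + d, contaminated, abs) for d in (-1.0, 1.0))
```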

As defined above, the Huber loss function is convex in a uniform neighborhood of its minimum $a = 0$. At the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points $a = -\delta$ and $a = \delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute value function).
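To make this combination concrete, the following sketch estimates a location parameter by minimizing the summed Huber loss. It uses iteratively reweighted averaging, which is one common way to solve this problem and is an assumption of this example rather than a method described in the article; the name `huber_location`, the starting point (the median), and the tolerance are likewise illustrative choices.

```python
import statistics


def huber_location(data, delta=1.0, tol=1e-10, max_iter=100):
    """Location estimate minimizing the summed Huber loss of (x - mu).

    Each step computes a weighted mean in which residuals larger than
    delta are down-weighted by delta / |residual|.
    """
    mu = statistics.median(data)  # robust starting point
    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - mu)
            weights.append(1.0 if r <= delta else delta / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [1.0, 2.0, 3.0, 4.0, 500.0]    # invented data with one outlier
robust = huber_location(data, delta=1.5)
like_mean = huber_location(data, delta=1000.0)  # delta so large that no residual is clipped
```

With a small $\delta$ the estimate stays near the bulk of the data, as the median does; with a $\delta$ larger than every residual the loss is purely quadratic and the estimate coincides with the sample mean.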

The log-cosh loss function, which is defined as $L(a) = \log(\cosh(a))$, behaves similarly to the Huber loss function.
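A short sketch of the log-cosh loss, $\log(\cosh(a))$, in Python. Evaluating `math.cosh` directly overflows for large arguments, so this version uses the algebraically equivalent form $|a| + \log(1 + e^{-2|a|}) - \log 2$; that rewriting is a choice made for this example, not something the article prescribes.

```python
import math


def log_cosh(a: float) -> float:
    """log(cosh(a)), computed in a form that does not overflow for large |a|."""
    abs_a = abs(a)
    return abs_a + math.log1p(math.exp(-2.0 * abs_a)) - math.log(2.0)


small = log_cosh(0.01)    # close to a**2 / 2 for small a
large = log_cosh(1000.0)  # close to |a| - log(2) for large a
```

Like the Huber loss, the function is approximately quadratic near zero and approximately linear far from it, but it has no tunable threshold corresponding to $\delta$.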

Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function, and ensures that derivatives are continuous for all degrees. It is defined as[citation needed]

$$L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right).$$

As such, this function approximates $a^2/2$ for small values of $a$, and approaches a straight line with slope $\delta$ for large values of $a$.
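Both limiting behaviors can be checked numerically. The sketch below assumes the usual form of the Pseudo-Huber loss, $\delta^2(\sqrt{1 + (a/\delta)^2} - 1)$; the function names are illustrative.

```python
import math


def pseudo_huber(a: float, delta: float = 1.0) -> float:
    """Pseudo-Huber loss: delta**2 * (sqrt(1 + (a/delta)**2) - 1)."""
    return delta ** 2 * (math.hypot(1.0, a / delta) - 1.0)


def pseudo_huber_grad(a: float, delta: float = 1.0) -> float:
    """Derivative of the Pseudo-Huber loss: a / sqrt(1 + (a/delta)**2)."""
    return a / math.sqrt(1.0 + (a / delta) ** 2)


near_zero = pseudo_huber(0.01)                   # close to 0.01**2 / 2
far_slope = pseudo_huber_grad(1e6, delta=2.0)    # close to delta = 2
```

Unlike the Huber loss, there is no explicit switch between branches: the transition from quadratic to linear behavior is gradual, with $\delta$ controlling where it happens.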


The Huber loss function is used in robust statistics, M-estimation and additive modelling.[2]



  1. Huber, P. J. (1964), "Robust Estimation of a Location Parameter", The Annals of Mathematical Statistics, Vol. 35, No. 1 (Mar. 1964), 73-101
  2. Friedman, J. H. (2001), "Greedy Function Approximation: A Gradient Boosting Machine", The Annals of Statistics, Vol. 29, No. 5 (Oct. 2001), 1189-1232