# Huber loss function

In statistical theory, the **Huber loss function** is a loss function used in robust estimation. It yields estimates in which the effect of outliers is reduced, while non-outliers are treated in a more standard way.

## Definition

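The **Huber loss function** describes the penalty incurred by an estimation procedure. Huber (1964)^{[1]} defines the loss function piecewise by

$$
L_\delta(a) =
\begin{cases}
\frac{1}{2}a^2 & \text{for } |a| \le \delta, \\
\delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.}
\end{cases}
$$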

This function is quadratic for small values of $a$ and linear for large values, with equal values and slopes of the two sections at the two points where $|a| = \delta$. In use, the variable $a$ often refers to the residuals, that is, the difference between the observed and predicted values, $a = y - f(x)$.
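A direct transcription of the piecewise definition into plain Python is sketched below, together with its derivative, which makes the matching values and slopes at $|a| = \delta$ easy to check. The names `huber` and `huber_grad` are illustrative and not taken from any particular library.

```python
def huber(a: float, delta: float) -> float:
    """Huber loss L_delta(a): quadratic for |a| <= delta, linear beyond."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    abs_a = abs(a)
    if abs_a <= delta:
        return 0.5 * a * a                  # quadratic zone
    return delta * (abs_a - 0.5 * delta)    # linear zone, shifted to join continuously


def huber_grad(a: float, delta: float) -> float:
    """Derivative of the Huber loss: a inside [-delta, delta], delta * sign(a) outside."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if abs(a) <= delta:
        return a
    return delta if a > 0 else -delta
```

For example, with $\delta = 2$ both branches give the value 2 at $a = 2$, and both have slope 2 there.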

## Motivation

For estimating parameters, it is desirable for a loss function to have the following properties (for all values of $a$ in the parameter space):

- It is greater than or equal to the 0-1 loss function, which is defined as $L(a) = 0$ if $a = 0$ and $L(a) = 1$ otherwise.
- It is continuous (or lower semicontinuous).

Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. Although the absolute loss is not differentiable at exactly one point, $a = 0$, where it is subdifferentiable with its convex subdifferential equal to the interval $[-1, 1]$, it results in a median-unbiased estimator, which can be evaluated for particular data sets by linear programming. The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^n L(a_i)$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
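To see the outlier sensitivity concretely, the sketch below minimizes $\sum_i L(x_i - c)$ over a location $c$ for both losses by brute-force grid search on a small made-up sample containing one outlier. The grid search, its step size, the sample, and the helper names are illustrative assumptions rather than a recommended estimator; the point is only that the squared-loss minimizer lands near the sample mean and the absolute-loss minimizer near the sample median.

```python
import statistics
from typing import Callable, Sequence


def total_loss(data: Sequence[float], c: float, loss: Callable[[float], float]) -> float:
    """Objective for a location estimate c: sum of loss(x - c) over the data."""
    return sum(loss(x - c) for x in data)


def argmin_on_grid(data: Sequence[float], loss: Callable[[float], float],
                   step: float = 0.01) -> float:
    """Brute-force minimizer of total_loss over an evenly spaced grid spanning the data."""
    lo, hi = min(data), max(data)
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda c: total_loss(data, c, loss))


data = [1.0, 2.0, 3.0, 4.0, 100.0]                 # hypothetical sample, one large outlier
c_squared = argmin_on_grid(data, lambda a: a * a)  # should sit near the sample mean
c_absolute = argmin_on_grid(data, abs)             # should sit near the sample median
print(c_squared, statistics.mean(data))
print(c_absolute, statistics.median(data))
```

The single value 100 drags the squared-loss estimate from the bulk of the data (around 1 to 4) up to about 22, while the absolute-loss estimate stays at 3.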

As defined above, the Huber loss function is convex in a uniform neighborhood of its minimum $a = 0$; at the boundary of this neighborhood, it has a differentiable extension to an affine function at the points $a = -\delta$ and $a = \delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) with the robustness of the median-unbiased estimator (using the absolute value function).
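One common way to compute a Huber location estimate, the value $\mu$ that minimizes $\sum_i L_\delta(x_i - \mu)$, is iteratively reweighted least squares: observations with $|x_i - \mu| \le \delta$ get weight 1, and the rest get weight $\delta / |x_i - \mu|$. The sketch below assumes the residuals are already on a sensible scale (it fits no scale estimate, which practical implementations usually do), and the function name, the median as starting point, and the stopping rule are illustrative choices.

```python
import statistics
from typing import Sequence


def huber_location(data: Sequence[float], delta: float,
                   tol: float = 1e-10, max_iter: int = 100) -> float:
    """Minimize sum_i L_delta(x_i - mu) over mu by iteratively reweighted least squares.

    Assumes the data are already on a sensible scale; no scale parameter is estimated.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    mu = float(statistics.median(data))  # robust starting point
    for _ in range(max_iter):
        # Huber weights: 1 in the quadratic zone, delta / |r| in the linear zone.
        weights = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

On the sample `[1, 2, 3, 4, 100]`, whose median is 3 and mean is 22, $\delta = 1$ gives 3, $\delta = 10$ gives 5, and a very large $\delta$ such as 1000 recovers the mean, 22, because every residual then falls in the quadratic zone.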

The log-cosh loss function, defined as $L(a) = \log(\cosh(a))$, behaves similarly to the Huber loss function.
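Evaluating $\log(\cosh(a))$ literally overflows for large $|a|$ because $\cosh$ grows like $e^{|a|}/2$ (for example, `math.cosh(1000.0)` overflows in Python). The sketch below instead uses the identity $\log\cosh(a) = |a| + \log(1 + e^{-2|a|}) - \log 2$, which also exposes the Huber-like shape: close to $a^2/2$ for small $a$ and to $|a| - \log 2$ for large $|a|$. The function name is illustrative.

```python
import math


def log_cosh(a: float) -> float:
    """log(cosh(a)) computed without forming cosh(a), so large |a| does not overflow."""
    x = abs(a)  # log cosh is even, so work with |a|
    # cosh(x) = e^x * (1 + e^{-2x}) / 2, hence log cosh(x) = x + log1p(e^{-2x}) - log 2
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)
```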

## Pseudo-Huber loss function

The **Pseudo-Huber loss function** can be used as a smooth approximation of the Huber loss function; it ensures that derivatives of all orders are continuous. It is defined as

$$
L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right).
$$

As such, this function approximates $a^2/2$ for small values of $a$ and approaches a straight line with slope $\delta$ for large values of $a$.
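A minimal Python sketch of the Pseudo-Huber loss and its derivative, $a / \sqrt{1 + (a/\delta)^2}$, follows. It rewrites $\sqrt{1+t} - 1$ as $t / (\sqrt{1+t} + 1)$ with $t = (a/\delta)^2$ to avoid cancellation for small $a$; that rearrangement and the function names are this sketch's own choices. The derivative tends to $\pm\delta$ for large $|a|$, matching the slope of the linear part of the Huber loss.

```python
import math


def pseudo_huber(a: float, delta: float) -> float:
    """Pseudo-Huber loss delta^2 * (sqrt(1 + (a/delta)^2) - 1)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    t = (a / delta) ** 2
    # sqrt(1 + t) - 1 == t / (sqrt(1 + t) + 1); the right-hand form avoids
    # subtracting nearly equal numbers when t is tiny.
    return delta * delta * t / (math.sqrt(1.0 + t) + 1.0)


def pseudo_huber_grad(a: float, delta: float) -> float:
    """Derivative of the Pseudo-Huber loss; approaches delta * sign(a) for large |a|."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return a / math.sqrt(1.0 + (a / delta) ** 2)
```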

## Applications

The Huber loss function is used in robust statistics, M-estimation and additive modelling.^{[2]}