Visual cryptography: Difference between revisions

Revision as of 19:56, 22 August 2013

In mathematical optimization, statistics, decision theory and machine learning, a loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (sometimes called a reward function or a utility function), in which case it is to be maximized.

In statistics, typically a loss function is used for parameter estimation, and the event in question is some function of the difference between estimated and true values for an instance of data. In the context of economics, for example, this is usually economic cost or regret. In classification, it is the penalty for an incorrect classification of an example. In actuarial science, it is used in an insurance context to model benefits paid over premiums. In optimal control the loss is the penalty for failing to achieve a desired value.

Use in statistics

Parameter estimation for supervised learning tasks such as regression or classification can be formulated as the minimization of a loss function over a training set. The goal of estimation is to find a function that models its input well: if it were applied to the training set, it should predict the values (or class labels) associated with the samples in that set. The loss function quantifies the amount by which the prediction deviates from the actual values.
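As a rough illustration of minimizing a loss over a training set, the sketch below fits a one-parameter model x -> w * x under squared loss. The data values, the model form, and the names `training_set`, `empirical_loss` and `w_hat` are assumptions made up for this example, not something taken from the article.

```python
# Hypothetical training set: pairs (x, y) lying roughly on the line y = 2x.
training_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def squared_loss(prediction, actual):
    """Loss for a single sample: squared deviation of prediction from actual."""
    return (prediction - actual) ** 2


def empirical_loss(w, data):
    """Average squared loss of the model x -> w * x over the data."""
    return sum(squared_loss(w * x, y) for x, y in data) / len(data)


# For this one-parameter model the minimizer has a closed form:
# w = sum(x * y) / sum(x * x).
w_hat = sum(x * y for x, y in training_set) / sum(x * x for x, _ in training_set)

print(w_hat, empirical_loss(w_hat, training_set))
```

Any other value of w gives a larger average loss on this training set, which is what "estimation as loss minimization" means in this simple case.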

Definition

Formally, we begin by considering some family of distributions for a random variable X, that is indexed by some θ.

More intuitively, we can think of X as our "data", perhaps $X=(X_{1},\ldots ,X_{n})$ , where $X_{i}\sim F_{\theta }$ i.i.d. Here X is the data on which the decision rule bases its decisions. There are a number of possible models $F_{\theta }$ for our data X, which our decision function can use to make decisions. For a finite number of models, we can thus think of θ as the index to this family of probability models. For an infinite family of models, it is a set of parameters to the family of distributions.

On a more practical note, while it is tempting to think of loss functions as necessarily parametric (since they seem to take θ as a "parameter"), θ need not be finite-dimensional, which is incompatible with this notion; for example, if the family of probability functions is uncountably infinite, θ indexes an uncountably infinite space.

From here, given a set A of possible actions, a decision rule is a function δ : $\scriptstyle {\mathcal {X}}$ → A.

A loss function is a real-valued, lower-bounded function L on Θ × A. The value L(θ, δ(X)) is the cost of action δ(X) under parameter θ ∈ Θ.^[1]

Expected loss

The value of the loss function itself is a random quantity because it depends on the outcome of a random variable X. Both frequentist and Bayesian statistical theory involve making a decision based on the expected value of the loss function: however this quantity is defined differently under the two paradigms.

Frequentist expected loss

We first define the expected loss in the frequentist context. It is obtained by taking the expected value with respect to the probability distribution, P_θ, of the observed data, X. This is also referred to as the risk function^[2] of the decision rule δ and the parameter θ. Here the decision rule depends on the outcome of X. The risk function is given by

R(\theta ,\delta )=\mathbb {E} _{\theta }L{\big (}\theta ,\delta (X){\big )}=\int _{X}L{\big (}\theta ,\delta (x){\big )}\,\operatorname {d} P_{\theta }(x).
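For a discrete sample space the integral above reduces to a finite sum, which can be evaluated directly. The following is a minimal sketch under assumed choices that are not in the article: X is Binomial(n, θ), the decision rule is the sample proportion x / n, and the loss is squared error. All function names are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, theta):
    """P_theta(X = x) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def risk(theta, delta, loss, n):
    """R(theta, delta): the loss of delta(x), averaged over P_theta."""
    return sum(loss(theta, delta(x)) * binomial_pmf(x, n, theta) for x in range(n + 1))


def squared_error(theta, action):
    return (theta - action) ** 2


n = 10  # assumed number of trials


def sample_proportion(x):
    """Decision rule: estimate theta by the observed proportion of successes."""
    return x / n


r = risk(0.3, sample_proportion, squared_error, n)
print(r)  # close to theta * (1 - theta) / n = 0.021
```

Because the sample proportion is unbiased, its risk under squared error equals its variance θ(1 − θ)/n, which gives a simple check on the computed value.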

Bayesian expected loss

In a Bayesian approach, the expectation is calculated using the posterior distribution π^* of the parameter θ:

\rho (\pi ^{*},a)=\int _{\Theta }L(\theta ,a)\,\operatorname {d} \pi ^{*}(\theta ).

One then should choose the action a^* which minimises the expected loss. Although this will result in choosing the same action as would be chosen using the Bayes risk, the emphasis of the Bayesian approach is that one is only interested in choosing the optimal action under the actual observed data, whereas choosing the actual Bayes optimal decision rule, which is a function of all possible observations, is a much more difficult problem.
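The choice of a^* can be sketched for a discrete posterior, where the integral becomes a weighted sum. The posterior weights, the squared-error loss and the grid of candidate actions below are illustrative assumptions only.

```python
# Hypothetical discrete posterior over theta: value -> posterior probability.
posterior = {0.2: 0.5, 0.5: 0.3, 0.8: 0.2}


def squared_error(theta, action):
    return (theta - action) ** 2


def posterior_expected_loss(action, posterior, loss):
    """rho(pi*, a): the loss of `action`, averaged over the posterior."""
    return sum(loss(theta, action) * p for theta, p in posterior.items())


# Candidate actions on a grid 0.00, 0.01, ..., 1.00 (an assumption for the sketch).
actions = [i / 100 for i in range(101)]
best_action = min(actions, key=lambda a: posterior_expected_loss(a, posterior, squared_error))

posterior_mean = sum(theta * p for theta, p in posterior.items())
print(best_action, posterior_mean)
```

Under squared-error loss the minimizing action coincides with the posterior mean, so the grid search and the direct computation should agree up to the grid resolution.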

Economic choice under uncertainty

In economics, decision-making under uncertainty is often modelled using the von Neumann-Morgenstern utility function of the uncertain variable of interest, such as end-of-period wealth. Since the value of this variable is uncertain, so is the value of the utility function; it is the expected value of utility that is maximized.
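A small sketch of expected-utility maximization follows. The two lotteries over end-of-period wealth and the logarithmic utility function are assumptions chosen for illustration; the article does not specify a particular utility function.

```python
from math import log

# Each lottery is a list of (probability, end-of-period wealth) pairs (hypothetical values).
lottery_a = [(1.0, 100.0)]               # a certain outcome
lottery_b = [(0.5, 50.0), (0.5, 160.0)]  # a gamble with a higher mean


def expected_wealth(lottery):
    return sum(p * w for p, w in lottery)


def expected_utility(lottery, utility=log):
    """Expected value of utility; log utility is an assumed risk-averse choice."""
    return sum(p * utility(w) for p, w in lottery)


print(expected_wealth(lottery_a), expected_wealth(lottery_b))
print(expected_utility(lottery_a), expected_utility(lottery_b))
```

With these numbers the gamble has the higher expected wealth, but the certain outcome has the higher expected log utility, so an agent maximizing expected utility under this assumed function would pick the certain outcome.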

Decision rules

A decision rule makes a choice using an optimality criterion. Some commonly used criteria are:

Minimax: Choose the decision rule with the lowest worst-case loss, that is, minimize the maximum possible loss:

{\underset {\delta }{\operatorname {arg\,min} }}\ \max _{\theta \in \Theta }\ R(\theta ,\delta ).

Invariance: Choose the optimal decision rule which satisfies an invariance requirement.

Choose the decision rule with the lowest average loss (i.e. minimize the expected value of the loss function):

{\underset {\delta }{\operatorname {arg\,min} }}\ \mathbb {E} _{\theta \in \Theta }[R(\theta ,\delta )]={\underset {\delta }{\operatorname {arg\,min} }}\ \int _{\theta \in \Theta }R(\theta ,\delta )\,p(\theta )\,d\theta .
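The minimax and average-loss criteria above can give different answers on the same problem. The sketch below uses a made-up risk table with three decision rules and two parameter values, plus an assumed weighting p(θ); none of these numbers come from the article.

```python
# Hypothetical risk table: R(theta, delta) for theta in (theta_1, theta_2).
risk_table = {
    "d1": [1.0, 4.0],
    "d2": [2.5, 2.5],
    "d3": [0.5, 5.0],
}
prior = [0.8, 0.2]  # assumed weights p(theta_1), p(theta_2)


def worst_case_risk(rule):
    return max(risk_table[rule])


def average_risk(rule):
    return sum(r * p for r, p in zip(risk_table[rule], prior))


minimax_rule = min(risk_table, key=worst_case_risk)
average_rule = min(risk_table, key=average_risk)
print(minimax_rule, average_rule)
```

Here the rule with constant risk wins under minimax, while the rule that does well on the heavily weighted parameter value wins under the average-loss criterion.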

Selecting a loss function

Sound statistical practice requires selecting an estimator consistent with the actual loss experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances, which results in the introduction of an element of teleology into problems of scientific decision-making.

A common example involves estimating "location." Under typical statistical assumptions, the mean or average is the statistic for estimating location that minimizes the expected loss experienced under the Taguchi or squared-error loss function, while the median is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.
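The location example can be checked numerically on a sample. The data values and the grid of candidate locations below are arbitrary assumptions; the sketch minimizes total (rather than expected) loss over the sample.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]  # hypothetical sample with one large value


def total_squared_loss(c):
    return sum((x - c) ** 2 for x in data)


def total_absolute_loss(c):
    return sum(abs(x - c) for x in data)


# Candidate location estimates on a grid 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(0, 1001)]
best_squared = min(candidates, key=total_squared_loss)
best_absolute = min(candidates, key=total_absolute_loss)

print(best_squared, mean(data))
print(best_absolute, median(data))
```

The grid minimizer of the squared loss lands on the sample mean and the minimizer of the absolute loss lands on the sample median, which differ substantially here because of the single large value.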

In economics, when an agent is risk neutral, the objective function is simply expressed in monetary terms, such as profit, income, or end-of-period wealth.

But for risk-averse (or risk-loving) agents, loss is measured as the negative of a utility function, which represents satisfaction and is usually interpreted in ordinal terms rather than in cardinal (absolute) terms.

Other measures of cost are possible, for example mortality or morbidity in the field of public health or safety engineering.

For most optimization algorithms, it is desirable to have a loss function that is globally continuous and differentiable.

Two very commonly used loss functions are the squared loss, $L(a)=a^{2}$ , and the absolute loss, $L(a)=|a|$ . However, the absolute loss has the disadvantage that it is not differentiable at $a=0$ . The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$ 's (as in $\sum _{i=1}^{n}L(a_{i})$ ), the final sum tends to be the result of a few particularly large a-values, rather than an expression of the average a-value.
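The effect of outliers on the two losses can be made concrete by asking what share of the total loss comes from the single largest deviation. The residual values below are invented for the illustration.

```python
residuals = [1, -1, 2, -2, 10]  # hypothetical a-values; the last one is an outlier

squared = [a**2 for a in residuals]
absolute = [abs(a) for a in residuals]

# Share of the summed loss contributed by the largest single term.
squared_share = max(squared) / sum(squared)
absolute_share = max(absolute) / sum(absolute)

print(squared_share, absolute_share)
```

For these values the outlier accounts for 100 of 110 under squared loss but 10 of 16 under absolute loss, so the squared sum reflects the outlier far more than the typical a-value.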

Loss functions in Bayesian statistics

One consequence of Bayesian inference is that the loss function, even when combined with the experimental data, does not in itself wholly determine a decision. What is important is the relationship between the loss function and the prior probability. So it is possible to have two different loss functions which lead to the same decision when the prior probability distributions associated with each compensate for the details of each loss function.[citation needed]

Combining the three elements of the prior probability, the data, and the loss function then allows decisions to be based on maximizing the subjective expected utility, a concept introduced by Leonard J. Savage.[citation needed]

Regret

Savage also argued that when using non-Bayesian methods such as minimax, the loss function should be based on the idea of regret, i.e., the loss associated with a decision should be the difference between the consequences of the best decision that could have been taken had the underlying circumstances been known and the decision that was in fact taken before they were known.
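Regret can be computed from a loss table by subtracting, for each state of the world, the loss of the best action in that state. The states, actions and loss values in this sketch are hypothetical, and the final step applies the minimax criterion to the regrets.

```python
# Hypothetical losses: loss[state][action].
loss = {
    "rain": {"umbrella": 1, "no_umbrella": 10},
    "sun": {"umbrella": 3, "no_umbrella": 0},
}

# Regret: loss of the action taken minus the loss of the best action for that state.
regret = {
    state: {action: value - min(row.values()) for action, value in row.items()}
    for state, row in loss.items()
}

action_names = ["umbrella", "no_umbrella"]
worst_regret = {a: max(regret[state][a] for state in regret) for a in action_names}
minimax_regret_action = min(worst_regret, key=worst_regret.get)

print(regret)
print(minimax_regret_action)
```

In each state the best action has regret zero, and the minimax-regret choice is the action whose largest regret across states is smallest.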

Quadratic loss function

The use of a quadratic loss function is common, for example when using least squares techniques or Taguchi methods. It is often more mathematically tractable than other loss functions because of the properties of variances, as well as being symmetric: an error above the target causes the same loss as the same magnitude of error below the target. If the target is t, then a quadratic loss function is

\lambda (x)=C(t-x)^{2}\;

for some constant C; the value of the constant makes no difference to a decision, and can be ignored by setting it equal to 1.
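That the constant does not affect the decision can be seen by minimizing the loss for two different values of C. The sketch assumes C is positive, and the target, the candidate values and the function names are invented for the example.

```python
def quadratic_loss(x, target, C=1.0):
    """lambda(x) = C * (t - x)^2, with C assumed positive."""
    return C * (target - x) ** 2


target = 10.0
candidates = [9.2, 10.1, 10.6]  # hypothetical achievable values of x


def best_candidate(C):
    return min(candidates, key=lambda x: quadratic_loss(x, target, C))


print(best_candidate(1.0), best_candidate(7.5))
```

Multiplying every loss by the same positive constant preserves their ordering, so the same candidate is selected in both cases; the symmetry of the loss also means 9.9 and 10.1 would be penalized equally.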

Many common statistics, including t-tests, regression models, design of experiments, and much else, use least squares methods applied using linear regression theory, which is based on the quadratic loss function.

The quadratic loss function is also used in linear-quadratic optimal control problems. In these problems, even in the absence of uncertainty, it may not be possible to achieve the desired values of all target variables. Often loss is expressed as a quadratic form in the deviations of the variables of interest from their desired values; this approach is tractable because it results in linear first-order conditions. In the context of stochastic control, the expected value of the quadratic form is used.

0-1 loss function

In statistics and decision theory, a frequently used loss function is the 0-1 loss function

L({\hat {y}},y)=I({\hat {y}}\neq y),\,

where $I$ is the indicator function.
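A minimal sketch of the 0-1 loss on a handful of made-up labels; averaging it over a set of predictions gives the misclassification rate. The label lists are assumptions for illustration.

```python
def zero_one_loss(y_hat, y):
    """1 if the prediction differs from the true value, else 0."""
    return int(y_hat != y)


y_true = [1, 0, 1, 1, 0]  # hypothetical true labels
y_pred = [1, 1, 1, 0, 0]  # hypothetical predictions

losses = [zero_one_loss(p, t) for p, t in zip(y_pred, y_true)]
error_rate = sum(losses) / len(losses)

print(losses, error_rate)
```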

References

[1] Nikulin, M.S., "Loss function", Encyclopedia of Mathematics, Springer, entry L/l060900.

[2] Nikulin, M.S., "Risk of a statistical procedure", Encyclopedia of Mathematics, Springer, entry R/r082490.

@@ Line 1: / Line 1: @@
-Roberto is what's written to his birth certificate but he never really [http://www.britannica.com/search?query=enjoyed enjoyed] reading that name. South Carolina is your birth place. The most liked hobby for him and his kids is so that you can fish and he's resulted in being doing it for a very long time. Auditing is how he supports my family. Go to his website to obtain out more: http://circuspartypanama.com<br><br>Here is my web blog [http://circuspartypanama.com gem Hack clash of Clans]
+{{Expert-subject|Statistics|date=March 2011}}
+In [[mathematical optimization]], [[statistics]], [[decision theory]] and [[machine learning]], a '''loss function''' or '''cost function''' is a function that maps an [[event (probability theory)|event]] or values of one or more variables onto a [[real number]] intuitively representing some "cost" associated with the event. An [[optimization problem]] seeks to minimize a loss function. An '''objective function''' is either a loss function or its negative (sometimes called a [[reward function]] or a [[utility function]]), in which case it is to be maximized.
+In statistics, typically a  loss function is used for parameter estimation, and the event in question is some function of the difference between estimated and true values for an instance of data.  In the context of [[economics]], for example, this is usually [[economic cost]] or [[Regret (decision theory)|regret]].  In [[Statistical classification|classification]], it is the penalty for an incorrect classification of an example. In [[actuarial science]], it is used in an insurance context to model benefits paid over premiums. In [[optimal control]] the loss is the penalty for failing to achieve a desired value.
+== Use in statistics ==
+Parameter estimation for [[supervised learning]] tasks such as [[regression analysis|regression]] or classification can be formulated as the [[Numerical optimization|minimization]] of a loss function over a [[training set]]. The goal of estimation is to find a function that models its input well: if it were applied to the training set, it should predict the values (or class labels) associated with the samples in that set. The loss function quantifies the amount by which the prediction deviates from the actual values.
+=== Definition ===
+Formally, we begin by considering some family of distributions for a [[random variable]] ''X'', that is indexed by some ''θ''.
+More intuitively, we can think of ''X'' as our "data", perhaps <math>X=(X_1,\ldots,X_n)</math>, where <math>X_i\sim F_\theta</math> i.i.d.  The ''X'' is the set of things the [[decision rule]] will be making decisions on. There exists some number of possible ways <math>F_\theta</math> to model our data ''X'', which our decision function can use to make decisions. For a finite number of models, we can thus think of ''θ'' as the ''index'' to this family of probability models.  For an infinite family of models, it is a set of parameters to the family of distributions.
+On a more practical note, it is important to understand that, while it is tempting to think of loss functions as necessarily parametric (since they seem to take ''θ'' as a "parameter"), the fact that ''θ'' is non-finite-dimensional is completely incompatible with this notion; for example, if the family of probability functions is uncountably infinite, ''θ'' indexes an uncountably infinite space.
+From here, given a set ''A'' of possible actions, a '''[[decision rule]]''' is a function ''δ''&nbsp;:&nbsp;<math>\scriptstyle\mathcal{X}</math>→&nbsp;''A''.
+A '''loss function''' is a real lower-bounded function ''L'' on ''Θ''&nbsp;&times;&nbsp;''A'' for some ''θ ∈ Θ''. The value ''L''(''&theta;'',&nbsp;''&delta;''(''X'')) is the ''cost'' of action ''δ''(''X'') under parameter ''&theta;''.<ref>{{SpringerEOM |title=Loss function |id=L/l060900 |first=M.S. |last=Nikulin}}</ref>
+== Expected loss ==
+The value of the loss function itself is a random quantity because it depends on the outcome of a random variable ''X''. Both [[frequentist]] and [[Bayesian probability|Bayesian]] statistical theory involve making a decision based on the [[expected value]] of the loss function: however this quantity is defined differently under the two paradigms.
+=== Frequentist expected loss ===
+We first define the expected loss in the frequentist context. It is obtained by taking the expected value with respect to the probability distribution, ''P''<sub>''θ''</sub>, of the observed data, ''X''. This is also referred to as the '''risk function'''<ref>{{SpringerEOM| title=Risk of a statistical procedure |id=R/r082490 |first=M.S. |last=Nikulin}}</ref> of the decision rule ''δ'' and the parameter ''θ''. Here the decision rule depends on the outcome of ''X''. The risk function is given by
+:<math>R(\theta, \delta) = \mathbb{E}_\theta L\big( \theta, \delta(X) \big) = \int_X L\big( \theta, \delta(x) \big) \, \operatorname{d} P_\theta (x) .</math>
+=== Bayesian expected loss ===
+In a Bayesian approach, the expectation is calculated using the [[posterior distribution]] ''&pi;<sup>*</sup>'' of the parameter ''&theta;'':
+:<math>\rho(\pi^*,a) = \int_\Theta L(\theta, a) \, \operatorname{d} \pi^* (\theta)</math>.
+One then should choose the action ''a<sup>*</sup>'' which minimises the expected loss. Although this will result in choosing the same action as would be chosen using the Bayes risk, the emphasis of the Bayesian approach is that one is only interested in choosing the optimal action under the actual observed data, whereas choosing the actual Bayes optimal decision rule, which is a function of all possible observations, is a much more difficult problem.
+=== Economic choice under uncertainty===
+In economics, decision-making under uncertainty is often modelled using the [[von Neumann-Morgenstern utility function]] of the uncertain variable of interest, such as end-of-period wealth. Since the value of this variable is uncertain, so is the value of the utility function; it is the expected value of utility that is maximized.
+==Decision rules==
+A decision rule makes a choice using an optimality criterion. Some commonly used criteria are:
+*'''[[Minimax]]''': Choose the decision rule with the lowest worst loss — that is, minimize the worst-case (maximum possible) loss:
+::<math> \underset{\delta} {\operatorname{arg\,min}} \ \max_{\theta \in \Theta} \ R(\theta,\delta). </math>
+*'''[[Invariant estimator|Invariance]]''': Choose the optimal decision rule which satisfies an invariance requirement.
+*Choose the decision rule with the lowest average loss (i.e. minimize the [[expected value]] of the loss function):
+::<math> \underset{\delta} {\operatorname{arg\,min}} \ \mathbb{E}_{\theta \in \Theta} [R(\theta,\delta)] = \underset{\delta} {\operatorname{arg\,min}} \ \int_{\theta \in \Theta} R(\theta,\delta) \, p(\theta) \,d\theta. </math>
+== Selecting a loss function ==
+Sound statistical practice requires selecting an estimator consistent with the actual loss experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances, which results in the introduction of an element of [[teleology]] into problems of scientific decision-making.
+A common example involves estimating "[[location parameter|location]]." Under typical statistical assumptions, the [[mean]] or average is the statistic for estimating location that minimizes the expected loss experienced under the [[Taguchi methods|Taguchi]] or [[least squares|squared-error]] loss function, while the [[median]] is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.
+In economics, when an agent is [[risk neutral]], the objective function is simply expressed in monetary terms, such as profit, income, or end-of-period wealth.
+But for [[Risk aversion|risk-averse]] (or [[risk-loving]]) agents, loss is measured as the negative of a [[utility|utility function]], which represents satisfaction and is usually interpreted in [[ordinal utility|ordinal]] terms rather than in [[cardinal utility|cardinal]] (absolute) terms.
+Other measures of cost are possible, for example [[death|mortality]] or [[morbidity]] in the field of [[public health]] or [[safety engineering]].
+For most optimization algorithms, it is desirable to have a loss function that is globally continuous and differentiable.
+Two very commonly used loss functions are the [[mean squared error|squared loss]], <math>L(a) = a^2</math>, and the [[absolute deviation|absolute loss]], <math>L(a)=|a|</math>.  However the absolute loss has the disadvantage that it is not differentiable at <math>a=0</math>.  The squared loss has the disadvantage that it has the tendency to be dominated by outliers---when summing over a set of <math>a</math>'s (as in <math>\sum_{i=1}^n L(a_i) </math> ), the final sum tends to be the result of a few particularly large a-values, rather than an expression of the average a-value.
+==Loss functions in Bayesian statistics==
+One of the consequences of [[Bayesian inference]] is that in addition to experimental data, the loss function  does not in itself wholly determine a decision.  What is important is the relationship between the loss function and the [[prior probability]].  So it is possible to have two different loss functions which lead to the same decision when the [[prior probability distribution]]s associated with each compensate for the details of each loss function.{{cn|date=February 2012}}
+Combining the three elements of the prior probability, the data, and the loss function then allows decisions to be based on maximizing the [[subjective expected utility]], a concept introduced by [[Leonard J. Savage]].{{cn|date=February 2012}}
+==Regret==
+{{main|Regret (decision theory)}}
+Savage also argued that using non-Bayesian methods such as [[minimax]], the loss function should be based on the idea of ''[[regret (decision theory)|regret]]'', i.e., the loss associated with a decision should be the difference between the consequences of the best decision that could have been taken had the underlying circumstances been known and the decision that was in fact taken before they were known.
+==Quadratic loss function==
+The use of a quadratic loss function is common, for example when using [[least squares]] techniques or [[Taguchi methods]]. It is often more mathematically tractable than other loss functions because of the properties of [[variance]]s, as well as being symmetric: an error above the target causes the same loss as the same magnitude of error below the target.  If the target is ''t'', then a quadratic loss function is
+:<math>\lambda(x) = C (t-x)^2 \; </math>
+for some constant ''C''; the value of the constant makes no difference to a decision, and can be ignored by setting it equal to 1.
+Many common statistics, including [[t-test]]s, [[Regression analysis|regression]] models, [[design of experiments]], and much else, use [[least squares]] methods applied using [[linear regression]] theory, which is based on the quadratric loss function.
+The quadratic loss function is also used in [[Linear-quadratic regulator|linear-quadratic optimal control problems]]. In these problems, even in the absence of uncertainty, it may not be possible to achieve the desired values of all target variables. Often loss is expressed as a [[quadratic form]] in the deviations of the variables of interest from their desired values; this approach is [[closed-form expression|tractable]] because it results in linear [[first-order condition]]s. In the context of [[stochastic control]], the expected value of the quadratic form is used.
+==0-1 loss function==
+In [[statistics]] and [[decision theory]], a frequently used loss function is the ''0-1 loss function''
+: <math>L(\hat{y}, y) = I(\hat{y} \ne y), \, </math>
+where <math>I</math> is the [[indicator notation]].
+==See also==
+*[[Discounted maximum loss]]
+*[[Hinge loss]]
+*[[Scoring rule]]
+==References==
+{{reflist}}
+==Further reading==
+* {{cite book
+ |title=Statistical decision theory and Bayesian Analysis
+ |first=James O. |last=Berger |authorlink=James Berger (statistician)
+ |year=1985
+ |edition=2nd
+ |publisher=Springer-Verlag |location=New York
+ |ISBN=0-387-96098-8 |mr=0804611
+}}
+{{DEFAULTSORT:Loss Function}}
+[[Category:Statistical theory]]
+[[Category:Decision theory]]
+[[Category:Econometrics]]
+[[Category:Information, knowledge, and uncertainty]]
+[[Category:Optimal decisions]]
+[[Category:Loss functions|*]]
