{{Regression bar}}
The '''goodness of fit''' of a [[statistical model]] describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in [[statistical hypothesis testing]], e.g. to [[normality test|test for normality]] of [[Errors and residuals in statistics|residual]]s, to test whether two samples are drawn from identical distributions (see [[Kolmogorov–Smirnov test]]), or whether outcome frequencies follow a specified distribution (see [[Pearson's chi-squared test]]). In the [[analysis of variance]], one of the components into which the variance is partitioned may be a [[lack-of-fit sum of squares]].
==Fit of distributions==
In assessing whether a given distribution is suited to a data-set, the following [[statistical hypothesis test|test]]s and their underlying measures of fit can be used (a brief illustrative sketch follows the list):
:*[[Kolmogorov–Smirnov test]];
:*[[Cramér–von Mises criterion]];
:*[[Anderson–Darling test]];
:*[[Shapiro–Wilk test]];
:*[[Chi-squared test]];
:*[[Akaike information criterion]];
:*[[Hosmer–Lemeshow test]].
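The following is a minimal sketch, not part of the original article, of how two of these tests might be applied with SciPy; the simulated sample, parameter values and variable names are illustrative assumptions only:
<syntaxhighlight lang="python">
# Minimal sketch: applying two goodness-of-fit tests to a sample with SciPy.
# The data are simulated here; in practice they would be observed values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)  # hypothetical observations

# Kolmogorov–Smirnov test against a fully specified normal distribution.
ks_stat, ks_p = stats.kstest(sample, 'norm', args=(5.0, 2.0))

# Shapiro–Wilk test of normality (no parameters specified in advance).
sw_stat, sw_p = stats.shapiro(sample)

print(f"Kolmogorov–Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Shapiro–Wilk:       W = {sw_stat:.3f}, p = {sw_p:.3f}")
</syntaxhighlight>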
==Regression analysis==
In [[regression analysis]], the following topics relate to goodness of fit:
:* [[Coefficient of determination]] (the ''R''² measure of goodness of fit);
:* [[Lack-of-fit sum of squares]].
===Example===
One way in which a measure of goodness of fit statistic can be constructed, in the case where the variance of the measurement error is known, is to construct a weighted sum of squared errors:
:<math> \chi^2 = \sum \frac{(O - E)^2}{\sigma^2}</math>
where <math>\sigma^2</math> is the known [[variance]] of the observation, ''O'' is the observed data and ''E'' is the theoretical data.<ref name=laub>[http://neutrons.ornl.gov/workshops/sns_hfir_users/posters/Laub_Chi-Square_Data_Fitting.pdf Charlie Laub and Tonya L. Kuhl: ''Chi-Squared Data Fitting''. University of California, Davis.]</ref> This definition is only useful when one has estimates for the error on the measurements, but it leads to a situation where a [[chi-squared distribution]] can be used to test goodness of fit, provided that the errors can be assumed to have a [[normal distribution]].
The reduced chi-squared statistic is simply the chi-squared divided by the number of [[Degrees of freedom (statistics)|degrees of freedom]]:<ref name=laub /><ref>John Robert Taylor: ''An Introduction to Error Analysis'', page 268. University Science Books, 1997.</ref><ref>[http://www.physics.csbsju.edu/stats/chi_fit.html Kirkman, T.W.: ''Chi-Squared Curve Fitting''.]</ref><ref>[http://w3eos.whoi.edu/12.747/notes/lect03/lectno03.html David M. Glover, William J. Jenkins, and Scott C. Doney: ''Least Squares and Regression Techniques, Goodness of Fit and Tests, Non-linear Least Squares Techniques''. Woods Hole Oceanographic Institution, 2008.]</ref>
:<math> \chi_\mathrm{red}^2 = \frac{\chi^2}{\nu} = \frac{1}{\nu} \sum \frac{(O - E)^2}{\sigma^2}</math>
where <math>\nu</math> is the number of [[degrees of freedom (statistics)|degrees of freedom]], usually given by <math>N - n - 1</math>, where <math>N</math> is the number of observations and <math>n</math> is the number of fitted parameters, assuming that the mean value is an additional fitted parameter. The advantage of the reduced chi-squared is that it already normalizes for the number of data points and model complexity. This is also known as the [[mean square weighted deviation]].
As a rule of thumb, a large <math>\chi_\mathrm{red}^2</math> indicates a poor model fit. A <math>\chi_\mathrm{red}^2 > 1</math> indicates that the fit has not fully captured the data (or that the error variance has been underestimated), whereas <math>\chi_\mathrm{red}^2 < 1</math> indicates that the model is over-fitting the data (either the model is improperly fitting noise, or the error variance has been overestimated). In principle, a value of <math>\chi_\mathrm{red}^2 = 1</math> indicates that the extent of the match between observations and estimates is in accord with the error variance.
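As a minimal illustrative sketch (not part of the original article), the following assumes hypothetical measurements with a known error standard deviation and a model with two fitted parameters; all names and numbers are assumptions:
<syntaxhighlight lang="python">
# Minimal sketch: weighted chi-squared and reduced chi-squared with a known
# measurement error variance. All data and parameter counts are hypothetical.
import numpy as np
from scipy import stats

observed = np.array([4.9, 10.2, 14.8, 20.3, 24.9])  # hypothetical measurements O
expected = np.array([5.0, 10.0, 15.0, 20.0, 25.0])  # model predictions E
sigma = 0.2                                          # known measurement std. dev.

n_params = 2                                         # e.g. a slope and an intercept
# Degrees of freedom for this sketch (N minus fitted parameters); the article's
# N - n - 1 convention applies when the mean is fitted as an extra parameter.
nu = observed.size - n_params

chi2 = np.sum((observed - expected) ** 2 / sigma ** 2)
chi2_red = chi2 / nu

# p-value under the chi-squared distribution, valid when the errors are
# approximately normal.
p_value = stats.chi2.sf(chi2, df=nu)
print(chi2, chi2_red, p_value)
</syntaxhighlight>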
==Categorical data==
The following are examples that arise in the context of [[categorical data]].
===Pearson's chi-squared test===
[[Pearson's chi-squared test]] uses a measure of goodness of fit which is the sum of differences between observed and [[Expected value|expected outcome]] frequencies (that is, counts of observations), each squared and divided by the expectation:
:<math> \chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}</math>
where:
:''O<sub>i</sub>'' = an observed frequency (i.e. count) for bin ''i''
:''E<sub>i</sub>'' = an expected (theoretical) frequency for bin ''i'', asserted by the [[null hypothesis]].
The expected frequency is calculated by:
:<math>E_i = \bigl( F(Y_u) - F(Y_l) \bigr) \, N</math>
where:
:''F'' = the cumulative distribution function for the distribution being tested,
:''Y<sub>u</sub>'' = the upper limit for class ''i'',
:''Y<sub>l</sub>'' = the lower limit for class ''i'', and
:''N'' = the sample size.
The resulting value can be compared to the [[chi-squared distribution]] to determine the goodness of fit. In order to determine the [[Degrees of freedom (statistics)|degrees of freedom]] of the chi-squared distribution, one takes the number of cells and subtracts one plus the number of estimated parameters. The test statistic follows, approximately, a chi-squared distribution with (''k'' − ''c'') degrees of freedom, where ''k'' is the number of non-empty cells and ''c'' is the number of estimated parameters (including location, scale and shape parameters) for the distribution, plus one.
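Below is a minimal sketch, not taken from the article, of a binned goodness-of-fit test against a normal distribution whose location and scale are estimated from the data; the sample, bin edges and variable names are illustrative assumptions:
<syntaxhighlight lang="python">
# Minimal sketch: binned chi-squared goodness-of-fit test against a fitted
# normal distribution, with expected frequencies E_i = (F(Y_u) - F(Y_l)) * N.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=500)   # hypothetical sample
N = data.size

# Interior class limits; the outermost classes are open-ended.
edges = np.array([-1.5, -0.5, 0.5, 1.5])
observed = np.bincount(np.digitize(data, edges), minlength=edges.size + 1)

# Estimate location and scale from the data (two estimated parameters).
mu, s = data.mean(), data.std(ddof=1)
cdf = np.concatenate(([0.0], stats.norm.cdf(edges, loc=mu, scale=s), [1.0]))
expected = np.diff(cdf) * N                       # E_i for each class

chi2 = np.sum((observed - expected) ** 2 / expected)

k = observed.size                                 # non-empty cells (all are, here)
c = 2 + 1                                         # estimated parameters plus one
p_value = stats.chi2.sf(chi2, df=k - c)
print(chi2, k - c, p_value)
</syntaxhighlight>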
====Example: equal frequencies of men and women====
For example, to test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency, the observed number of men and women would be compared to the theoretical frequencies of 50 men and 50 women. If there were 44 men in the sample and 56 women, then
:<math> \chi^2 = {(44 - 50)^2 \over 50} + {(56 - 50)^2 \over 50} = 1.44</math>
If the null hypothesis is true (i.e., men and women are chosen with equal probability in the sample), the test statistic will be drawn from a chi-squared distribution with one [[degrees of freedom (statistics)|degree of freedom]]. Though one might expect two degrees of freedom (one each for the men and women), we must take into account that the total number of men and women is constrained (100), and thus there is only one degree of freedom (2 − 1). Put another way, if the male count is known, the female count is determined, and vice versa.
Consultation of the [[chi-squared distribution]] for 1 degree of freedom shows that the [[probability]] of observing this difference (or a more extreme difference than this) if men and women are equally numerous in the population is approximately 0.23. This probability is higher than conventional criteria for [[statistical significance]] (0.001–0.05), so normally we would not reject the null hypothesis that the number of men in the population is the same as the number of women (i.e. we would consider our sample within the range of what we would expect for a 50/50 male/female ratio).
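As a minimal sketch, not part of the original article, the same computation can be reproduced with SciPy's chi-squared test for given frequencies:
<syntaxhighlight lang="python">
# Minimal sketch: the men/women example computed with SciPy.
from scipy import stats

observed = [44, 56]   # observed counts of men and women
expected = [50, 50]   # counts expected under the 50/50 null hypothesis

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p_value)  # chi2 = 1.44, p ≈ 0.23 with 1 degree of freedom
</syntaxhighlight>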
=== Binomial case ===
A binomial experiment is a sequence of independent trials in which each trial can result in one of two outcomes, success or failure. There are ''n'' trials, each with probability of success denoted by ''p''. More generally, the trials may be classified into ''k'' cells with probabilities ''p''<sub>1</sub>, ..., ''p''<sub>''k''</sub> and observed counts ''N''<sub>1</sub>, ..., ''N''<sub>''k''</sub>. Provided that ''np''<sub>''i''</sub> ≫ 1 for every ''i'' (where ''i'' = 1, 2, ..., ''k''), then
:<math> \chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} = \sum_{\mathrm{all\ cells}} \frac{(\mathrm{O} - \mathrm{E})^2}{\mathrm{E}}.</math>
This has approximately a chi-squared distribution with ''k'' − 1 degrees of freedom. The fact that there are ''k'' − 1 degrees of freedom is a consequence of the restriction <math>\sum N_i = n</math>: there are ''k'' observed cell counts, but once any ''k'' − 1 of them are known, the remaining one is uniquely determined. In other words, only ''k'' − 1 cell counts are freely determined, so df = ''k'' − 1.
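A minimal sketch, not part of the original article, of this statistic for hypothetical cell probabilities and counts (all numbers are assumptions):
<syntaxhighlight lang="python">
# Minimal sketch: chi-squared statistic for k cells with k - 1 degrees of
# freedom; the counts and probabilities below are hypothetical.
import numpy as np
from scipy import stats

n = 120
p = np.array([0.2, 0.3, 0.5])     # cell probabilities under the null hypothesis
counts = np.array([30, 31, 59])   # observed cell counts N_i (they sum to n)

expected = n * p
chi2 = np.sum((counts - expected) ** 2 / expected)
p_value = stats.chi2.sf(chi2, df=counts.size - 1)
print(chi2, p_value)
</syntaxhighlight>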
==Other measures of fit==
The [[likelihood ratio test]] statistic is a measure of the goodness of fit of a model, judged by whether an expanded form of the model provides a substantially improved fit.
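As a minimal sketch, not part of the original article, the likelihood-ratio idea can be illustrated on the earlier 44/56 example: a restricted model (''p'' = 0.5) is compared with an expanded model in which ''p'' is estimated from the data; the data and the use of a chi-squared reference distribution with one degree of freedom are illustrative assumptions:
<syntaxhighlight lang="python">
# Minimal sketch of a likelihood-ratio test: restricted model (p = 0.5)
# versus an expanded model (p estimated from the data).
from scipy import stats

n, successes = 100, 44                  # hypothetical binomial data
p_hat = successes / n                   # MLE under the expanded model

ll_restricted = stats.binom.logpmf(successes, n, 0.5)
ll_expanded = stats.binom.logpmf(successes, n, p_hat)

lr = 2 * (ll_expanded - ll_restricted)  # likelihood-ratio statistic
p_value = stats.chi2.sf(lr, df=1)       # one extra free parameter
print(lr, p_value)
</syntaxhighlight>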
== See also ==
* [[Deviance (statistics)]] (related to [[Generalized linear model|GLM]])
* [[Overfitting]]
==References==
<references/>
[[Category:Statistical deviation and dispersion]]
[[Category:Statistical tests]]
[[Category:Categorical data]]
[[ru:Статистический критерий]]
[[sv:Chitvåfördelning]]
[[zh:拟合优度]]