| In [[statistics]], the '''familywise error rate''' ('''FWER''') is the [[probability]] of making one or more false discoveries, or [[Type I and type II errors|type I error]]s, when performing [[multiple comparisons|multiple hypothesis tests]].
| | |
| ==Definitions==
| |
| | |
| ===Classification of ''m'' hypothesis tests===
| |
| Suppose we have ''m'' null hypotheses, denoted by: ''H''<sub>1</sub>, ''H''<sub>2</sub>, ..., ''H''<sub>''m''</sub>.
| |
| <br />
| |
| Using a [[Statistical hypothesis testing|statistical test]], each hypothesis is declared either significant or non-significant.<br />
| |
| Summing the test results over the ''H<sub>i</sub>'' gives the following table and related random variables:
| |
| | |
| {|class="wikitable"
| |
| ! |
| |
| ! Null hypothesis is True
| |
| ! Alternative hypothesis is True
| |
| ! | Total
| |
| |- align="center"
| |
| ! | Declared significant
| |
| | <math>V</math>
| |
| | <math>S</math>
| |
| | <math>R</math>
| |
| |- align="center"
| |
| ! | Declared non-significant
| |
| | <math>U</math>
| |
| | <math>T</math>
| |
| | <math>m - R</math>
| |
| |- align="center"
| |
| ! Total
| |
| | <math>m_0</math>
| |
| | <math>m - m_0</math>
| |
| | <math>m</math>
| |
| |}
| |
| | |
| * <math>m_0</math> is the number of true [[null hypothesis|null hypotheses]], an unknown parameter
| |
| * <math>m - m_0</math> is the number of true [[alternative hypothesis|alternative hypotheses]]
| |
| * <math>V</math> is the number of [[Type I and type II errors|false positives (Type I error)]]
| |
| * <math>S</math> is the number of [[Type I and type II errors|true positives]]
| |
| * <math>T</math> is the number of [[Type I and type II errors|false negatives (Type II error)]]
| |
| * <math>U</math> is the number of [[Type I and type II errors|true negatives]]
| |
| * <math>R</math> is the number of rejected null hypotheses
| |
| | |
| * <math>R</math> is an observable [[random variable]], while <math>S</math>, <math>T</math>, <math>U</math>, and <math>V</math> are unobservable [[random variable]]s.
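As an illustration, the counts in the table can be computed in a simulation setting where the truth of each null hypothesis is known. The following Python sketch assumes such known truth labels; the function name <code>classify_tests</code> and the example data are hypothetical. With real data only ''R'' (and ''m'') would be observable.

```python
def classify_tests(null_is_true, rejected):
    """Count V, S, U, T and R for m hypothesis tests.

    null_is_true[i] -- True if the i-th null hypothesis is actually true
    rejected[i]     -- True if the i-th test was declared significant
    """
    if len(null_is_true) != len(rejected):
        raise ValueError("both sequences must have length m")
    pairs = list(zip(null_is_true, rejected))
    v = sum(1 for h0, r in pairs if h0 and r)          # false positives
    s = sum(1 for h0, r in pairs if not h0 and r)      # true positives
    u = sum(1 for h0, r in pairs if h0 and not r)      # true negatives
    t = sum(1 for h0, r in pairs if not h0 and not r)  # false negatives
    return {"V": v, "S": s, "U": u, "T": t, "R": v + s, "m0": v + u, "m": len(pairs)}


# Hypothetical example with m = 5 hypotheses, three of which are true nulls.
truth = [True, True, True, False, False]
decisions = [True, False, False, True, False]
counts = classify_tests(truth, decisions)
print(counts)
```

In this example a familywise error has occurred, because ''V'' is at least 1.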
| |
| | |
| ===The FWER===
| |
| The FWER is the probability of making even one [[Type I and type II errors|type I error]] in the family,
| |
| | |
| :<math> \mathrm{FWER} = \Pr(V \ge 1), \,</math>
| |
| | |
| or equivalently,
| |
| | |
| :<math> \mathrm{FWER} = 1 -\Pr(V = 0).</math>
| |
| | |
| Thus, by assuring <math> \mathrm{FWER} \le \alpha\,\! \,</math>, the probability of making even one [[Type I and type II errors|type I error]] in the family is controlled at level <math>\alpha\,\!</math>.
| |
| | |
| A procedure controls the FWER '''in the weak sense''' if the FWER control at level <math>\alpha\,\!</math> is guaranteed '''only''' when all null hypotheses are true (i.e. when <math>m_0 = m</math>, so that the global null hypothesis is true).
| |
| | |
| A procedure controls the FWER '''in the strong sense''' if the FWER control at level <math>\alpha\,\!</math> is guaranteed for '''any''' configuration of true and non-true null hypotheses (including the global null hypothesis).
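The definition FWER = Pr(''V'' ≥ 1) can be illustrated with a small Monte Carlo sketch in Python. It assumes that the global null hypothesis is true and that the ''m'' tests are independent, so that each p-value is uniform on [0, 1]. Both assumptions, as well as the function name <code>estimate_fwer</code>, are choices made for this illustration only.

```python
import random


def estimate_fwer(m, per_test_level, n_trials=20000, seed=0):
    """Estimate Pr(V >= 1) when all m null hypotheses are true.

    Assumes independent tests, so each p-value is drawn as Uniform(0, 1).
    """
    rng = random.Random(seed)
    families_with_error = 0
    for _trial in range(n_trials):
        # V counts the true nulls rejected in this simulated family.
        v = sum(1 for _test in range(m) if rng.random() <= per_test_level)
        if v >= 1:
            families_with_error += 1
    return families_with_error / n_trials


alpha, m = 0.05, 9
unadjusted = estimate_fwer(m, alpha)      # every test run at level alpha
adjusted = estimate_fwer(m, alpha / m)    # every test run at level alpha / m
print(f"unadjusted: {unadjusted:.3f}, adjusted: {adjusted:.3f}")
```

Under these assumptions the unadjusted estimate should be close to 1 − 0.95<sup>9</sup> ≈ 0.37, while running each test at level ''α''/''m'' should keep the estimate at roughly 0.05 or below.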
| |
| | |
| ==The concept of a family==
| |
| | |
| Within the statistical framework, there are several definitions for the term "family":
| |
| | |
| * First of all, a distinction must be made between [[exploratory data analysis]] and [[Statistical hypothesis testing|confirmatory data analysis]]: for exploratory analysis, the family constitutes all inferences made and those that potentially could be made, whereas for confirmatory analysis, the family must include only the inferences of interest specified prior to the study.
| |
|
| |
| * '''Hochberg & Tamhane (1987)'''<ref>{{Cite book |author=Hochberg Y, Tamhane AC |year=1987 |title=Multiple comparison procedures |location=New York |publisher=Wiley}}</ref> define "family" as "any collection of inferences for which it is meaningful to take into account some combined measure of error".
| |
| | |
| * According to '''Cox (1982)''', a set of inferences should be regarded as a family:
| |
| # To take into account the selection effect due to [[data dredging]]
| |
| # To ensure simultaneous correctness of a set of inferences so as to guarantee a correct overall decision
| |
| | |
| To summarize, a family could best be defined by the potential selective inference that is being faced: A family is the smallest set of items of inference in an analysis, interchangeable about their meaning for the goal of research, from which selection of results for action, presentation or highlighting could be made ([[Yoav Benjamini|Benjamini]]).
| |
| | |
| === History ===
| |
| | |
| [[John Tukey|Tukey]] coined the terms [[experimentwise error rate]] and '''"per-experiment"''' error rate for the error rate that the researcher should use as a control level in a multiple hypothesis experiment.
| |
| | |
| Since not all tests done in an experiment should constitute a single family (for example: in a multiple-stage experiment, a separate family might be used for each stage), the terminology was changed (by '''Miller''') to "family-wise error-rate" (and was later adopted by Tukey as '''"batchwise"''' or '''"per batch"''').
| |
| | |
| ==Simultaneous inference vs. selective inference==
| |
| | |
| Controlling the FWER is a form of '''simultaneous inference''', in which all inferences made in a family are jointly corrected up to a pre-specified error rate. Depending on the definition of the family, the researcher might choose a different form of inference.
| |
| | |
| For example, simultaneous inference may be too conservative for certain large-scale problems that are currently being addressed by science. For such problems, a '''selective inference''' approach might be more suitable, since it assumes that any sub-group of hypotheses from the large-scale group can be viewed as a family. Selective inference is usually performed by controlling the [[false discovery rate]] (FDR). FDR-controlling procedures are more [[Statistical power|powerful]] (i.e. less conservative) than FWER-controlling procedures (such as the [[Bonferroni correction]]), at the cost of increasing the likelihood of false positives among the rejected hypotheses.
| |
| | |
| ==Controlling procedures==
| |
| | |
| The following is a concise review of some of the "old and trusted" solutions that ensure strong level <math>\alpha</math> FWER control, followed by some newer solutions. A good review of many of the available methods can be found in the book "Multiple comparison procedures" (Wiley, 1987), by Hochberg and [[Ajit Tamhane|Tamhane]].
| |
| | |
| === The Bonferroni procedure ===
| |
| {{main|Bonferroni correction}}
| |
| * Denote by <math>p_{i}</math> the p-value for testing <math>H_{i}</math>.
| |
| * Reject <math>H_{i}</math> if <math> p_{i} \leq \frac{\alpha}{m} </math>.
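The two steps above can be sketched in Python as follows. The function name <code>bonferroni_reject</code> and the example p-values are hypothetical and serve only as an illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if H_i is rejected."""
    m = len(p_values)
    threshold = alpha / m  # every p-value is compared with alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values for m = 4 hypotheses; the threshold is 0.05 / 4 = 0.0125.
p_values = [0.01, 0.04, 0.03, 0.005]
decisions = bonferroni_reject(p_values)
print(decisions)
```

Only the first and the last hypotheses are rejected here, since only their p-values fall below 0.0125.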
| |
| | |
| === The Šidák procedure ===
| |
| {{main|Šidák correction}}
| |
| * If the test statistics are independent, then testing each hypothesis at level <math> \alpha_{SID} = 1-(1-\alpha)^\frac{1}{m} </math> is Šidák's multiple testing procedure.
| |
| * This test is more powerful than Bonferroni but the gain is small, and the procedure is far less general than Bonferroni's since it requires independence.
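A minimal Python sketch of the per-test level and the resulting decisions is given below. It assumes independent test statistics, as the procedure requires; the function names and the example p-values are hypothetical.

```python
def sidak_level(alpha, m):
    """Per-test level 1 - (1 - alpha)^(1/m), valid under independence."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if H_i is rejected."""
    level = sidak_level(alpha, len(p_values))
    return [p <= level for p in p_values]


alpha, m = 0.05, 4
level = sidak_level(alpha, m)
print(f"Sidak level: {level:.6f}, Bonferroni level: {alpha / m:.6f}")

# Hypothetical p-values; the first lies between the two per-test levels.
decisions = sidak_reject([0.0126, 0.2, 0.5, 0.9], alpha)
print(decisions)
```

The per-test level is only slightly larger than ''α''/''m'', which is why the gain in power over Bonferroni is small.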
| |
| | |
| === Tukey's procedure ===
| |
| {{main|Tukey's range test}}
| |
| * Tukey's procedure is only applicable for [[pairwise comparison]]s.
| |
| * It assumes independence of the observations being tested, as well as equal variation across observations ([[homoscedasticity]]).
| |
| * The procedure calculates for each pair the [[studentized range]] statistic: <math> \frac {Y_{A}-Y_{B}} {SE} </math> where <math>Y_{A}</math> is the larger of the two means being compared, <math>Y_{B}</math> is the smaller, and <math>SE</math> is the standard error of the data in question.
| |
| * Tukey's test is essentially a [[Student's t-test]], except that it corrects for '''family-wise error-rate'''.
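The statistic described above can be computed for every pair of groups as in the following Python sketch. The group means and the standard error are made-up numbers, and the function name is hypothetical. The sketch only computes the statistics; comparing them with a critical value of the studentized range distribution is not shown.

```python
from itertools import combinations


def studentized_range_statistics(group_means, standard_error):
    """Return (Y_A - Y_B) / SE for every pair, with Y_A the larger mean."""
    stats = {}
    for (name_a, mean_a), (name_b, mean_b) in combinations(group_means.items(), 2):
        larger, smaller = max(mean_a, mean_b), min(mean_a, mean_b)
        stats[(name_a, name_b)] = (larger - smaller) / standard_error
    return stats


# Hypothetical group means and a hypothetical common standard error.
means = {"existing": 14.0, "new": 11.0, "placebo": 18.0}
q = studentized_range_statistics(means, standard_error=2.0)
for pair, value in q.items():
    print(pair, value)
```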
| |
| | |
| A correction with a similar framework is [[Fisher’s LSD]] (Least Significant Difference).
| |
| | |
| '''Some newer solutions for strong level <math>\alpha</math> FWER control:'''
| |
| | |
| === Holm's step-down procedure (1979)===
| |
| {{main|Holm–Bonferroni method}}
| |
| | |
| * Start by ordering the p-values from smallest to largest, <math>P_{(1)} \le \ldots \le P_{(m)}</math>, and let the associated hypotheses be <math>H_{(1)} \ldots H_{(m)}</math>.
| |
| | |
| * Let <math>R</math> be the smallest <math>k</math> such that <math>P_{(k)} > \frac{\alpha}{m+1-k}</math>.
| |
| | |
| * Reject the null hypotheses <math>H_{(1)} \ldots H_{(R-1)}</math>. If <math>R = 1</math> then none of the hypotheses are rejected. If no such <math>k</math> exists, all of the hypotheses are rejected.
| |
| | |
| * This procedure is uniformly more powerful than Bonferroni's.
| |
| | |
| * This procedure controls the family-wise error rate for all ''m'' hypotheses at level ''α'' in the strong sense because it is essentially a [[closed testing procedure]] in which each intersection is tested using the simple Bonferroni test.
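The step-down rule can be sketched in Python as follows. The function name <code>holm_reject</code> and the example p-values are hypothetical; results are returned in the original order of the hypotheses.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if H_i is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, idx in enumerate(order, start=1):
        # Stop at the first k with P_(k) > alpha / (m + 1 - k).
        if p_values[idx] > alpha / (m + 1 - k):
            break
        reject[idx] = True
    return reject


# Hypothetical p-values for m = 4 hypotheses.
p_values = [0.01, 0.04, 0.015, 0.005]
decisions = holm_reject(p_values)
print(decisions)
```

With these p-values all four hypotheses are rejected, whereas comparing each p-value with ''α''/''m'' = 0.0125 would reject only two of them.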
| |
| | |
| === Hochberg's step-up procedure (1988)===
| |
| Hochberg's step-up procedure (1988) is performed using the following steps:<ref name=Hochberg1988>{{cite journal | last1 = Hochberg | first1= Yosef | year = 1988 | title = A Sharper Bonferroni Procedure for Multiple Tests of Significance | journal = [[Biometrika]] | volume = 75 | issue = 4 | pages = 800–802 | url = http://www-stat.wharton.upenn.edu/~steele/Courses/956/Resource/MultipleComparision/Hochberg88.pdf | doi=10.1093/biomet/75.4.800}}</ref>
| |
| | |
| * Start by ordering the p-values from smallest to largest, <math>P_{(1)} \le \ldots \le P_{(m)}</math>, and let the associated hypotheses be <math>H_{(1)} \ldots H_{(m)}</math>.
| |
| | |
| * For a given <math>\alpha</math>, let <math>R</math> be the largest <math>k</math> such that <math>P_{(k)} \leq \frac{\alpha}{m+1-k}</math>.
| |
| | |
| * Reject the null hypotheses <math>H_{(1)} \ldots H_{(R)}</math>. If no such <math>k</math> exists, none of the hypotheses are rejected.
| |
| | |
| * Hochberg's procedure is more powerful than Holm's.
| |
| | |
| * Nevertheless, while Holm's procedure is based on Bonferroni, with no restriction on the joint distribution of the test statistics, Hochberg's is based on the [[False_discovery_rate#"Simes procedure"|Simes test (1987)]], so it holds only under independence (and also under some forms of positive dependence).
| |
| | |
| ===Dunnett's correction===
| |
| {{main|Dunnett's test}}
| |
| [[Charles Dunnett]] (1955, 1966; not to be confused with Dunn) described an alternative alpha error adjustment when ''k'' groups are compared to the same control group. Now known as [[Dunnett's test]], this method is less conservative than the Bonferroni adjustment.
| |
| | |
| ===Scheffé's method===
| |
| {{main|Scheffé's method}}
| |
| {{empty section|date=February 2013}}
| |
| | |
| ===Closed testing procedure===
| |
| {{main|Closed testing procedure}}
| |
| Closed testing procedures control the familywise type I error rate if all intersection hypotheses are tested using valid local level-''α'' tests. They form a flexible general class of testing procedures that includes, for example, the Bonferroni procedure and Holm's step-down procedure.
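The closure principle can be sketched in Python by brute force for a small number of hypotheses. The sketch below is an illustration under stated assumptions: it uses a Bonferroni test as the local test for every intersection hypothesis, and the function name and example p-values are hypothetical. The number of intersections grows as 2<sup>''m''</sup>, so this form is only practical for small ''m''.

```python
from itertools import combinations


def closed_testing_bonferroni(p_values, alpha=0.05):
    """Reject H_i only if every intersection hypothesis containing H_i
    is rejected by its local Bonferroni test at level alpha."""
    m = len(p_values)
    reject = [True] * m
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            # Local Bonferroni test of the intersection of the H_i in subset.
            locally_rejected = min(p_values[i] for i in subset) <= alpha / size
            if not locally_rejected:
                # Every hypothesis in a retained intersection is retained.
                for i in subset:
                    reject[i] = False
    return reject


# Hypothetical p-values for m = 4 hypotheses.
p_values = [0.01, 0.04, 0.03, 0.005]
decisions = closed_testing_bonferroni(p_values)
print(decisions)
```

Here the intersection of the second and third hypotheses is not rejected (0.03 > 0.05/2), so both are retained, which matches the outcome of Holm's step-down procedure on the same p-values.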
| |
| | |
| ===Other procedures===
| |
| Other advanced procedures that ensure strong level <math>\alpha</math> FWER control include the [[maximum modulus test]].
| |
| | |
| There are also many alternatives to controlling the familywise error rate. Most notable is the [[false discovery rate]], which was introduced by [[Yoav Benjamini|Benjamini]] and [[Yosef Hochberg|Hochberg]] in 1995 and addresses many large-scale inference problems in a more practical way.
| |
| | |
| ==Example==
| |
| | |
| Consider a randomized clinical trial for a new antidepressant drug using three groups:
| |
| * Existing drug
| |
| * New drug
| |
| * Placebo
| |
| In such a design, the researcher might be interested in whether depressive symptoms (measured, for example, by a [[Beck Depression Inventory]] score) decreased to a greater extent for those using the new drug compared to the old drug. Further, one might be interested in whether any [[side effect]]s (e.g., [[hypersomnia]], decreased sex drive, and [[dry mouth]]) were observed. In such a case, '''two families''' would likely be identified:
| |
| # Effect of drug on depressive symptoms
| |
| # Occurrence of any side effects.
| |
| | |
| The researcher would assign an acceptable [[Type I and type II errors|Type I error]] rate <math>\alpha</math> (usually 0.05) to each family, and control for family-wise error using appropriate multiple comparison procedures:
| |
| * For the first family, effect of antidepressant on depressive symptoms, [[pairwise comparison]]s among groups might be jointly controlled using techniques such as [[Tukey's range test]]. [[Bonferroni correction]] might also suffice here since there are only three tests (three comparisons of depressive symptoms).
| |
| * In terms of the side effect profile, there are three comparisons for each of the three side effects, for a total of 9 hypotheses. Testing each of them at level 0.05 would result in a 37% chance of making at least one [[Type I and type II errors|Type I error]] if the tests were independent (i.e., 1 − 0.95<sup>9</sup> ≈ 1 − 0.63 = 0.37). With 9 hypotheses, the [[Bonferroni correction]] might be too conservative in this case; a more powerful tool such as [[Tukey's range test]] or the [[Holm-Bonferroni method]] will probably be more suitable. For example, the researcher may divide <math>\alpha</math> by three (0.05/3 = 0.0167) and allocate 0.0167 to each side effect's multiple comparison procedure. In the case of [[Tukey's range test]], the critical value of ''q'', the studentized range statistic, would thus be based on an <math>\alpha</math> value of 0.0167.
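The arithmetic for the side-effect family can be checked with a few lines of Python. The variable names are chosen for this illustration, and the 37% figure assumes that the nine tests are independent.

```python
alpha = 0.05
n_side_effects = 3               # hypersomnia, decreased sex drive, dry mouth
comparisons_per_side_effect = 3  # three pairwise comparisons among three groups
m = n_side_effects * comparisons_per_side_effect

# Chance of at least one Type I error if all 9 tests are run at level alpha.
fwer_unadjusted = 1 - (1 - alpha) ** m

# Splitting alpha equally among the three side effects.
alpha_per_side_effect = alpha / n_side_effects

# For comparison: the per-test level of a Bonferroni correction over all 9 tests.
alpha_bonferroni = alpha / m

print(f"m = {m}")
print(f"unadjusted FWER = {fwer_unadjusted:.2f}")
print(f"alpha per side effect = {alpha_per_side_effect:.4f}")
print(f"Bonferroni per-test level = {alpha_bonferroni:.4f}")
```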
| |
| | |
| ==See also==
| |
| | |
| *[[False discovery rate]]
| |
| | |
| ==References==
| |
| | |
| <references />
| |
| | |
| ==External links==
| |
| *[http://www-stat.stanford.edu/~omkar/329/ Large-scale Simultaneous Inference] – Syllabus, notes, and homework from Efron's course at Stanford. Includes PDFs for each chapter of his book.
| |
| | |
| {{DEFAULTSORT:Familywise Error Rate}}
| |
| [[Category:Hypothesis testing]]
| |