Logical assertion: Difference between revisions

From formulasearchengine
en>Hyacinth
en>Dwellee
m added link to 'premise' keyword
 
In [[statistics]] and [[econometrics]], particularly in [[regression analysis]], a '''dummy variable''' (also known as an '''indicator variable''', '''design variable''', '''Boolean indicator''', '''categorical variable''', '''binary variable''', or '''qualitative variable'''<ref name="G & S"/><ref name=Gujarati/>) is one that takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome.<ref>Draper, N.R.; Smith, H. (1998) ''Applied Regression Analysis'', Wiley. ISBN 0-471-17082-8 (Chapter 14)</ref><ref name="Interpreting Coefficients">{{cite web|title=Interpreting the Coefficients on Dummy Variables|url=http://users.rcn.com/alancm/pp605/Interpreting_Dummy_Coefficients.pdf}}</ref> Dummy variables are used as devices to sort data into [[Mutually exclusive events|mutually exclusive]] categories (such as smoker/non-smoker).<ref name=Gujarati>{{cite book|last=Gujarati|first=Damodar N|title=Basic econometrics|year=2003|publisher=McGraw Hill|isbn=0-07-233542-4|pages=1002|url=http://www.mhhe.com/gujarati4e}}</ref> For example, in [[econometrics|econometric]] [[time series analysis]], dummy variables may be used to indicate the occurrence of wars or major [[Strike action|strikes]]. A dummy variable can thus be thought of as a [[truth value]] represented as a numerical value 0 or 1 (as is sometimes done in computer programming).


Dummy variables are "proxy" variables or numeric stand-ins for [[Qualitative data|qualitative]] facts in a [[Regression analysis|regression model]]. In regression analysis, the [[Dependent and independent variables|dependent variables]] may be influenced not only by quantitative variables (income, output, prices, etc.), but also by qualitative variables (gender, religion, geographic region, etc.). When a dummy [[Dependent and independent variables|independent variable]] (also called a dummy explanatory variable) has a value of 0 for some observation, its [[coefficient]] plays no role in influencing the [[Dependent and independent variables|dependent variable]]; when the dummy takes on the value 1, its coefficient acts to alter the intercept. For example, suppose Gender is one of the qualitative variables relevant to a regression. Then female and male would be the categories included under the Gender variable. If female is arbitrarily assigned the value of 1, then male would get the value 0.<ref name="G & S">{{cite web|last1=Garavaglia|first1=Susan|last2=Sharma|first2=Asha|title=A SMART GUIDE TO DUMMY VARIABLES: FOUR APPLICATIONS AND A MACRO|url=http://www.ats.ucla.edu/stat/sas/library/nesug98/p046.pdf}}</ref> The intercept (the value of the dependent variable if all other explanatory variables hypothetically took on the value zero) would then be the constant term for males, but the constant term plus the coefficient of the gender dummy for females.
 
Dummy variables are used frequently in [[time series analysis]] with regime switching, seasonal analysis and qualitative data applications. Dummy variables are involved in studies for [[economic forecasting]], bio-medical studies, [[Credit score|credit scoring]], response modelling, etc. Dummy variables may be incorporated in traditional regression methods or newly developed modeling paradigms.<ref name="G & S"/>
 
==Incorporating a dummy independent variable==
[[File:Graph showing Wage = α0 + δ0female + α1education + U, δ0 0.jpg|thumb|right|400px|400px|Figure 1 : Graph showing wage = α<sub>0</sub> + δ<sub>0</sub>female + α<sub>1</sub>education + ''U'', δ<sub>0</sub>&nbsp;<&nbsp;0.]]
 
Dummy variables are incorporated in the same way as quantitative variables are included (as explanatory variables) in regression models. For example, if we consider a regression model of wage determination, wherein wages are dependent on gender (qualitative) and years of education (quantitative):
 
:'''Wage = α<sub>0</sub> + δ<sub>0</sub>female + α<sub>1</sub>education + U'''
 
In the model, ''female'' = 1 when the person is female and ''female'' = 0 when the person is male. δ<sub>0</sub> can be interpreted as the difference in wages between females and males, holding education and the [[Errors and residuals in statistics|error term]] ''U'' constant. Thus, δ<sub>0</sub> helps to determine whether there is wage discrimination between men and women. If δ<sub>0</sub> < 0 (negative coefficient), then for the same level of education (and other factors influencing wages), women earn a lower wage than men. On the other hand, if δ<sub>0</sub> > 0 (positive coefficient), then women earn a higher wage than men (keeping other factors constant). Note that the coefficients attached to the dummy variables are called '''differential intercept coefficients'''.
The model can be depicted graphically as an intercept shift between females and males. The figure shows the case δ<sub>0</sub> < 0, in which men earn a higher wage than women.<ref name=Wooldridge>{{cite book|last=Wooldridge|first=Jeffrey M|title=Introductory econometrics: a modern approach|year=2009|publisher=Cengage Learning|isbn=0-324-58162-9|pages=865|url=http://books.google.com/books?id=64vt5TDBNLwC&dq=introductory+econometrics+wooldridge}}</ref>
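
The intercept shift can be illustrated with a minimal sketch. The coefficient values below are hypothetical and chosen only for illustration; they are not estimates from any data set, and the function name <code>predicted_wage</code> is likewise an assumption of this example.

```python
def predicted_wage(female, education, alpha0, delta0, alpha1):
    """Systematic part of Wage = alpha0 + delta0*female + alpha1*education."""
    return alpha0 + delta0 * female + alpha1 * education

# Hypothetical coefficients: delta0 < 0 corresponds to the case drawn in Figure 1.
ALPHA0, DELTA0, ALPHA1 = 2.0, -1.5, 0.5

wage_male = predicted_wage(0, 12, ALPHA0, DELTA0, ALPHA1)
wage_female = predicted_wage(1, 12, ALPHA0, DELTA0, ALPHA1)

# At equal education the gap is exactly delta0, whatever the education level.
gap = wage_female - wage_male
```

Because the dummy enters additively, the two wage lines share the slope α<sub>1</sub> and differ only in their intercepts.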
 
Dummy variables may be extended to more complex cases. For example, seasonal effects may be captured by creating dummy variables for each of the seasons: D1=1 if the observation is for summer, and equals zero otherwise; D2=1 if and only if autumn, otherwise equals zero; D3=1 if and only if winter, otherwise equals zero; and D4=1 if and only if spring, otherwise equals zero. In the [[panel data]] [[fixed effects estimator]], dummies are created for each of the units in [[cross-sectional data]] (e.g. firms or countries) or periods in a [[pooled time-series]]. However, in such regressions either the [[constant term]] has to be removed, or one of the dummies has to be removed, making it the base category against which the others are assessed, for the following reason:
 
A precaution needs to be taken when using dummy variables to estimate the regression coefficients. The constant term corresponds to a column of ones in the matrix of regressors. If a dummy is included for every category of a categorical variable, those dummy columns sum to the column of ones, so when the regression is expressed as a matrix equation the columns of the matrix are linearly dependent. In fact, the column rank of the matrix is reduced by 1 for every categorical variable that is coded this way. As a result, the regression coefficients cannot be uniquely determined. In other words: if the vector-of-ones variable were also present, this would result in perfect [[multicollinearity]],<ref>{{cite journal|first=Daniel B.|last=Suits|year=1957|title=Use of Dummy Variables in Regression Equations|jstor=2281705|journal=Journal of the American Statistical Association|volume=52|issue=280|pages=548–551}}</ref> so that the matrix inversion in the estimation algorithm would be impossible. This is referred to as the '''dummy variable trap'''. The solution is to drop one term from the equation for each set of dummy variables representing a categorical variable.
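
The rank deficiency behind the dummy variable trap can be checked directly. The following is a minimal sketch using the seasonal example; the six observations and the helper <code>matrix_rank</code> are assumptions made for this illustration, and the rank is computed with plain Gaussian elimination rather than any particular statistics library.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a non-zero pivot at or below the current pivot row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical sample of six observations, one season label each.
seasons = ["summer", "autumn", "winter", "spring", "summer", "winter"]
levels = ["summer", "autumn", "winter", "spring"]

# Constant plus one dummy per season: five columns that are linearly dependent.
full = [[1] + [int(s == lev) for lev in levels] for s in seasons]
# Constant plus dummies for every season except the base category (summer).
reduced = [[1] + [int(s == lev) for lev in levels[1:]] for s in seasons]

rank_full = matrix_rank(full)        # smaller than the number of columns
rank_reduced = matrix_rank(reduced)  # equal to the number of columns
```

In the full coding the four seasonal dummies add up to the constant column, so the rank is one less than the number of columns; dropping the summer dummy restores full column rank.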
 
==ANOVA models==
 
{{Main|Analysis of variance}}
 
A regression model in which the dependent variable is quantitative in nature but all the explanatory variables are dummies (qualitative in nature) is called an ''Analysis of Variance'' (ANOVA) model.<ref name=Gujarati/>
 
===ANOVA model with one qualitative variable===
 
Suppose we want to run a regression to find out if the average annual salary of public school teachers differs among three geographical regions in Country A with 51 states: (1) North (21 states) (2) South (17 states) (3) West (13 states). Say that the simple arithmetic average salaries are as follows: $24,424.14 (North), $22,894 (South), $26,158.62 (West). The arithmetic averages are different, but are they statistically different from each other? To compare the mean values, [[Analysis of variance|Analysis of Variance]] techniques can be used.
The regression model can be defined as:
 
: ''Y''<sub>''i''</sub> = α<sub>1</sub> + α<sub>2</sub>''D''<sub>2''i''</sub> + α<sub>3</sub>''D''<sub>3''i''</sub> + ''U''<sub>''i''</sub>,
 
where
 
: ''Y''<sub>''i''</sub> = average annual salary of public school teachers in state i
: ''D''<sub>2''i''</sub> = 1 if the state ''i'' is in the North Region
:: ''D''<sub>2''i''</sub> = 0 otherwise (any region other than North)
: ''D''<sub>3''i''</sub> = 1 if the state ''i'' is in the South Region
:: ''D''<sub>3''i''</sub> = 0 otherwise
 
In this model, we have only qualitative regressors, taking the value of 1 if the observation belongs to a specific category and 0 if it belongs to any other category. This makes it an ANOVA model.
 
[[File:Anova graph.jpg|thumb|left|400px|400px|Figure 2 : Graph showing the regression results of the ANOVA model example: Average annual salaries of public school teachers in 3 regions of Country A.]]
 
Now, taking the [[Expected value|expectation]] of both sides, we obtain the following:
 
Mean salary of public school teachers in the North Region:
 
'''E(''Y''<sub>''i''</sub>|''D''<sub>2''i''</sub> = 1, ''D''<sub>3''i''</sub> = 0) = α<sub>1</sub> + α<sub>2</sub>'''
 
Mean salary of public school teachers in the South Region:
 
'''E(Y<sub>i</sub>|D<sub>2i</sub> = 0, D<sub>3i</sub> = 1) = α<sub>1</sub> + α<sub>3</sub>'''
 
Mean salary of public school teachers in the West Region:
 
'''E(Y<sub>i</sub>|D<sub>2i</sub> = 0, D<sub>3i</sub> = 0) = α<sub>1</sub> '''
 
(The error term does not get included in the expectation values as it is assumed that it satisfies the usual [[Least squares|OLS]] conditions, i.e., E(U<sub>i</sub>) = 0)
 
The expected values can be interpreted as follows: The mean salary of public school teachers in the West is equal to the intercept term α<sub>1</sub> in the multiple regression equation, and the differential intercept coefficients, α<sub>2</sub> and α<sub>3</sub>, show by how much the mean salaries of teachers in the North and South Regions differ from that of the teachers in the West. Thus, the mean salaries of teachers in the North and South are ''compared'' against the mean salary of the teachers in the West. Hence, the West Region becomes the '''base group''' or the '''benchmark group''', i.e., the group against which the comparisons are made. The '''omitted category''', i.e., the category to which no dummy is assigned, is taken as the base group category.
 
Using the given data, the result of the regression would be:
 
: ''Ŷ''<sub>''i''</sub> = 26,158.62 &minus; 1734.473D<sub>2''i''</sub> &minus; 3264.615D<sub>3''i''</sub>
 
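{| class="wikitable"
! !! Intercept !! ''D''<sub>2''i''</sub> (North) !! ''D''<sub>3''i''</sub> (South)
|-
! se
| 1128.523 || 1435.953 || 1499.615
|-
! ''t''
| 23.1759 || &minus;1.2078 || &minus;2.1776
|-
! ''p''
| 0.0000 || 0.2330 || 0.0349
|}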
 
R<sup>2</sup> = 0.0901
 
where, se = [[Standard error (statistics)|standard error]], ''t'' = [[t-statistic]]s, ''p'' = [[p value]]
 
The regression result can be interpreted as follows: the mean salary of the teachers in the West (base group) is about $26,158; the salary of the teachers in the North is lower by about $1734 ($26,158.62 &minus; $1734.473 = $24,424.14, which is the average salary of the teachers in the North); and that of the teachers in the South is lower by about $3265 ($26,158.62 &minus; $3264.615 = $22,894, which is the average salary of the teachers in the South).
 
To find out if the mean salaries of the teachers in the North and South are statistically different from that of the teachers in the West (the comparison category), we have to find out if the slope coefficients of the regression result are [[Statistical significance|statistically significant]]. For this, we need to consider the ''p'' values. The estimated slope coefficient for the North is not statistically significant as its ''p'' value is 23 percent; however, that of the South is statistically significant at the 5% level as its ''p'' value is only around 3.5 percent. Thus the overall result is that the mean salaries of the teachers in the West and North are not statistically different from each other, but the mean salary of the teachers in the South is statistically lower than that in the West by around $3265. The model is diagrammatically shown in Figure 2. This model is an ANOVA model with one qualitative variable having 3 categories.<ref name=Gujarati/>
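
With an intercept and one dummy per non-base category, least squares reproduces the group means, so the coefficients above can be recovered from the three averages quoted at the start of this example. The sketch below does only that arithmetic; the variable names are assumptions of this illustration, and the results match the reported coefficients only up to the rounding of the published averages.

```python
# Average salaries quoted in the text (rounded to cents).
mean_north = 24424.14
mean_south = 22894.00
mean_west = 26158.62   # West is the omitted (base) category

alpha1 = mean_west               # intercept: mean of the base group
alpha2 = mean_north - mean_west  # differential intercept for North
alpha3 = mean_south - mean_west  # differential intercept for South

# Coefficients reported in the regression output in the text.
reported = {"alpha1": 26158.62, "alpha2": -1734.473, "alpha3": -3264.615}
```

The small discrepancies in the second decimal place arise because the averages in the text are rounded, while the regression was run on unrounded data.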
 
===ANOVA model with two qualitative variables===
 
Suppose we consider an ANOVA model having two qualitative variables, each with two categories: hourly wages are to be explained in terms of the qualitative variables Marital Status (Married / Unmarried) and Geographical Region (North / Non-North). Here, Marital Status and Geographical Region are the two explanatory dummy variables.<ref name=Gujarati/>
 
Say the regression output on the basis of some given data appears as follows:
 
:'''Ŷ<sub>i</sub> = 8.8148 + 1.0997D<sub>2</sub> &minus; 1.6729D<sub>3</sub>'''
 
where,
 
:''Y'' = hourly wages (in $)
 
:''D''<sub>2</sub> = marital status, 1 = married, 0 = otherwise
 
:''D''<sub>3</sub> = geographical region, 1 = North, 0 = otherwise
 
In this model, a single dummy is assigned to each qualitative variable, one less than the number of categories included in each.
 
Here, the base group is the omitted category: Unmarried, Non-North region (Unmarried people who do not live in the North region). All comparisons would be made in relation to this base group or omitted category. The mean hourly wage in the base category is about $8.81 (intercept term). In comparison, the mean hourly wage of those who are married is higher by about $1.10 and is equal to about $9.91 ($8.81 + $1.10). In contrast, the mean hourly wage of those who live in the North is lower by about $1.67 and is about $7.14 ($8.81 &minus; $1.67).
 
Thus, if more than one qualitative variable is included in the regression, it is important to note that the omitted category should be chosen as the benchmark category and all comparisons will be made in relation to that category. The intercept term will show the expectation of the benchmark category and the slope coefficients will show by how much the other categories differ from the benchmark (omitted) category.<ref name=Gujarati/>
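
The comparisons with the benchmark category can be tabulated for all four combinations of the two dummies. This is a small sketch that only evaluates the fitted equation given above; the function name <code>mean_hourly_wage</code> is an assumption of this illustration. The figures of $9.91 and $7.14 quoted in the text correspond to the married, non-North cell and the unmarried, North cell respectively.

```python
# Coefficients copied from the fitted equation in the text.
INTERCEPT, MARRIED, NORTH = 8.8148, 1.0997, -1.6729

def mean_hourly_wage(married, north):
    """Predicted mean hourly wage for a combination of the two dummies."""
    return INTERCEPT + MARRIED * married + NORTH * north

cell_means = {
    ("unmarried", "non-North"): mean_hourly_wage(0, 0),  # base group
    ("married", "non-North"): mean_hourly_wage(1, 0),
    ("unmarried", "North"): mean_hourly_wage(0, 1),
    ("married", "North"): mean_hourly_wage(1, 1),
}
```

Since the model has no interaction term, the married, North cell is obtained by simply adding both differentials to the intercept.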
 
==ANCOVA models==
 
{{Main|Analysis of covariance}}
 
A regression model that contains a mixture of both quantitative and qualitative variables is called an ''[[Analysis of covariance|Analysis of Covariance]]'' (ANCOVA) model. ANCOVA models are extensions of ANOVA models. They statistically control for the effects of quantitative explanatory variables (also called covariates or control variables).<ref name=Gujarati/>
 
To illustrate how qualitative and quantitative regressors are included to form ANCOVA models, suppose we consider the same example used in the ANOVA model with one qualitative variable: average annual salary of public school teachers in three geographical regions of Country A. If we include a quantitative variable, ''State Government expenditure on public schools per pupil'', in this regression, we get the following model:
 
[[File:Ancova graph.jpg|thumb|right|400px|400px|Figure 3 : Graph showing the regression results of the ANCOVA model example: Public school teacher's salary (Y) in relation to State expenditure per pupil on public schools.]]
 
:'''Y<sub>i</sub> = α<sub>1</sub> + α<sub>2</sub>D<sub>2i</sub> + α<sub>3</sub>D<sub>3i</sub> + α<sub>4</sub>X<sub>i</sub> + U<sub>i</sub>'''
 
where,
 
:Y<sub>i</sub> = average annual salary of public school teachers in state i
 
:X<sub>i</sub> = State expenditure on public schools per pupil
 
:D<sub>2i</sub> = 1, if the State i is in the North Region
 
::D<sub>2i</sub> = 0, otherwise
 
:D<sub>3i</sub> = 1, if the State i is in the South Region
 
::D<sub>3i</sub> = 0, otherwise
 
Say the regression output for this model is
 
:'''Ŷ<sub>i</sub> = 13,269.11 &minus; 1673.514D<sub>2i</sub> &minus; 1144.157D<sub>3i</sub> + 3.2889X<sub>i</sub>'''
 
The result suggests that, for every $1 increase in State expenditure per pupil on public schools, a public school teacher's average salary goes up by about $3.29. Further, for a state in the North region, the mean salary of the teachers is lower than that of West region by about $1673 and for a state in the South region, the mean salary of teachers is lower than that of the West region by about $1144. Figure 3 depicts this model diagrammatically. The average salary lines are parallel to each other by the assumption of the model that the coefficient of expenditure does not vary by state. The trade off shown separately in the graph for each category is between the two quantitative variables: public school teachers' salaries (Y) in relation to State expenditure per pupil on public schools (X).<ref name=Gujarati/>
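
The parallel-lines property can be verified by evaluating the fitted equation at different expenditure levels. In this sketch the coefficients are taken from the regression output above, while the expenditure values of 3000 and 5000 and the function name <code>predicted_salary</code> are hypothetical choices made for illustration.

```python
# Coefficients copied from the fitted ANCOVA equation in the text.
INTERCEPT = 13269.11
NORTH = -1673.514
SOUTH = -1144.157
SLOPE = 3.2889

def predicted_salary(region, expenditure):
    """Predicted average teacher salary for a state in the given region."""
    d2 = 1 if region == "North" else 0
    d3 = 1 if region == "South" else 0
    return INTERCEPT + NORTH * d2 + SOUTH * d3 + SLOPE * expenditure

# The North-West gap is the same at any expenditure level (parallel lines).
gap_at_3000 = predicted_salary("North", 3000) - predicted_salary("West", 3000)
gap_at_5000 = predicted_salary("North", 5000) - predicted_salary("West", 5000)

# One extra dollar of expenditure per pupil raises the prediction by the slope.
marginal_effect = predicted_salary("South", 3001) - predicted_salary("South", 3000)
```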
 
==Interactions among dummy variables==
 
Quantitative regressors in regression models often have an [[Interaction (statistics)|interaction]] among each other. In the same way, qualitative regressors, or dummies, can also have interaction effects between each other, and these interactions can be depicted in the regression model. For example, in a regression involving determination of wages, if two qualitative variables are considered, namely, gender and marital status, there could be an interaction between marital status and gender.<ref name=Wooldridge/> These interactions can be shown in the regression equation as illustrated by the example below.
 
With the two qualitative variables being gender and marital status and with the quantitative explanator being years of education, a regression that is purely linear in the explanators would be
 
:'''Y<sub>i</sub> = β<sub>1</sub> + β<sub>2</sub>D<sub>2,i</sub> + β<sub>3</sub>D<sub>3,i</sub> + αX<sub>i</sub> + U<sub>i</sub>'''
 
where
 
:i denotes the particular individual
 
:Y = Hourly Wages (in $)
 
:X = Years of education
 
:D<sub>2</sub> = 1 if female, 0 otherwise
 
:D<sub>3</sub> = 1 if married, 0 otherwise
 
This specification does not allow for the possibility that there may be an interaction that occurs between the two qualitative variables, D<sub>2</sub> and D<sub>3</sub>. For example, a female who is married may earn wages that differ from those of an unmarried male by an amount that is not the same as the sum of the differentials for solely being female and solely being married. Then the effect of the interacting dummies on the mean of Y is not simply ''additive'' as in the case of the above specification, but ''multiplicative'' also, and the determination of wages can be specified as:
 
:'''Y<sub>i</sub> = β<sub>1</sub> + β<sub>2</sub>D<sub>2,i</sub> + β<sub>3</sub>D<sub>3,i</sub> + β<sub>4</sub>(D<sub>2,i</sub>D<sub>3,i</sub>) + αX<sub>i</sub> + U<sub>i</sub>'''
 
Here,
 
:β<sub>2</sub> = differential effect of being a female
 
:β<sub>3</sub> = differential effect of being married
 
:β<sub>4</sub> = further differential effect of being ''both'' female ''and'' married
 
By this equation, when the error term is zero, the wage of an unmarried male is β<sub>1</sub> + αX<sub>i</sub>, that of an unmarried female is β<sub>1</sub> + β<sub>2</sub> + αX<sub>i</sub>, that of a married male is β<sub>1</sub> + β<sub>3</sub> + αX<sub>i</sub>, and that of a married female is β<sub>1</sub> + β<sub>2</sub> + β<sub>3</sub> + β<sub>4</sub> + αX<sub>i</sub> (where any of the estimates of the coefficients of the dummies could turn out to be positive, zero, or negative).
 
Thus, an interaction dummy (product of two dummies) can alter the dependent variable from the value that it gets when the two dummies are considered individually.<ref name=Gujarati/>
 
However, the use of products of dummy variables to capture interactions can be avoided by using a different scheme for categorizing the data&mdash;one that specifies categories in terms of combinations of characteristics. If we let
 
:D<sub>4</sub> = 1 if unmarried female, 0 otherwise
:D<sub>5</sub> = 1 if married male, 0 otherwise
:D<sub>6</sub> = 1 if married female, 0 otherwise
 
then it suffices to specify the regression
 
:'''Y<sub>i</sub> = δ<sub>1</sub> + δ<sub>4</sub>D<sub>4,i</sub> + δ<sub>5</sub>D<sub>5,i</sub> + δ<sub>6</sub>D<sub>6,i</sub> + αX<sub>i</sub> + U<sub>i</sub>.'''
 
Then with zero shock term the value of the dependent variable is δ<sub>1</sub>+ αX<sub>i</sub> for the base category unmarried males, δ<sub>1</sub> + δ<sub>4</sub>+ αX<sub>i</sub> for unmarried females, δ<sub>1</sub> + δ<sub>5</sub>+ αX<sub>i</sub> for married males, and δ<sub>1</sub> + δ<sub>6</sub>+ αX<sub>i</sub> for married females. This specification involves the same number of right-side variables as does the previous specification with an interaction term, and the regression results for the predicted value of the dependent variable contingent on X<sub>i</sub>, for any combination of qualitative traits, are identical between this specification and the interaction specification.
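
The equivalence of the two specifications follows from matching the four category intercepts: δ<sub>1</sub> = β<sub>1</sub>, δ<sub>4</sub> = β<sub>2</sub>, δ<sub>5</sub> = β<sub>3</sub> and δ<sub>6</sub> = β<sub>2</sub> + β<sub>3</sub> + β<sub>4</sub>. The sketch below checks this numerically; the coefficient values are hypothetical and serve only to illustrate the mapping.

```python
from itertools import product

# Hypothetical coefficients for the interaction specification.
beta1, beta2, beta3, beta4, alpha = 5.0, -1.0, 2.0, -0.5, 0.75

def interaction_spec(female, married, x):
    """Specification with D2, D3 and their product D2*D3."""
    return beta1 + beta2 * female + beta3 * married + beta4 * female * married + alpha * x

# Implied coefficients of the combined-category specification.
delta1 = beta1
delta4 = beta2                  # unmarried female
delta5 = beta3                  # married male
delta6 = beta2 + beta3 + beta4  # married female

def combined_spec(female, married, x):
    """Specification with one dummy per non-base combination of traits."""
    d4 = int(female == 1 and married == 0)
    d5 = int(female == 0 and married == 1)
    d6 = int(female == 1 and married == 1)
    return delta1 + delta4 * d4 + delta5 * d5 + delta6 * d6 + alpha * x

# Largest disagreement between the two specifications over all four cells.
max_difference = max(
    abs(interaction_spec(f, m, 12) - combined_spec(f, m, 12))
    for f, m in product((0, 1), repeat=2)
)
```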
 
==Dummy dependent variables==
 
===What happens if the dependent variable is a dummy?===
 
A model with a dummy dependent variable (also known as a qualitative dependent variable) is one in which the dependent variable, as influenced by the explanatory variables, is qualitative in nature. Some decisions about how much of an act to perform involve a prior decision on whether to perform the act at all. For example, decisions on the amount of output to produce or the cost to incur involve prior decisions on whether to produce or not and whether to spend or not. Such "prior decisions" become dependent dummies in the regression model.<ref name=Wabash>{{cite book|first1=Humberto|last1=Barreto|first2= Frank|last2=Howland |title=Introductory Econometrics: Using Monte Carlo Simulation with Microsoft Excel|chapter=Chapter 22: Dummy Dependent Variable Models|url=http://www3.wabash.edu/econometrics/EconometricsBook/chap22.htm|isbn=0-521-84319-7|year=2005|publisher=Cambridge University Press}}</ref>
 
For example, the decision of a worker to be a part of the labour force becomes a dummy dependent variable. The decision is [[Dichotomy|dichotomous]], i.e., the decision has two possible outcomes: yes and no. So the dependent dummy variable Participation would take on the value 1 if participating, 0 if not participating.<ref name=Gujarati/> Some other examples of dichotomous dependent dummies are cited below:
 
'''Decision:''' Choice of Occupation.      '''Dependent Dummy:''' Supervisory = 1 if supervisor, 0 if not supervisor.
 
'''Decision:''' Affiliation to a Political Party.      '''Dependent Dummy:''' Affiliation = 1 if affiliated to the party, 0 if not affiliated.
 
'''Decision:''' Retirement.      '''Dependent Dummy:''' Retired = 1 if retired, 0 if not retired.
 
When the qualitative dependent dummy variable has more than two values (such as affiliation to many political parties), it becomes a multiresponse or a multinomial or [[Polychotomy|polychotomous]] model.<ref name=Wabash/>
 
===Dependent dummy variable models===
 
Analysis of dependent dummy variable models can be done through different methods. One such method is the usual [[Least squares|OLS]] method, which in this context is called the [[linear probability model]]. An alternative method is to assume that there is an unobservable continuous latent variable Y<sup>*</sup> and that the observed dichotomous variable Y = 1 if Y<sup>*</sup> > 0, 0 otherwise. This is the underlying concept of the [[Logistic regression|logit]] and [[Probit model|probit]] models. These models are discussed in brief below.<ref name=Maddala>{{cite book|last=Maddala|first=G S|title=Introduction to econometrics|year=1992|publisher=Macmillan Pub. Co.|isbn=0-02-374545-2|pages=631|url=http://books.google.com/books?id=nBS3AAAAIAAJ&dq=introduction%20to%20econometrics%20maddala}}</ref>
 
====Linear probability model====
 
{{Main|Linear probability model}}
 
An ordinary least squares model in which the dependent variable ''Y'' is a dichotomous dummy, taking the values of 0 and 1, is the [[linear probability model]] (LPM).<ref name=Maddala/> Suppose we consider the following regression:
 
: ''Y''<sub>''i''</sub> = α<sub>1</sub> + α<sub>2</sub>''X''<sub>''i''</sub> + ''U''<sub>''i''</sub>
 
where
 
:''X'' = family income
 
:''Y'' = 1 if a house is owned by the family, 0 if a house is not owned by the family
 
The model is called the ''linear probability model'' because the regression is linear. The [[Conditional expectation|conditional mean]] of ''Y''<sub>''i''</sub> given ''X''<sub>''i''</sub>, written as E(''Y''<sub>''i''</sub>|''X''<sub>''i''</sub>), is interpreted as the [[conditional probability]] that the event will occur for that value of ''X''<sub>''i''</sub>, that is, Pr(''Y''<sub>''i''</sub> = 1 |''X''<sub>''i''</sub>). In this example, E(''Y''<sub>''i''</sub>|''X''<sub>''i''</sub>) gives the probability of a house being owned by a family whose income is given by ''X''<sub>''i''</sub>.
 
Now, using the [[Least squares|OLS]] assumption E(''U''<sub>''i''</sub>) = 0, we get
 
: E(''Y''<sub>''i''</sub>|''X''<sub>''i''</sub>) = α<sub>1</sub> + α<sub>2</sub>''X''<sub>''i''</sub>
 
Some problems are inherent in the LPM:
 
1. The regression line will not be a [[Goodness of fit|well-fitted]] one and hence measures of goodness of fit, such as R<sup>2</sup>, will not be reliable.
 
2. Models that are analyzed using the LPM approach will have [[Heteroscedasticity|heteroscedastic]] disturbances.
 
3. The error term will have a non-normal distribution.
 
4. The LPM may give predicted values of the dependent variable that are greater than 1 or less than 0. This will be difficult to interpret as the predicted values are intended to be probabilities, which must lie between 0 and 1.
 
5. There might exist a non-linear relationship between the variables of the LPM, in which case the linear regression will not fit the data accurately.<ref name=Gujarati/><ref name=DD>Adnan Kasman, {{cite web|title=Dummy Dependent Variable Models|url=http://kisi.deu.edu.tr/evrim.gursoy/Dummy_Dependent_Variables_Models.doc}}. Lecture Notes</ref>
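
Problems 2 and 4 in the list above can be made concrete with a small sketch. The coefficients and income values are hypothetical (income is assumed to be measured in thousands), and the variance formula ''P''(1 &minus; ''P'') is the variance of a 0/1 variable with success probability ''P'', which is why the disturbances cannot have constant variance when ''P'' changes with ''X''.

```python
# Hypothetical LPM coefficients: P(own house | income) = ALPHA1 + ALPHA2 * income.
ALPHA1, ALPHA2 = -0.9, 0.1

def lpm_prediction(income):
    """Fitted 'probability' from the linear probability model."""
    return ALPHA1 + ALPHA2 * income

def error_variance(p):
    """Variance of the disturbance for a 0/1 outcome with probability p."""
    return p * (1 - p)

low = lpm_prediction(5)    # below 0: not a valid probability
mid = lpm_prediction(15)   # inside the unit interval
high = lpm_prediction(25)  # above 1: not a valid probability

# The variance depends on the fitted probability, hence on income.
var_at_15 = error_variance(lpm_prediction(15))
var_at_18 = error_variance(lpm_prediction(18))
```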
 
====Alternatives to LPM====
 
[[File:CDF graph.jpg|thumb|right|400px|400px|Figure 4 : A cumulative distribution function.]]
 
To avoid the limitations of the LPM, what is needed is a model in which, as the explanatory variable ''X''<sub>''i''</sub> increases, ''P''<sub>''i''</sub> = Pr(''Y''<sub>''i''</sub> = 1 | ''X''<sub>''i''</sub>) remains within the range between 0 and 1. Thus the relationship between the independent and dependent variables is necessarily non-linear.
 
For this purpose, a [[cumulative distribution function]] (CDF) can be used to estimate the dependent dummy variable regression. Figure 4 shows an 'S'-shaped curve, which resembles the CDF of a random variable. In this model, the probability is between 0 and 1 and the non-linearity has been captured. The choice of the CDF to be used is now the question.
 
Two alternative CDFs can be used: the [[logistic distribution|logistic]] and [[Normal distribution|normal]] CDFs. The logistic CDF gives rise to the [[Logistic regression|logit model]] and the normal CDF gives rise to the [[probit model]].<ref name=Gujarati/>
 
====Logit model====
 
{{Main|Logistic regression}}
 
The shortcomings of the LPM led to the development of a more refined and improved model called the logit model. In the logit model, the cumulative distribution of the error term in the regression equation is logistic.<ref name=Maddala/> The regression is more realistic in that it is non-linear.
 
The logit model is estimated using the [[Maximum likelihood|maximum likelihood approach]]. In this model, P(''Y'' = 1 | ''X''), the probability of the dependent variable taking the value of 1 given the independent variable, is:
 
: <math>P_i = \frac{1}{1 + e^{-z_i}}\ = \frac{e^{z_i}}{1 + e^{z_i}}\ </math>
 
where ''z''<sub>''i''</sub> = α<sub>1</sub> + α<sub>2</sub>''X''<sub>''i''</sub>.
 
The model is then expressed in terms of the [[odds]]: what is modeled in the logistic regression is the natural logarithm of the odds, the odds being defined as ''P''/(1 &minus; ''P''). Taking the natural log of the odds, the logit (''L''<sub>''i''</sub>) is expressed as
 
: <math>L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = z_i = \alpha_1 + \alpha_2 X_i.</math>
 
This relationship shows that ''L''<sub>''i''</sub> is linear in relation to ''X''<sub>''i''</sub>, but the probabilities are not linear in terms of ''X''<sub>''i''</sub>.<ref name=DD/>
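
The two formulas above are inverses of each other, which can be checked with a short sketch. The coefficient values and the income value are hypothetical; the functions simply transcribe the logistic CDF and the log-odds as defined in this section.

```python
import math

def logistic(z):
    """Logistic CDF: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Natural log of the odds P / (1 - P)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and regressor value.
ALPHA1, ALPHA2 = -4.0, 0.25
x = 20.0

z = ALPHA1 + ALPHA2 * x                      # linear index z_i
p = logistic(z)                              # probability, always in (0, 1)
p_alt = math.exp(z) / (1.0 + math.exp(z))    # the equivalent second form
recovered_z = logit(p)                       # applying the logit returns z_i
```

The index ''z''<sub>''i''</sub> changes linearly with ''X''<sub>''i''</sub>, while the probability moves along the S-shaped curve and never leaves the interval between 0 and 1.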
 
====Probit model====
 
{{Main|Probit model}}
 
Another model that was developed to offset the disadvantages of the LPM is the probit model. The probit model uses the same approach to non-linearity as does the logit model; however, it uses the normal CDF instead of the logistic CDF.<ref name=Maddala/>
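
As a rough illustration of the difference between the two link functions, the standard normal CDF can be written with the error function and compared with the logistic CDF at the same index value. This is a sketch only: the coefficients are hypothetical, and no estimation is performed.

```python
import math

def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic CDF used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and regressor value.
ALPHA1, ALPHA2 = -4.0, 0.25
x = 20.0
z = ALPHA1 + ALPHA2 * x

probit_probability = normal_cdf(z)
logit_probability = logistic_cdf(z)
```

Both functions map any index value into the interval between 0 and 1 and equal 0.5 at zero; they differ mainly in the tails, where the logistic distribution is heavier.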
 
==See also==
 
* [[Chow test]]
* [[Statistical hypothesis testing|Hypothesis testing]]
* [[Indicator function]]
* [[Linear discriminant analysis|Linear discriminant function]]
* [[Multicollinearity]]
* [[Tobit model]]
 
==References==
 
{{Reflist}}
 
==External links==
* http://www.stat.yale.edu/Courses/1997-98/101/anovareg.htm
* http://udel.edu/~mcdonald/statancova.html
* http://stat.ethz.ch/~maathuis/teaching/stat423/handouts/Chapter7.pdf
* http://socserv.mcmaster.ca/jfox/Courses/SPIDA/dummy-regression-notes.pdf
* http://hspm.sph.sc.edu/courses/J716/pdf/716-6%20Dummy%20Variables%20and%20Time%20Series.pdf
 
{{DEFAULTSORT:Dummy Variable (Statistics)}}
[[Category:Econometrics]]
[[Category:Regression analysis]]
[[Category:Statistical models]]
[[Category:Mathematical and quantitative methods (economics)]]

Latest revision as of 15:25, 26 August 2014
