3/14/23
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Sex Insensitive Model | ||||
(Intercept) | 23.495 | 0.809 | 29.04 | 0 |
education | -0.839 | 0.067 | -12.54 | 0 |
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Sex Insensitive Model | ||||
(Intercept) | 23.495 | 0.809 | 29.036 | 0 |
education | -0.839 | 0.067 | -12.540 | 0 |
Sex Sensitive Model | ||||
(Intercept) | -1.660 | 0.441 | -3.761 | 0 |
education | 0.674 | 0.028 | 23.732 | 0 |
sex(male) | 16.556 | 0.266 | 62.311 | 0 |
Dummy variables code each observation:
\(x_i=\left\{\begin{array}{c l} 1 & \text{if i in category} \\ 0 & \text{if i not in category} \end{array}\right.\)
with zero being the reference class.
For m categories, requires m-1 dummy variables.
Category | D1 | D2 | … | Dm-1 |
---|---|---|---|---|
D1 |
1 |
0 |
0 |
0 |
D2 |
0 |
1 |
0 |
0 |
â‹® |
â‹® |
â‹® |
â‹® |
â‹® |
Dm-1 |
0 |
0 |
0 |
1 |
Dm |
0 |
0 |
0 |
0 |
Dummy variables code each observation:
\(sex_i=\left\{\begin{array}{c l} 1 & \text{if i-th person is male} \\ 0 & \text{if i-th person is not male} \end{array}\right.\)
Here the reference class is female.
The sex variable has two categories, hence one dummy variable.
sex | male |
---|---|
male |
1 |
female |
0 |
â‹® |
â‹® |
male |
1 |
female |
0 |
Simple linear model | \(y_i = \beta_0 + \beta_1x_1 + \epsilon_i\) |
Add dummy \(D\) with \(\gamma\) offset | \(y_i = (\beta_0 + \gamma\cdot D) + \beta_1x_1 + \epsilon_i\) |
for a binary class: | |
Model for reference class | \(y_i = (\beta_0 + \gamma\cdot 0) + \beta_1x_1 + \epsilon_i\) |
Model for other class | \(y_i = (\beta_0 + \gamma\cdot 1) + \beta_1x_1 + \epsilon_i\) |
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Sex Sensitive Model | ||||
(Intercept) | 8.555 | 0.191 | 44.79 | 0 |
sex(male) | 11.165 | 0.270 | 41.33 | 0 |
\(H_{0}\): no difference in mean
model: \(income \sim sex\)
\(\bar{y}_{F} = 8.555\)
\(\bar{y}_{M} = \bar{y}_{F} + 11.165 = 19.72\)
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Uncentered Model | ||||
(Intercept) | 25.80 | 5.917 | 4.36 | 0 |
Mom's IQ | 0.61 | 0.059 | 10.42 | 0 |
Model child IQ as a function of mother’s IQ.
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Uncentered Model | ||||
(Intercept) | 25.80 | 5.917 | 4.36 | 0 |
Mom's IQ | 0.61 | 0.059 | 10.42 | 0 |
Model child IQ as a function of mother’s IQ.
Centering will make intercept more interpretable.
To center, subtract the mean:
\[\text{Mom's IQ} - \text{mean(Mom's IQ)}\]
That gets us this model…
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Uncentered Model | ||||
(Intercept) | 25.80 | 5.917 | 4.36 | 0 |
Mom's IQ | 0.61 | 0.059 | 10.42 | 0 |
Centered Model | ||||
(Intercept) | 86.80 | 0.877 | 98.99 | 0 |
Mom's IQ | 0.61 | 0.059 | 10.42 | 0 |
Now we interpret the intercept as expected IQ of a child for a mother with mean IQ. (Notice change in standard error!)
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Unscaled Model | ||||
(Intercept) | 37.227 | 1.599 | 23.285 | 0.000 |
Horsepower | -0.032 | 0.009 | -3.519 | 0.001 |
Weight | -3.878 | 0.633 | -6.129 | 0.000 |
Model fuel efficiency as a function of weight and horse power.
Question Which covariate has a bigger effect?
Problem Cannot directly compare coefficients on different scales.
Solution Convert variable values to their z-scores:
\[z_i = \frac{x_i-\bar{x}}{\sigma_{x}}\]
All coefficients now give change in \(y\) for 1\(\sigma\) change in \(x\).
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Unscaled Model | ||||
(Intercept) | 37.227 | 1.599 | 23.285 | 0.000 |
Horsepower | -0.032 | 0.009 | -3.519 | 0.001 |
Weight | -3.878 | 0.633 | -6.129 | 0.000 |
Scaled Model | ||||
(Intercept) | 20.091 | 0.458 | 43.822 | 0.000 |
Horsepower | -2.178 | 0.619 | -3.519 | 0.001 |
Weight | -3.794 | 0.619 | -6.129 | 0.000 |
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Additive Model | ||||
(Intercept) | 86.80 | 0.877 | 98.99 | 0 |
Mom's IQ | 0.61 | 0.059 | 10.42 | 0 |
Question What if the relationship between child’s IQ and mother’s IQ depends on the mother’s educational attainment?
Simple formula becomes:
\[y_i = (\beta_0 + \gamma D) + (\beta_1x_1 + \omega D x_1) + \epsilon_i\]
With
- \(\gamma\) giving the change in \(\beta_0\) and
- \(\omega\) giving the change in \(\beta_1\).
That’s a change in intercept and a change in slope!
term | estimate | std.error | t.statistic | p.value |
---|---|---|---|---|
Additive Model | ||||
(Intercept) | 86.797 | 0.877 | 98.993 | 0.000 |
Mom's IQ | 0.610 | 0.059 | 10.423 | 0.000 |
Interactive Model | ||||
(Intercept) | 85.407 | 2.218 | 38.502 | 0.000 |
Mom's IQ | 0.969 | 0.148 | 6.531 | 0.000 |
High School | 2.841 | 2.427 | 1.171 | 0.242 |
Mom's IQ x HS | -0.484 | 0.162 | -2.985 | 0.003 |
Simple Linear Model: \(y_{i} = \beta_{0} + \beta_{1}X + \epsilon_{i}\)
\(R^2 = 0.3088\)
Quadratic Model: \(y_{i} = \beta_{0} + \beta_{1}X + \beta_{2}X^{2} + \epsilon_{i}\)
\(R^2=0.7636\)
Logging a skewed variable normalizes it.
This is the inverse of exponentiation:
\[Y = log(exp(Y))\]
May be applied to \(X\) and \(Y\).
Question: but, whyyyyyy???
Simple Linear Model: \(y_{i} = \beta_{0} + \beta_{1}X + \epsilon_{i}\)
\(R^2 = 0.8325\)
Log Linear Model: \(log(y_{i}) = \beta_{0} + \beta_{1}X + \epsilon_{i}\)
\(R^2 = 0.9407\)
Have to take care how we interpret log-models!
\(\beta_1\) means…
Linear Model
\(y_{i} = \beta_{0} + \beta_{1}X + \epsilon_{i}\)
Increase in Y for one unit increase in X.
Log-Linear Model
\(log(y_{i}) = \beta_{0} + \beta_{1}X + \epsilon_{i}\)
Percent increase in Y for one unit increase in X.
Linear-Log Model
\(y_{i} = \beta_{0} + \beta_{1}log(X) + \epsilon_{i}\)
Increase in Y for percent increase in X.
Log-Log Model
\(log(y_{i}) = \beta_{0} + \beta_{1}log(X) + \epsilon_{i}\)
Percent increase in Y for percent increase in X.