adjusted_r_squared
__Model Summary(b)__
| Model | R | R \\ Square |
| 1 | 0.903696114 | 0.816666667 |
<WRAP clear />
**__r-square:__**
  * $\displaystyle r^{2} = \frac {SS_{reg}}{SS_{total}}$
  * $\displaystyle r^{2} = 1 - \frac {SS_{res}}{SS_{total}}$
  * Usually interpreted as a percentage (by multiplying $r^2$ by 100).
**__Adjusted R squared:__**
  * $\displaystyle \text{Adjusted } R^{2} = 1 - \frac {(1-R^{2})(n-1)}{n-p-1}$
  * This is equivalent to: $\displaystyle \text{Adjusted } R^{2} = 1 - \frac {\text{Var}_{res}}{\text{Var}_{total}}$
  * $\text{Var} = \text{MS} = s^{2} = \displaystyle \frac {SS}{df} $
  * Here,
    * $\displaystyle Var_{res} = \frac {SS_{res}}{n-p-1}$
    * $\displaystyle Var_{total} = \frac {SS_{total}}{n-1}$
  * Therefore,
    * $\displaystyle \text{Adjusted } R^{2} = 1 - \displaystyle \frac {\displaystyle \frac {SS_{res}}{n-p-1}}{\displaystyle \frac {SS_{total}}{n-1}}$
  * This is **the same logic** as using n-1 instead of n to estimate the population standard deviation from a sample statistic.
  * Therefore, the Adjusted r<sup>2</sup> value is smaller than the r<sup>2</sup> value.
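The two formulas above can be checked with a quick Python sketch (not part of the original page), using the SS values from the worked example further down this page ($SS_{res} = 1.1$, $SS_{total} = 6$, $n = 5$, $p = 1$):

```python
def r_squared(ss_res, ss_total):
    # r^2 = 1 - SS_res / SS_total
    return 1 - ss_res / ss_total

def adjusted_r_squared(ss_res, ss_total, n, p):
    # Adjusted R^2 = 1 - (SS_res / (n - p - 1)) / (SS_total / (n - 1))
    return 1 - (ss_res / (n - p - 1)) / (ss_total / (n - 1))

r2 = r_squared(1.1, 6)                        # 0.8167
adj = adjusted_r_squared(1.1, 6, n=5, p=1)    # 0.7556
print(round(r2, 4), round(adj, 4))            # 0.8167 0.7556
```

Note that the adjustment shrinks 0.8167 down to 0.7556, exactly because SS<sub>res</sub> is divided by n-p-1 rather than n-1.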
**__Why use the Adjusted R squared value?__**
  * As p gets larger, i.e., as independent variables are added,
  * the Adjusted R squared value tends to get smaller.
  * However, a larger p means that independent variables keep being added; even when those X variables do not in fact explain Y (i.e., even when X and Y have no theoretical cause-and-effect relationship), the R<sup>2</sup> value naturally tends to grow.
  * For example, in such a case the researcher may decide to use only the first three independent variables, because the Adjusted R squared value starts to shrink from the fourth variable on, while the R squared value keeps growing.
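This behavior can be simulated. The sketch below (illustrative only; the data are made up and it assumes numpy is available) fits OLS via `numpy.linalg.lstsq`, then keeps adding pure-noise predictors: R² never goes down, while Adjusted R² is penalized for each extra p.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)          # y really depends only on x

def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])     # design matrix
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    p = X.shape[1]                                 # number of predictors
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
    return r2, adj

X = x.reshape(-1, 1)
for extra in range(5):                  # keep adding pure-noise predictors
    r2, adj = fit_r2(X, y)
    print(f"p={X.shape[1]}: R^2={r2:.4f}, adjusted R^2={adj:.4f}")
    X = np.column_stack([X, rng.normal(size=n)])
```

Running this shows R² creeping upward with every meaningless predictor, while Adjusted R² typically stalls or drops, which is the criterion for stopping described above.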
If we take a look at the ANOVA result:
^ __ANOVA(b)__ ^^^^^^^
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 4.9 | 1 | 4.9 | 13.364 | |
| | Residual | 1.1 | 3 | 0.367 | | |
| | Total | 6.0 | 4 | | | |
| a Predictors: (Constant), x |||||||
| b Dependent Variable: y |||||||
<WRAP clear />
  * ANOVA, F-test: $F=\frac{MS_{between}}{MS_{within}}$
  * MS<sub>between</sub>? MS<sub>within</sub>?
  * MS for residual
    * $s = \sqrt{s^2} = \sqrt{\frac{SS_{res}}{n-2}} $
    * random difference (MS<sub>within</sub>)
  * MS for regression . . . obtained difference
    * do the same procedure as above in MS for residual,
    * but this time the degrees of freedom is k-1 (number of variables - 1), i.e., 1.
  * Then what does the F value mean?
Then, we take another look at the coefficients result:
^ __example: Coefficients(a)__ ^^^^^^^
| Model | | B | Std. Error | Beta | t | Sig. |
| 1 | (Constant) | -0.1 | 0.635 | | -0.157 | |
| | x | 0.7 | 0.191 | 0.904 | 3.656 | |
| a Dependent Variable: y |||||||
<WRAP clear />
  * Why do we do a t-test for the slope of the X variable? Below is a mathematical explanation for this.
  * Sampling distribution of Beta (or b):
    * $\sigma_{\beta_{1}} = \frac{\sigma}{\sqrt{SS_{xx}}}$
    * estimation of $\sigma_{\beta_{1}}$: substitute sigma with s
  * t-test
    * $t=\frac{\beta_{1} - \text{Hypothesized value of }\beta_{1}}{s_{\beta_{1}}}$
    * The hypothesized value of beta is usually 0. Therefore, the t value is:
    * $t=\frac{\beta_{1}}{s_{\beta_{1}}}$
    * $s_{\beta} = \sqrt{\frac {MS_{E}}{SS_{X}}} = \displaystyle\frac{\sqrt{\frac{SSE}{n-2}}}{\sqrt{SS_{X}}} = \displaystyle\frac{\sqrt{\frac{\Sigma{(Y-\hat{Y})^2}}{n-2}}}{\sqrt{\Sigma{(X_{i}-\bar{X})^2}}} $
^ X ^ Y ^ $X-\bar{X}$ ^ $(X-\bar{X})^2$ ^ $(X-\bar{X})(Y-\bar{Y})$ ^ $\hat{Y}$ ^ $\hat{Y}-Y$ ^ $(\hat{Y}-Y)^2$ ^
| 1 | 1 | -2 | 4 | 2 | 0.6 | -0.4 | 0.16 |
| 2 | 1 | -1 | 1 | 1 | 1.3 | 0.3 | 0.09 |
| 3 | 2 | 0 | 0 | 0 | 2 | 0 | 0 |
| 4 | 2 | 1 | 1 | 0 | 2.7 | 0.7 | 0.49 |
| 5 | 4 | 2 | 4 | 4 | 3.4 | -0.6 | 0.36 |
| $\bar{X}=3$ | $\bar{Y}=2$ | | $SS_{X}=10$ | $SP_{XY}=7$ | | | $SSE=1.1$ |
Regression formula: $\hat{Y} = -0.1 + 0.7 X$ \\
SSE = Sum of Squared Errors \\
The standard error for the slope beta (b) is obtained as follows:
$$se_{\beta} = \frac {\sqrt{SSE/(n-2)}}{\sqrt{SS_{X}}} = \frac {\sqrt{1.1/3}}{\sqrt{10}} = 0.1914854$$
And b = 0.7. \\
Therefore, t = b / se = 0.7 / 0.1914854 = 3.655631
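The whole worked example above can be recomputed in plain Python (added here for checking; not part of the original page): the slope b, the intercept, SSE, the standard error of b, and the t value.

```python
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)
mx = sum(xs) / n                      # X-bar = 3
my = sum(ys) / n                      # Y-bar = 2

ss_x = sum((x - mx) ** 2 for x in xs)                       # SS_X = 10
sp_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))    # SP_XY = 7
b = sp_xy / ss_x                      # slope = 0.7
a = my - b * mx                       # intercept = -0.1

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # SSE = 1.1
se_b = (sse / (n - 2)) ** 0.5 / ss_x ** 0.5                 # 0.1914854
t = b / se_b                          # 3.655631
print(round(b, 1), round(se_b, 5), round(t, 6))             # 0.7 0.19149 3.655631
```

Every number matches the table and derivation above, and t² = 13.364 reproduces the F value from the ANOVA table, which is why the F-test and the slope t-test agree in simple regression.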
adjusted_r_squared.1462920453.txt.gz · Last modified: 2016/05/11 by hkimscil
