regression
As we saw with correlation,
\begin{eqnarray*}
b & = & \displaystyle \frac{SP}{SS_X} \\
a & = & \displaystyle \overline{Y} - b \overline{X}
\end{eqnarray*}
See: [[deriviation of a and b in a simple regression|Deriving a and b in a simple regression]]
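As a quick numerical check of these two formulas, here is a minimal R sketch. The x and y vectors are the small made-up example that also appears further down this page; SP and SSx are computed straight from their definitions, and the result can be compared with coef(lm(y ~ x)).

<code r>
# minimal sketch: b = SP / SS_X and a = mean(Y) - b * mean(X)
x <- c(1, 2, 3, 4, 5)                      # example predictor values
y <- c(1, 1, 2, 2, 4)                      # example outcome values

SP  <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross products
SSx <- sum((x - mean(x))^2)                # sum of squares of X

b <- SP / SSx
a <- mean(y) - b * mean(x)
c(a = a, b = b)                            # should match coef(lm(y ~ x))
</code>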
^ ANOVA(b)
| Model |
| 1.000 | Regression
| |
| a Predictors: (Constant), bankIncome
In the table above (the ANOVA table),
| for SS | for degrees of freedom |
| @white: white \\ = explained error (E) \\ = $SS_{reg}$ | |
| @orange: orange \\ = unexplained error (U) \\ = $SS_{res}$ | |
| @yellow: yellow \\ = total error $SS_{total}$ \\ = E + U \\ = $SS_{reg} + SS_{res}$ | @#eee: grey \\ = total df \\ = total sample # - 1 |
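These quantities can be reproduced from any fitted model. The sketch below uses the same small made-up x and y vectors as above (not the bankIncome data from the SPSS output) to show how $SS_{reg}$, $SS_{res}$, $SS_{total}$ and their degrees of freedom fit together.

<code r>
# decompose SS_total into SS_reg (explained) and SS_res (unexplained)
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
m <- lm(y ~ x)

ss.total <- sum((y - mean(y))^2)   # total error, df = n - 1
ss.res   <- sum(resid(m)^2)        # unexplained error, df = n - 2
ss.reg   <- ss.total - ss.res      # explained error, df = 1 (one IV)

c(ss.reg = ss.reg, ss.res = ss.res, ss.total = ss.total)
anova(m)                           # the same decomposition as the ANOVA table
</code>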
<file csv acidity.csv>
stream spec83 ph83
Moss 6 6.30
Orcutt 9 6.30
Ellinwood 6 6.30
Jacks 3 6.20
Riceville 5 6.20
Lyons 3 6.10
Osgood 5 5.80
Whetstone 4 5.70
UpperKeyup 1 5.70
West 7 5.70
Boyce 4 5.60
MormonHollow 4 5.50
Lawrence 5 5.40
Wilder 0 4.70
Templeton 0 4.50
</file>
<code>
# read the file defined above; the separator argument is an assumption (space-separated)
df <- read.csv("acidity.csv", sep = " ")
</code>
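A possible next step with this data set, sketched here as an assumption (the model, species count regressed on pH, is not stated above):

<code r>
# assumed follow-up: regress the 1983 species count on the 1983 pH
df <- read.csv("acidity.csv", sep = " ")
m.acid <- lm(spec83 ~ ph83, data = df)
summary(m.acid)                    # slope, its standard error and t, r-squared
</code>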
<code>
===== r-square =====
  * $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = \frac{\text{Explained sample variability}}{\text{Total sample variability}}$
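A quick numerical check of this ratio, reusing the same made-up x and y vectors as above:

<code r>
# r^2 = (SS_total - SS_res) / SS_total = explained / total sample variability
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
m <- lm(y ~ x)

ss.total <- sum((y - mean(y))^2)
ss.res   <- sum(resid(m)^2)
(ss.total - ss.res) / ss.total     # equals summary(m)$r.squared
cor(x, y)^2                        # and equals the squared correlation
</code>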
===== Adjusted r-square =====
  * $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = 1 - \frac{SS_{res}}{SS_{total}} $ ,
  * R2 value goes down -- which means
  * more (many) IVs is not always good
  * Therefore, the Adjusted r<sup>2</sup>
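The penalty for extra IVs can be seen by computing the adjusted value by hand. A minimal sketch with the same made-up data and k = 1 IV; each SS is divided by its degrees of freedom before the ratio is taken.

<code r>
# adjusted r^2 = 1 - (SS_res / (n - k - 1)) / (SS_total / (n - 1))
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
m <- lm(y ~ x)

n <- length(y)
k <- 1                                   # number of IVs
ss.total <- sum((y - mean(y))^2)
ss.res   <- sum(resid(m)^2)

r2     <- 1 - ss.res / ss.total
adj.r2 <- 1 - (ss.res / (n - k - 1)) / (ss.total / (n - 1))
c(r2 = r2, adj.r2 = adj.r2)              # compare with summary(m)
</code>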
===== Slope test =====
If we take a look at the ANOVA result:
| b Dependent Variable: y |||||||
<WRAP clear />
F test recap.
  * ANOVA, F-test, $F=\frac{MS_{between}}{MS_{within}}$
    * MS_within?
      * what corresponds to "within" in a regression == residual
      * $s = \sqrt{s^2} = \sqrt{\frac{SS_{res}}{n-2}} $
    * MS for regression . . . Obtained difference
      * do the same procedure as above for MS for <
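The F ratio in the ANOVA table can be rebuilt by hand along these lines (same made-up x and y as before):

<code r>
# F = MS_regression / MS_residual, with df_reg = 1 and df_res = n - 2
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
m <- lm(y ~ x)

n <- length(y)
ss.res <- sum(resid(m)^2)
ss.reg <- sum((y - mean(y))^2) - ss.res

ms.reg <- ss.reg / 1               # one IV, so df = 1
ms.res <- ss.res / (n - 2)         # residual ("within"-like) mean square
f.stat <- ms.reg / ms.res
f.stat                             # same F as in anova(m) and summary(m)
</code>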
  * Why do we do a t-test for the slope of the X variable? Below is a mathematical explanation.
    * Sampling distribution of the errors around the slope line, b:
      * $\displaystyle \sigma_{b_{1}} = \frac{\sigma}{\sqrt{SS_{x}}}$
      * Remember that $\displaystyle \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ ?
    * estimation of $\sigma_{b_{1}}$ : substitute sigma with s
If the errors (the residuals) cluster around the slope line b, and we take them out and draw their distribution curve, they will form a normal distribution with a mean of 0 and a standard deviation equal to the standard error above.
    * t-test
      * $\displaystyle t=\frac{b_{1} - \text{Hypothesized value of }\beta_{1}}{s_{b_{1}}}$
      * The hypothesized value of b (or beta) is 0, so the t value becomes
      * $\displaystyle t=\frac{b_{1}}{s_{b_{1}}}$
      * The standard error (se) of the slope is obtained as follows:
\begin{eqnarray*}
\displaystyle s_{b_{1}} & = & \sqrt {\frac {MSE}{SS_{X}}} \\
& = & \displaystyle \sqrt { \frac{1}{n-2} * \frac{SSE}{SS_{X}}} \\
& = & \displaystyle \sqrt { \frac{1}{n-2} * \frac{ \Sigma{(Y-\hat{Y})^2} }{ \Sigma{ (X_{i} - \bar{X})^2 } } } \\
\end{eqnarray*}
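To see that b really does vary from sample to sample with roughly this standard error, here is a small simulation sketch; the population intercept, slope, sigma and the sample size are all made-up values.

<code r>
# simulate the sampling distribution of b and compare its SD with sigma / sqrt(SS_x)
set.seed(101)                        # arbitrary seed
n     <- 20
x     <- 1:n                         # fixed X values (assumption)
sigma <- 2                           # population SD of the errors (assumption)
beta0 <- 1; beta1 <- 0.5             # population intercept and slope (assumption)

b.samples <- replicate(5000, {
  y <- beta0 + beta1 * x + rnorm(n, 0, sigma)
  coef(lm(y ~ x))[2]                 # slope estimate from this sample
})

sd(b.samples)                        # empirical SD of b over repeated samples
sigma / sqrt(sum((x - mean(x))^2))   # theoretical sigma_b = sigma / sqrt(SS_x)
</code>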
^ X ^ Y ^ $X-\bar{X}$
Regression formula: y<
SSE = Sum of Squared Errors = $SS_{res}$
The standard error of the slope beta (b) is obtained as follows.

\begin{eqnarray*}
se_{\beta} & = & \frac {\sqrt{SSE/(n-2)}}{\sqrt{SS_{X}}} \\
Therefore, t = b / se = 3.655631
<code>
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
mody <- lm(y ~ x)
</code>
<code>
> x <- c(1, 2, 3, 4, 5)
> y <- c(1, 1, 2, 2, 4)
> mody <- lm(y ~ x)
> summary(mody)

Call:
lm(formula = y ~ x)

Residuals:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x
---
Signif. codes:

Residual standard error: 0.6055 on 3 degrees of freedom
Multiple R-squared:
F-statistic:

>
</code>
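The t value of 3.655631 quoted above can be reproduced by hand from $s_{b_{1}} = \sqrt{MSE/SS_{X}}$; the sketch below uses the same x and y vectors as the code above and nothing else.

<code r>
# se_b = sqrt(MSE / SS_x) and t = b / se_b, computed by hand
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
m <- lm(y ~ x)

n    <- length(y)
sse  <- sum(resid(m)^2)            # SS_residual
mse  <- sse / (n - 2)
ss.x <- sum((x - mean(x))^2)

b    <- unname(coef(m)[2])
se.b <- sqrt(mse / ss.x)
c(b = b, se = se.b, t = b / se.b)  # t should be about 3.655631
</code>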
====== E.g., 4. Simple regression ======
Another example of simple regression: from {{:
