====== Sequential or Hierarchical regression ======
In sequential (hierarchical) regression, the researcher groups the independent variables into blocks based on his or her own judgment and enters the blocks into the analysis stage by stage. In stepwise regression, the same thing is done by the computer, using a computational criterion rather than the researcher's judgment.
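For instance, a minimal sketch in R (using the college-enrollment data from e.g. 3 below, where datavar contains ROLL, UNEM, HGRAD, and INC): the researcher decides the blocks, enters them as nested lm() models, and tests each block's contribution with anova().

<code>
# sketch: three stages (blocks) decided by the researcher
m1 <- lm(ROLL ~ UNEM, data = datavar)                 # stage 1
m2 <- lm(ROLL ~ UNEM + HGRAD, data = datavar)         # stage 2: add HGRAD
m3 <- lm(ROLL ~ UNEM + HGRAD + INC, data = datavar)   # stage 3: add INC
anova(m1, m2, m3)   # does each added block significantly improve the model?
</code>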
====== Data ======
^  DATA for regression analysis   ^^^
The below is just an exercise for figuring out the unique part of the r<sup>2</sup> value for x1 and x2 (income, family size). For more information on part and zero-order relationships, see [[:multiple_regression#determining_ivs_role]] in multiple regression.
  
|  zero-order  ||  part = semi-partial  ||
| x1  | x2  | x1p  | x2p  |
| .794  | -.692  | .565  | -.409  |
|  zero-order square  ||  part (in spss) = semipartial (in general)  ||
| x1 zsq (x1zsq)  | x2 zsq (x2zsq)  | x1 semi-partial (part) sq (x1spsq)  | x2 semi-partial (part) sq (x2spsq)  |
| .630436  | .478864  | .319225  | .167281  |
| a+b / a+b+c+d  | b+c / a+b+c+d  | a / a+b+c+d  | c / a+b+c+d  |
  
  
x1zsq - x1spsq  ~=  x2zsq - x2spsq
0.311211 ~= 0.311583

Both differences estimate the same quantity: the portion of y's variance that x1 and x2 explain jointly, i.e., b / (a+b+c+d) in the decomposition above.

The same calculation in R:
<code>
> .794^2 - .565^2
[1] 0.3112
> .692^2 - .409^2
[1] 0.3116
</code>
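The unique and shared parts can also be recovered from model R<sup>2</sup> differences. A sketch, assuming the data frame datavar (with bankaccount, income, famnum) used in the code below is loaded:

<code>
# unique and shared portions from R-squared differences (sketch; assumes datavar is loaded)
r2 <- function(m) summary(m)$r.squared
full  <- lm(bankaccount ~ income + famnum, data = datavar)
sr1sq <- r2(full) - r2(lm(bankaccount ~ famnum, data = datavar))   # unique part of income (a)
sr2sq <- r2(full) - r2(lm(bankaccount ~ income, data = datavar))   # unique part of famnum (c)
shared <- r2(full) - sr1sq - sr2sq                                 # common part (b), ~ 0.31 above
sr1sq; sr2sq; shared
</code>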
  
See below for the example in R.
  
<code>
# partial and semi-partial correlations (pcor.test and spcor.test are from the ppcor package)
p.b.i <- pcor.test(datavar$bankaccount, datavar$income, datavar$famnum)
p.b.i
p.b.i$estimate

p.b.f <- pcor.test(datavar$bankaccount, datavar$famnum, datavar$income)
p.b.f
p.b.f$estimate

sp.b.i <- spcor.test(datavar$bankaccount, datavar$income, datavar$famnum)
sp.b.i
sp.b.i$estimate
sp.b.f <- spcor.test(datavar$bankaccount, datavar$famnum, datavar$income)
sp.b.f
sp.b.f$estimate

# zero-order correlations
zc.b.i <- cor(datavar$bankaccount, datavar$income)
zc.b.i
zc.b.f <- cor(datavar$bankaccount, datavar$famnum)
zc.b.f

# zero-order squared minus semi-partial squared = shared portion
zc.b.i^2 - (sp.b.i$estimate)^2
zc.b.f^2 - (sp.b.f$estimate)^2
</code>
. . .
<code>
> p.b.i <- pcor.test(datavar$bankaccount, datavar$income, datavar$famnum)
> p.b.i
  estimate p.value statistic  n gp  Method
1   0.7825 0.01268     3.325 10  1 pearson
> p.b.i$estimate
[1] 0.7825

> p.b.f <- pcor.test(datavar$bankaccount, datavar$famnum, datavar$income)
> p.b.f
  estimate p.value statistic  n gp  Method
1  -0.6729 0.04702    -2.406 10  1 pearson
> p.b.f$estimate
[1] -0.6729

> sp.b.i <- spcor.test(datavar$bankaccount, datavar$income, datavar$famnum)
> sp.b.i
  estimate p.value statistic  n gp  Method
1   0.5647  0.1132      1.81 10  1 pearson
> sp.b.i$estimate
[1] 0.5647
> sp.b.f <- spcor.test(datavar$bankaccount, datavar$famnum, datavar$income)
> sp.b.f
  estimate p.value statistic  n gp  Method
1  -0.4087  0.2748    -1.185 10  1 pearson
> sp.b.f$estimate
[1] -0.4087

> zc.b.i <- cor(datavar$bankaccount, datavar$income)
> zc.b.i
[1] 0.7944
> zc.b.f <- cor(datavar$bankaccount, datavar$famnum)
> zc.b.f
[1] -0.6923

> zc.b.i^2 - (sp.b.i$estimate)^2
[1] 0.3123
> zc.b.f^2 - (sp.b.f$estimate)^2
[1] 0.3123
</code>

====== e.g. 3. College enrollment at New Mexico University ======
<code>
> datavar <- read.csv("http://commres.net/wiki/_media/r/dataset_hlr.csv")
> str(datavar)
'data.frame': 29 obs. of  5 variables:
 $ YEAR : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ROLL : int  5501 5945 6629 7556 8716 9369 9920 10167 11084 12504 ...
 $ UNEM : num  8.1 7 7.3 7.5 7 6.4 6.5 6.4 6.3 7.7 ...
 $ HGRAD: int  9552 9680 9731 11666 14675 15265 15484 15723 16501 16890 ...
 $ INC  : int  1923 1961 1979 2030 2112 2192 2235 2351 2411 2475 ...
</code>

<code>
# enter the predictors in three stages (blocks)
onePredictorModel   <- lm(ROLL ~ UNEM, data = datavar)                # stage 1: UNEM only
twoPredictorModel   <- lm(ROLL ~ UNEM + HGRAD, data = datavar)        # stage 2: add HGRAD
threePredictorModel <- lm(ROLL ~ UNEM + HGRAD + INC, data = datavar)  # stage 3: add INC
</code>

<code>
summary(onePredictorModel)
summary(twoPredictorModel)
summary(threePredictorModel)
</code>

<code>
> summary(onePredictorModel)

Call:
lm(formula = ROLL ~ UNEM, data = datavar)

Residuals:
    Min      1Q  Median      3Q     Max 
-7640.0 -1046.5   602.8  1934.3  4187.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3957.0     4000.1   0.989   0.3313  
UNEM          1133.8      513.1   2.210   0.0358 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3049 on 27 degrees of freedom
Multiple R-squared:  0.1531, Adjusted R-squared:  0.1218 
F-statistic: 4.883 on 1 and 27 DF,  p-value: 0.03579
</code>

<code>
> summary(twoPredictorModel)

Call:
lm(formula = ROLL ~ UNEM + HGRAD, data = datavar)

Residuals:
    Min      1Q  Median      3Q     Max 
-2102.2  -861.6  -349.4   374.5  3603.5 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -8.256e+03  2.052e+03  -4.023  0.00044 ***
UNEM         6.983e+02  2.244e+02   3.111  0.00449 ** 
HGRAD        9.423e-01  8.613e-02  10.941 3.16e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1313 on 26 degrees of freedom
Multiple R-squared:  0.8489, Adjusted R-squared:  0.8373 
F-statistic: 73.03 on 2 and 26 DF,  p-value: 2.144e-11
</code>
<code>
> summary(threePredictorModel)

Call:
lm(formula = ROLL ~ UNEM + HGRAD + INC, data = datavar)

Residuals:
     Min       1Q   Median       3Q      Max 
-1148.84  -489.71    -1.88   387.40  1425.75 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -9.153e+03  1.053e+03  -8.691 5.02e-09 ***
UNEM         4.501e+02  1.182e+02   3.809 0.000807 ***
HGRAD        4.065e-01  7.602e-02   5.347 1.52e-05 ***
INC          4.275e+00  4.947e-01   8.642 5.59e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 670.4 on 25 degrees of freedom
Multiple R-squared:  0.9621, Adjusted R-squared:  0.9576 
F-statistic: 211.5 on 3 and 25 DF,  p-value: < 2.2e-16
</code>

<code>
> anova(onePredictorModel, twoPredictorModel, threePredictorModel)
Analysis of Variance Table

Model 1: ROLL ~ UNEM
Model 2: ROLL ~ UNEM + HGRAD
Model 3: ROLL ~ UNEM + HGRAD + INC
  Res.Df       RSS Df Sum of Sq      F    Pr(>F)    
1     27 251084710                                  
2     26  44805568  1 206279143 458.92 < 2.2e-16 ***
3     25  11237313  1  33568255  74.68 5.594e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>
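The anova() comparison above tests whether each added block significantly improves the fit. A sketch of the corresponding R-squared change (ΔR²) at each stage, which is usually reported alongside the F tests:

<code>
# R-squared change per stage (sketch, using the models fitted above)
r2 <- function(m) summary(m)$r.squared
r2(onePredictorModel)                            # stage 1 R2 (~ .153)
r2(twoPredictorModel)  - r2(onePredictorModel)   # delta R2 for HGRAD (~ .696)
r2(threePredictorModel) - r2(twoPredictorModel)  # delta R2 for INC (~ .113)
</code>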
  
====== e.g. 4. Happiness  ======
{{:hierarchical.regression.data.csv}}
  
  
</code>

Report in a research paper:
{{:pasted:20201201-140842.png}}
{{:pasted:20201201-141106.png}}

====== e.g. 5: Stock Market ======
see [[:r:multiple_regression#partial_semi-partial_correlation_and_r_squared_value|Partial and semipartial example in r]]

====== e.g. 6: SWISS ======

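A minimal sketch of what this example might look like, assuming it is meant to use R's built-in swiss dataset; the grouping of the predictors into blocks below is an illustrative assumption, not the author's.

<code>
# sequential regression on R's built-in swiss data (illustrative blocks)
data(swiss)
s1 <- lm(Fertility ~ Agriculture, data = swiss)        # block 1
s2 <- update(s1, . ~ . + Education + Examination)      # block 2: add education-related predictors
s3 <- update(s2, . ~ . + Catholic + Infant.Mortality)  # block 3: add the remaining predictors
anova(s1, s2, s3)       # test the contribution of each block
summary(s3)$r.squared   # total R-squared of the full model
</code>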