sequential_regression

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
sequential_regression [2020/11/26 08:35] – [r] hkimscilsequential_regression [2024/06/12 08:30] (current) – [r] hkimscil
Line 1: Line 1:
 +====== Sequential or Hierarchical regression ======
 +연구자가 판단하여 독립변인들 중 필요한 것들을 묶어서 스테이지 별로 (단계 별) 넣고 분석하는 것을 말한다. Stepwise regression은 이를 컴퓨터나 계산방법을 통하여 수행하게 된다.  
 ====== 데이터 ====== ====== 데이터 ======
 ^  DATA for regression analysis   ^^^ ^  DATA for regression analysis   ^^^
Line 44: Line 46:
 The below is just an exercise for figuring out the unique part of r<sup>2</sup> value for x1 and x2 (수입, 가족수). For more information see part and zero-order relationship: see [[:multiple_regression#determining_ivs_role]] in multiple regression The below is just an exercise for figuring out the unique part of r<sup>2</sup> value for x1 and x2 (수입, 가족수). For more information see part and zero-order relationship: see [[:multiple_regression#determining_ivs_role]] in multiple regression
  
-|  zero-order  ||  part  ||+|  zero-order  ||  part = semi-partial  ||
 | x1  | x2  | x1p  | x2p  | | x1  | x2  | x1p  | x2p  |
 | .794  | -.692  | .565  | -.409  |  | .794  | -.692  | .565  | -.409  | 
 |  zero-order square  ||  part (in spss) = semipartial (in general)  || |  zero-order square  ||  part (in spss) = semipartial (in general)  ||
-| x1 sq (x1sq)  | x2 sq (x1sq)  | x1 part sq (x1psq) | x2 part sq (x1psq)  |+| x1 zsq (x1zsq)  | x2 zsq (x1zsq)  | x1 semi-partial (or partsq (x1spsq) | x2 part sq (x1spsq)  |
 | .630436  | .478864  | .319225  | .167281  | | .630436  | .478864  | .319225  | .167281  |
 | a+b / a+b+c+d  | b+c / a+b+c+d  | a / a+b+c+d  | c / a+b+c+d  | | a+b / a+b+c+d  | b+c / a+b+c+d  | a / a+b+c+d  | c / a+b+c+d  |
  
  
-x1sq x1psq  ~= x2sq x2psq+x1zsq x1spsq  ~= x2zsq x2spsq
 0.311211 ~= 0.311583 0.311211 ~= 0.311583
 +
 +아래는 r 에서 계산한 것
 +<code>
 +> .794^2 - .565^2
 +[1] 0.3112
 +> .692^2 - .409^2
 +[1] 0.3116
 +</code>
  
 R에서 보는 예는 아래를 참조 R에서 보는 예는 아래를 참조
Line 230: Line 240:
 | famnum  | -0.6922935  | -0.3999614   | 1.0000000    |  | famnum  | -0.6922935  | -0.3999614   | 1.0000000    | 
  
 +<code>
 +sp.b.i <- 0.5646726 ## (1)
 +c.b.i <- 0.7944312 ## (3)
  
-sp.b.i <- 0.5646726 +sp.b.f <- -0.4086619 ## (2) 
-c.b.i <- 0.7944312 +c.b.f <- -0.6922935 ## (4)
- +
-sp.b.f <- -0.4086619 +
-c.b.f <- -0.6922935+
  
 c.b.i.sq <- c.b.i^2 ## (3)^2 c.b.i.sq <- c.b.i^2 ## (3)^2
-sp.b.i.sq <- sp.b.i^2 +sp.b.i.sq <- sp.b.i^2 ## (1)^2
 c.b.i.sq - sp.b.i.sq  c.b.i.sq - sp.b.i.sq 
  
 c.b.f.sq <- c.b.f^2 ## (4)^2 c.b.f.sq <- c.b.f^2 ## (4)^2
-sp.b.f.sq <- sp.b.f^2 +sp.b.f.sq <- sp.b.f^2 ## (1)^2
 c.b.f.sq - sp.b.f.sq c.b.f.sq - sp.b.f.sq
 +</code>
  
 <code> <code>
-pcor.test(datavar$bankaccount, datavar$income, datavar$famnum) +> sp.b.i <- 0.5646726 
-pcor.test(datavar$bankaccount, datavar$famnum, datavar$income)+> c.b.i <- 0.7944312 
 +>  
 +> sp.b.f <- -0.4086619 
 +> c.b.f <- -0.6922935 
 +>  
 +> c.b.i.sq <- c.b.i^2 ## (3)^2 
 +> sp.b.i.sq <- sp.b.i^2 
 +>  
 +> c.b.i.sq - sp.b.i.sq  
 +[1] 0.3123 
 +>  
 +> c.b.f.sq <- c.b.f^2 ## (4)^2 
 +> sp.b.f.sq <- sp.b.f^2 
 +>  
 +> c.b.f.sq - sp.b.f.sq 
 +[1] 0.3123 
 +</code> 
 + 
 +0.3123 가 두 독립변인이 DV에 같이 (공히) 미치는 영향력 분량이다.   
 + 
 +<code> 
 +pp.b.i <- pcor.test(datavar$bankaccount, datavar$income, datavar$famnum) 
 +p.b.i 
 +p.b.i$estimate 
 + 
 +p.b.f <- pcor.test(datavar$bankaccount, datavar$famnum, datavar$income) 
 +p.b.f 
 +p.b.f$estimate 
 + 
 +sp.b.i <- spcor.test(datavar$bankaccount, datavar$income, datavar$famnum) 
 +sp.b.i 
 +sp.b.i$estimate 
 +sp.b.f <- spcor.test(datavar$bankaccount, datavar$famnum, datavar$income) 
 +sp.b.f 
 +sp.b.f$estimate 
 + 
 + 
 +zc.b.i <- cor(datavar$bankaccount, datavar$income) 
 +zc.b.i  
 +zc.b.f <- cor(datavar$bankaccount, datavar$famnum) 
 +zc.b.f 
 + 
 +zc.b.i^2 - (sp.b.i$estimate)^2 
 +zc.b.f^2 - (sp.b.f$estimate)^2
  
-spcor.test(datavar$bankaccount, datavar$income, datavar$famnum) 
-spcor.test(datavar$bankaccount, datavar$famnum, datavar$income) 
 </code> </code>
 . . .  . . . 
 <code> <code>
-> pcor.test(datavar$bankaccount, datavar$income, datavar$famnum) +pp.b.i <- pcor.test(datavar$bankaccount, datavar$income, datavar$famnum) 
-   estimate    p.value statistic  n gp  Method +> p.b.i 
-1 0.7825112 0.01267595  3.325102 10  1 pearson +  estimate p.value statistic  n gp  Method 
-> pcor.test(datavar$bankaccount, datavar$famnum, datavar$income) +  0.7825 0.01268     3.325 10  1 pearson 
-   estimate    p.value statistic  n gp  Method +p.b.i$estimate 
-1 -0.672856 0.04702022 -2.406425 10  1 pearson +[1] 0.7825 
-+>  
-> spcor.test(datavar$bankaccount, datavar$income, datavar$famnum) +> p.b.f <- pcor.test(datavar$bankaccount, datavar$famnum, datavar$income) 
-   estimate  p.value statistic  n gp  Method +> p.b.f 
-1 0.5646726 0.113182  1.810198 10  1 pearson +  estimate p.value statistic  n gp  Method 
-> spcor.test(datavar$bankaccount, datavar$famnum, datavar$income) + -0.6729 0.04702    -2.406 10  1 pearson 
-    estimate   p.value statistic  n gp  Method +p.b.f$estimate 
-1 -0.4086619 0.2748117 -1.184655 10  1 pearson+[1] -0.6729 
 + 
 +> sp.b.i <- spcor.test(datavar$bankaccount, datavar$income, datavar$famnum) 
 +> sp.b.i 
 +  estimate p.value statistic  n gp  Method 
 +  0.5647  0.1132      1.81 10  1 pearson 
 +sp.b.i$estimate 
 +[1] 0.5647 
 +> sp.b.f <- spcor.test(datavar$bankaccount, datavar$famnum, datavar$income) 
 +> sp.b.f 
 +  estimate p.value statistic  n gp  Method 
 + -0.4087  0.2748    -1.185 10  1 pearson 
 +> sp.b.f$estimate 
 +[1] -0.4087 
 +>  
 +>  
 +> zc.b.i <- cor(datavar$bankaccount, datavar$income) 
 +> zc.b.i  
 +[1] 0.7944 
 +> zc.b.f <- cor(datavar$bankaccount, datavar$famnum) 
 +> zc.b.f 
 +[1] -0.6923 
 +>  
 +> zc.b.i^2 - (sp.b.i$estimate)^2 
 +[1] 0.3123 
 +> zc.b.f^2 - (sp.b.f$estimate)^2 
 +[1] 0.3123 
 +
  
  
-  
 </code> </code>
 +
 +====== e.g. 3. College enrollment in New Mexico University ======
 +<code>
 +> datavar <- read.csv("http://commres.net/wiki/_media/r/dataset_hlr.csv")
 +> str(datavar)
 +'data.frame': 29 obs. of  5 variables:
 + $ YEAR : int  1 2 3 4 5 6 7 8 9 10 ...
 + $ ROLL : int  5501 5945 6629 7556 8716 9369 9920 10167 11084 12504 ...
 + $ UNEM : num  8.1 7 7.3 7.5 7 6.4 6.5 6.4 6.3 7.7 ...
 + $ HGRAD: int  9552 9680 9731 11666 14675 15265 15484 15723 16501 16890 ...
 + $ INC  : int  1923 1961 1979 2030 2112 2192 2235 2351 2411 2475 ...
 +
 +</code>
 +
 +<code>
 +onePredictorModel <- lm(ROLL ~ UNEM, data = datavar)
 +twoPredictorModel <- lm(ROLL ~ UNEM + HGRAD, data = datavar)
 +threePredictorModel <- lm(ROLL ~ UNEM + HGRAD + INC, data = datavar)
 +</code>
 +
 +<code>summary(onePredictorModel)
 +summary(twoPredictorModel)
 +summary(threePredictorModel)
 +</code>
 +
 +<code>> summary(onePredictorModel)
 +
 +Call:
 +lm(formula = ROLL ~ UNEM, data = datavar)
 +
 +Residuals:
 +    Min      1Q  Median      3Q     Max 
 +-7640.0 -1046.5   602.8  1934.3  4187.2 
 +
 +Coefficients:
 +            Estimate Std. Error t value Pr(>|t|)  
 +(Intercept)   3957.0     4000.1   0.989   0.3313  
 +UNEM          1133.8      513.1   2.210   0.0358 *
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 3049 on 27 degrees of freedom
 +Multiple R-squared:  0.1531, Adjusted R-squared:  0.1218 
 +F-statistic: 4.883 on 1 and 27 DF,  p-value: 0.03579
 +</code>
 +
 +<code>> summary(twoPredictorModel)
 +
 +Call:
 +lm(formula = ROLL ~ UNEM + HGRAD, data = datavar)
 +
 +Residuals:
 +    Min      1Q  Median      3Q     Max 
 +-2102.2  -861.6  -349.4   374.5  3603.5 
 +
 +Coefficients:
 +              Estimate Std. Error t value Pr(>|t|)    
 +(Intercept) -8.256e+03  2.052e+03  -4.023  0.00044 ***
 +UNEM         6.983e+02  2.244e+02   3.111  0.00449 ** 
 +HGRAD        9.423e-01  8.613e-02  10.941 3.16e-11 ***
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 1313 on 26 degrees of freedom
 +Multiple R-squared:  0.8489, Adjusted R-squared:  0.8373 
 +F-statistic: 73.03 on 2 and 26 DF,  p-value: 2.144e-11
 +
 +> </code>
 +<code>
 +> summary(threePredictorModel)
 +
 +Call:
 +lm(formula = ROLL ~ UNEM + HGRAD + INC, data = datavar)
 +
 +Residuals:
 +     Min       1Q   Median       3Q      Max 
 +-1148.84  -489.71    -1.88   387.40  1425.75 
 +
 +Coefficients:
 +              Estimate Std. Error t value Pr(>|t|)    
 +(Intercept) -9.153e+03  1.053e+03  -8.691 5.02e-09 ***
 +UNEM         4.501e+02  1.182e+02   3.809 0.000807 ***
 +HGRAD        4.065e-01  7.602e-02   5.347 1.52e-05 ***
 +INC          4.275e+00  4.947e-01   8.642 5.59e-09 ***
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 670.4 on 25 degrees of freedom
 +Multiple R-squared:  0.9621, Adjusted R-squared:  0.9576 
 +F-statistic: 211.5 on 3 and 25 DF,  p-value: < 2.2e-16
 +
 +</code>
 +
 +<code>anova(onePredictorModel, twoPredictorModel, threePredictorModel)
 +Analysis of Variance Table
 +
 +Model 1: ROLL ~ UNEM
 +Model 2: ROLL ~ UNEM + HGRAD
 +Model 3: ROLL ~ UNEM + HGRAD + INC
 +  Res.Df       RSS Df Sum of Sq      F    Pr(>F)    
 +1     27 251084710                                  
 +2     26  44805568  1 206279143 458.92 < 2.2e-16 ***
 +3     25  11237313  1  33568255  74.68 5.594e-09 ***
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +</code>
 +
 +====== e.g. 4. Happiness  ======
 +{{:hierarchical.regression.data.csv}}
 +
 +<code>
 +# Import data (simulated data for this example)
 +# myData <- read.csv('http://static.lib.virginia.edu/statlab/materials/data/hierarchicalRegressionData.csv')  
 +myData <- read.csv("http://commres.net/wiki/_media/hierarchical.regression.data.csv")
 +</code>
 +
 +<code>
 +> str(myData)
 +'data.frame': 100 obs. of  5 variables:
 + $ happiness: int  5 5 6 4 3 5 5 5 4 4 ...
 + $ age      : int  24 28 25 26 20 25 24 24 26 26 ...
 + $ gender   : chr  "Male" "Male" "Female" "Male" ...
 + $ friends  : int  12 8 6 4 8 9 5 6 8 6 ...
 + $ pets     : int  3 1 0 2 0 0 5 2 1 4 ...
 +> myData$gender <- factor(myData$gender)
 +> str(myData)
 +'data.frame': 100 obs. of  5 variables:
 + $ happiness: int  5 5 6 4 3 5 5 5 4 4 ...
 + $ age      : int  24 28 25 26 20 25 24 24 26 26 ...
 + $ gender   : Factor w/ 2 levels "Female","Male": 2 2 1 2 1 2 2 2 2 2 ...
 + $ friends  : int  12 8 6 4 8 9 5 6 8 6 ...
 + $ pets     : int  3 1 0 2 0 0 5 2 1 4 ...
 +
 +</code>
 +
 +<code>
 +> m0 <- lm(happiness ~ 1, data = myData)
 +> anova(m0)
 +Analysis of Variance Table
 +
 +Response: happiness
 +          Df Sum Sq Mean Sq F value Pr(>F)
 +Residuals 99 240.84  2.4327               
 +
 +</code>
 +
 +<code>
 +# 불필요하지만 위의 분석이 variance와 
 +# 같은 것이라는 것을 아래처럼 확인한다.
 +> attach(myData)
 +The following objects are masked from myData (pos = 3):
 +
 +    age, friends, gender, happiness, pets
 +
 +> var(happiness)
 +[1] 2.432727
 +> length(happiness)
 +[1] 100
 +> df.happiness <- length(happiness) - 1
 +> df.happiness # degrees of freedom 
 +[1] 99
 +> ss.happiness <- var(happiness)* df.happiness # sum of square (ss) value for happiness variable
 +> ss.happiness 
 +[1] 240.84
 +
 +</code>
 +
 +<code>
 +> m1 <- lm(happiness ~ age + gender, data=myData)  # Model 1
 +> summary(m1)
 +
 +Call:
 +lm(formula = happiness ~ age + gender, data = myData)
 +
 +Residuals:
 +    Min      1Q  Median      3Q     Max 
 +-3.6688 -1.0094 -0.1472  0.8273  4.2973 
 +
 +Coefficients:
 +            Estimate Std. Error t value Pr(>|t|)    
 +(Intercept)  7.66778    2.01364   3.808 0.000246 ***
 +age         -0.13039    0.07936  -1.643 0.103611    
 +genderMale   0.16430    0.31938   0.514 0.608106    
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 1.553 on 97 degrees of freedom
 +Multiple R-squared:  0.02855, Adjusted R-squared:  0.008515 
 +F-statistic: 1.425 on 2 and 97 DF,  p-value: 0.2455
 +
 +
 +</code>
 +
 +<code>
 +# m1은 이미 위에서 실행
 +> m2 <- lm(happiness ~ age + gender + friends, data=myData)  # Model 2: Adding friends variable 
 +> m3 <- lm(happiness ~ age + gender + friends + pets, data = myData) # Model 3: Adding pets variable
 +</code>
 +
 +<code>
 +> anova(m1, m2, m3)
 +Analysis of Variance Table
 +
 +Model 1: happiness ~ age + gender
 +Model 2: happiness ~ age + gender + friends
 +Model 3: happiness ~ age + gender + friends + pets
 +  Res.Df    RSS Df Sum of Sq          Pr(>F)    
 +1     97 233.97                                   
 +2     96 209.27  1    24.696 12.1293 0.0007521 ***
 +3     95 193.42  1    15.846  7.7828 0.0063739 ** 
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +</code>
 +
 +<code>
 +> summary(m1)
 +
 +Call:
 +lm(formula = happiness ~ age + gender, data = myData)
 +
 +Residuals:
 +    Min      1Q  Median      3Q     Max 
 +-3.6688 -1.0094 -0.1472  0.8273  4.2973 
 +
 +Coefficients:
 +            Estimate Std. Error t value Pr(>|t|)    
 +(Intercept)  7.66778    2.01364   3.808 0.000246 ***
 +age         -0.13039    0.07936  -1.643 0.103611    
 +genderMale   0.16430    0.31938   0.514 0.608106    
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 1.553 on 97 degrees of freedom
 +Multiple R-squared:  0.02855, Adjusted R-squared:  0.008515 
 +F-statistic: 1.425 on 2 and 97 DF,  p-value: 0.2455
 +
 +> summary(m2)
 +
 +Call:
 +lm(formula = happiness ~ age + gender + friends, data = myData)
 +
 +Residuals:
 +    Min      1Q  Median      3Q     Max 
 +-3.5758 -1.0204  0.0156  0.8087  3.7299 
 +
 +Coefficients:
 +            Estimate Std. Error t value Pr(>|t|)   
 +(Intercept)  6.21730    1.96220   3.169  0.00206 **
 +age         -0.12479    0.07546  -1.654  0.10146   
 +genderMale   0.14931    0.30365   0.492  0.62405   
 +friends      0.18985    0.05640   3.366  0.00110 **
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 1.476 on 96 degrees of freedom
 +Multiple R-squared:  0.1311, Adjusted R-squared:  0.1039 
 +F-statistic: 4.828 on 3 and 96 DF,  p-value: 0.003573
 +
 +> summary(m3)
 +
 +Call:
 +lm(formula = happiness ~ age + gender + friends + pets, data = myData)
 +
 +Residuals:
 +    Min      1Q  Median      3Q     Max 
 +-3.0556 -1.0183 -0.1109  0.8832  3.5911 
 +
 +Coefficients:
 +            Estimate Std. Error t value Pr(>|t|)   
 +(Intercept)  5.78540    1.90266   3.041  0.00305 **
 +age         -0.11146    0.07309  -1.525  0.13057   
 +genderMale  -0.14267    0.31157  -0.458  0.64806   
 +friends      0.17134    0.05491   3.120  0.00239 **
 +pets         0.36391    0.13044   2.790  0.00637 **
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 1.427 on 95 degrees of freedom
 +Multiple R-squared:  0.1969, Adjusted R-squared:  0.1631 
 +F-statistic: 5.822 on 4 and 95 DF,  p-value: 0.0003105
 +
 +
 +</code>
 +
 +Report in research paper
 +{{:pasted:20201201-140842.png}}
 +{{:pasted:20201201-141106.png}}
 +
 +====== e.g. 5: Stock Market ======
 +see [[:r:multiple_regression#partial_semi-partial_correlation_and_r_squared_value|Partial and semipartial example in r]]
 +
 +====== e.g. 6: SWISS ======
 +
 +
sequential_regression.1606347312.txt.gz · Last modified: 2020/11/26 08:35 by hkimscil

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki