Revision 2018/12/26 02:49 (current) by hkimscil, edit summary: regression test with factors. Previous revision: 2018/12/26 02:21 by hkimscil.
====== Multicollinearity check in R ======
Required R packages:
  * corrplot
  * mctest
    * omcdiag() (overall multicollinearity diagnostics)
    * imcdiag() (individual multicollinearity diagnostics)

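You can see what the mctest diagnostics measure without installing the package. Below is a minimal base-R sketch of measures of the kind omcdiag() and imcdiag() report. It computes two overall measures: the determinant of the predictor correlation matrix, and the Farrar-Glauber chi-square test built on that determinant. It also computes one individual measure, the variance inflation factor (VIF), which equals the matching diagonal element of the inverse correlation matrix. The helper name collin_diag() and the simulated data are hypothetical. mctest's own definitions, scaling and cutoffs may differ from this sketch.

```r
# Hypothetical helper: a base-R sketch of overall and individual
# multicollinearity measures (not the mctest implementation).
collin_diag <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  R <- cor(X)                        # correlation matrix of the predictors
  det_R <- det(R)                    # close to 0 => strong multicollinearity
  # Farrar-Glauber chi-square for H0: the predictors are uncorrelated
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)
  dof <- p * (p - 1) / 2
  list(det_R   = det_R,
       chisq   = chisq,
       df      = dof,
       p_value = pchisq(chisq, df = dof, lower.tail = FALSE),
       vif     = diag(solve(R)))     # VIF_j = j-th diagonal element of R^-1
}

# Simulated predictors (assumption): x3 is nearly x1 + x2
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)
d  <- collin_diag(data.frame(x1, x2, x3))
round(d$vif, 1)

# Cross-check with the textbook definition VIF_j = 1 / (1 - R_j^2)
r2_x3 <- summary(lm(x3 ~ x1 + x2))$r.squared
c(from_inverse = unname(d$vif[3]), from_lm = 1 / (1 - r2_x3))
```

Three results point to the same near-dependence among x1, x2 and x3: a determinant near 0, a large chi-square, and VIF values well above the usual rules of thumb (5 or 10).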
<code>
> cps <- read.csv("http://commres.net/wiki/_media/cps_85_wages.csv", header = T, sep = "\t")
...
> library(corrplot)
> cps.cor = cor(cps)
> corrplot.mixed(cps.cor, lower.col = "black")
</code>
{{cps.corrplot.png?500}}
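The mixed corrplot has to be read by eye. The minimal base-R sketch below lists the same pairwise information as text. high_cor_pairs() is a hypothetical helper, and the 0.8 cutoff is an illustrative choice, not a rule from this page. Pairwise correlations can miss a near-dependence that involves three or more predictors, which is why VIF-type measures are still needed.

```r
# Hypothetical helper: list variable pairs whose |r| exceeds a cutoff.
high_cor_pairs <- function(cor_mat, cutoff = 0.8) {
  # Look only at the upper triangle so each pair is reported once
  idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(cor_mat)[idx[, "row"]],
                    var2 = colnames(cor_mat)[idx[, "col"]],
                    r    = cor_mat[idx])
  out[order(-abs(out$r)), , drop = FALSE]   # strongest pairs first
}

# Simulated example (assumption): b nearly duplicates a, c is unrelated
set.seed(42)
a <- rnorm(100)
dat <- data.frame(a = a,
                  b = a + rnorm(100, sd = 0.2),
                  c = rnorm(100))
flagged <- high_cor_pairs(cor(dat))
flagged
```

With this page's data, the call would be high_cor_pairs(cps.cor). cps.cor also contains wage, the response, so any pair involving wage is not a predictor-predictor correlation.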
  
<code>
...
F-statistic: 27.11 on 9 and 524 DF,  p-value: < 2.2e-16
  
> summary(lm1)

Call:
lm(formula = log(cps$wage) ~ ., data = cps)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16246 -0.29163 -0.00469  0.29981  1.98248 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.078596   0.687514   1.569 0.117291    
education    0.179366   0.110756   1.619 0.105949    
south       -0.102360   0.042823  -2.390 0.017187 *  
sex         -0.221997   0.039907  -5.563 4.24e-08 ***
experience   0.095822   0.110799   0.865 0.387531    
union        0.200483   0.052475   3.821 0.000149 ***
age         -0.085444   0.110730  -0.772 0.440671    
race         0.050406   0.028531   1.767 0.077865 .  
occupation  -0.007417   0.013109  -0.566 0.571761    
sector       0.091458   0.038736   2.361 0.018589 *  
marr         0.076611   0.041931   1.827 0.068259 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4398 on 523 degrees of freedom
Multiple R-squared:  0.3185, Adjusted R-squared:  0.3054 
F-statistic: 24.44 on 10 and 523 DF,  p-value: < 2.2e-16

> </code>
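In the lm1 output, education, experience and age all have standard errors near 0.11 and none is significant, yet the model as a whole is (F = 24.44 on 10 and 523 DF). One plausible cause is that experience in this dataset is "potential experience", roughly age - education - 6; this is assumed here and not checked against the data file. If so, the three predictors are almost linearly dependent. The sketch below simulates that assumption and computes each VIF from its auxiliary regression, VIF_j = 1 / (1 - R_j^2). The helper aux_vif() and all data are hypothetical.

```r
# Assumption: experience is close to age - education - 6; the small
# noise keeps the three variables from being exactly dependent.
set.seed(85)
n <- 534
education  <- sample(6:18, n, replace = TRUE)
experience <- sample(0:45, n, replace = TRUE)
age        <- education + experience + 6 + rnorm(n, sd = 0.5)
X <- data.frame(education, experience, age)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors (auxiliary regression)
aux_vif <- function(X) {
  sapply(names(X), function(v) {
    fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}
vifs <- aux_vif(X)
round(vifs, 1)
```

With the real data, the same helper could be applied to cps[c("education", "experience", "age")].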
  
====== Regression test with factors ======
<code>
> cps$sex <- factor(cps$sex)
> cps$union <- factor(cps$union)
> cps$race <- factor(cps$race)
> cps$sector <- factor(cps$sector)
> cps$occupation <- factor(cps$occupation)
> cps$marr <- factor(cps$marr)
> str(cps)
'data.frame': 534 obs. of  11 variables:
 $ education : int  8 9 12 12 12 13 10 12 16 12 ...
 $ south     : int  0 0 0 0 0 0 1 0 0 0 ...
 $ sex       : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 1 1 ...
 $ experience: int  21 42 1 4 17 9 27 9 11 9 ...
 $ union     : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
 $ wage      : num  5.1 4.95 6.67 4 7.5 ...
 $ age       : int  35 57 19 22 35 28 43 27 33 27 ...
 $ race      : Factor w/ 3 levels "1","2","3": 2 3 3 3 3 3 3 3 3 3 ...
 $ occupation: Factor w/ 6 levels "1","2","3","4",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ sector    : Factor w/ 3 levels "0","1","2": 2 2 2 1 1 1 1 1 2 1 ...
 $ marr      : Factor w/ 2 levels "0","1": 2 2 1 1 2 1 1 1 2 1 ...
</code>
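factor() changes how lm() encodes a column. An integer code gets a single numeric slope. A factor instead gets one 0/1 dummy column per non-reference level, with the first level as the baseline. That is why the factor models report terms such as sex1, race2, race3 and occupation2 to occupation6. The sketch below uses a made-up toy data frame. It also shows converting several columns in one lapply() call, as an alternative to writing one factor() line per column.

```r
# Toy data (made-up values) with integer-coded categories
toy <- data.frame(race = c(2L, 3L, 3L, 1L, 3L, 2L),
                  sex  = c(1L, 1L, 0L, 0L, 0L, 1L))

# Convert several columns at once (same effect as one factor() call each)
cat_cols <- c("race", "sex")
toy[cat_cols] <- lapply(toy[cat_cols], factor)

# Treatment (dummy) coding: the first level is the baseline and every
# other level gets its own 0/1 column
mm <- model.matrix(~ race + sex, data = toy)
colnames(mm)   # "(Intercept)" "race2" "race3" "sex1"
mm
```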
  
<code>
> lm4 = lm(log(cps$wage) ~ . -age, data = cps)
> summary(lm4)

Call:
lm(formula = log(cps$wage) ~ . - age, data = cps)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.36103 -0.28080  0.00362  0.27793  1.79594 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.194821   0.181804   6.572 1.21e-10 ***
education    0.066603   0.010060   6.621 8.96e-11 ***
south       -0.093384   0.041931  -2.227  0.02637 *  
sex1        -0.216934   0.041844  -5.184 3.11e-07 ***
experience   0.009371   0.001725   5.431 8.63e-08 ***
union1       0.211506   0.051218   4.129 4.24e-05 ***
race2       -0.033928   0.099051  -0.343  0.73209    
race3        0.079851   0.057392   1.391  0.16472    
occupation2 -0.364444   0.091500  -3.983 7.78e-05 ***
occupation3 -0.210295   0.076175  -2.761  0.00597 ** 
occupation4 -0.383882   0.080990  -4.740 2.77e-06 ***
occupation5 -0.050664   0.072717  -0.697  0.48628    
occupation6 -0.265348   0.079969  -3.318  0.00097 ***
sector1      0.114857   0.054862   2.094  0.03678 *  
sector2      0.093138   0.096514   0.965  0.33499    
marr1        0.062211   0.041025   1.516  0.13002    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4278 on 518 degrees of freedom
Multiple R-squared:  0.3614, Adjusted R-squared:  0.3429 
F-statistic: 19.54 on 15 and 518 DF,  p-value: < 2.2e-16

</code>
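Dropping age in lm4 shrinks the standard error of education from 0.110756 to 0.010060 and of experience from 0.110799 to 0.001725. Both predictors become clearly significant. lm4 also recodes the categorical variables as factors, so the comparison with lm1 is not only about age. The sketch below isolates the age effect on simulated data, under the same assumption as before that experience is close to age - education - 6. The data and the helper se_of() are hypothetical.

```r
# Simulated data (assumption): experience is close to age - education - 6,
# and log wage depends on education and experience only
set.seed(2018)
n <- 534
education  <- sample(6:18, n, replace = TRUE)
experience <- sample(0:45, n, replace = TRUE)
age        <- education + experience + 6 + rnorm(n, sd = 0.5)
log_wage   <- 0.7 + 0.09 * education + 0.01 * experience + rnorm(n, sd = 0.45)
sim <- data.frame(log_wage, education, experience, age)

# Standard errors of the coefficients of a fitted lm
se_of <- function(fit) coef(summary(fit))[, "Std. Error"]

se_full  <- se_of(lm(log_wage ~ education + experience + age, data = sim))
se_noage <- se_of(lm(log_wage ~ education + experience, data = sim))

# Ratios well above 1 show the inflation caused by keeping the
# nearly redundant age in the model
keep  <- c("education", "experience")
ratio <- se_full[keep] / se_noage[keep]
round(ratio, 1)
```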
  
<code>
> lm5 = lm(log(cps$wage) ~ . -age -race, data = cps)
> summary(lm5)

Call:
lm(formula = log(cps$wage) ~ . - age - race, data = cps)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34366 -0.28169 -0.00017  0.29179  1.81158 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.224289   0.172070   7.115 3.73e-12 ***
education    0.068838   0.009912   6.945 1.14e-11 ***
south       -0.102588   0.041668  -2.462 0.014139 *  
sex1        -0.213602   0.041842  -5.105 4.65e-07 ***
experience   0.009494   0.001723   5.510 5.65e-08 ***
union1       0.202720   0.051009   3.974 8.06e-05 ***
occupation2 -0.355381   0.091448  -3.886 0.000115 ***
occupation3 -0.209820   0.076149  -2.755 0.006068 ** 
occupation4 -0.385680   0.080855  -4.770 2.40e-06 ***
occupation5 -0.047694   0.072746  -0.656 0.512351    
occupation6 -0.254277   0.079781  -3.187 0.001523 ** 
sector1      0.111458   0.054845   2.032 0.042636 *  
sector2      0.099777   0.096481   1.034 0.301541    
marr1        0.065464   0.041036   1.595 0.111257    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4283 on 520 degrees of freedom
Multiple R-squared:  0.3573, Adjusted R-squared:  0.3412 
F-statistic: 22.24 on 13 and 520 DF,  p-value: < 2.2e-16

</code>
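lm5 removes race, whose two dummies were not significant in lm4 (p = 0.73209 for race2 and 0.16472 for race3). Separate t-tests on dummy columns do not test the factor as a whole. A nested-model F-test with anova() does: it checks whether all race columns together improve the fit. For the models on this page, the corresponding call would be anova(lm5, lm4). The sketch below uses simulated data in which a three-level factor, grp, has no true effect (an assumption for illustration).

```r
set.seed(7)
n   <- 300
grp <- factor(sample(1:3, n, replace = TRUE))  # three-level factor, like race
x   <- rnorm(n)
y   <- 1 + 0.5 * x + rnorm(n)                  # grp has no true effect

reduced <- lm(y ~ x)
full    <- lm(y ~ x + grp)

# Nested-model F-test: do the two grp dummies jointly improve the fit?
tab <- anova(reduced, full)
tab
```

The second row's Df is 2, one per dummy column. That is the number of coefficients removed when the whole factor is dropped.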
  
<code>
> lm6 = lm(log(cps$wage) ~ . -age -race -occupation -marr -sector, data = cps)
> summary(lm6)

Call:
lm(formula = log(cps$wage) ~ . - age - race - occupation - marr - 
    sector, data = cps)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.13809 -0.28681 -0.00078  0.29376  1.96678 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.731792   0.122217   5.988 3.94e-09 ***
education    0.094096   0.007942  11.848  < 2e-16 ***
south       -0.111761   0.042857  -2.608 0.009372 ** 
sex1        -0.231978   0.039202  -5.918 5.88e-09 ***
experience   0.011548   0.001680   6.875 1.75e-11 ***
union1       0.198360   0.051243   3.871 0.000122 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4433 on 528 degrees of freedom
Multiple R-squared:  0.3011, Adjusted R-squared:  0.2944 
F-statistic: 45.49 on 5 and 528 DF,  p-value: < 2.2e-16

> </code>
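Going from lm5 to lm6 (dropping occupation, marr and sector) lowers R-squared from 0.3573 to 0.3011 and adjusted R-squared from 0.3412 to 0.2944. Adjusted R-squared penalizes the number of estimated coefficients: adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where p counts the coefficients besides the intercept. The sketch below plugs in lm6's reported figures and then checks the formula against summary.lm() on simulated data. The helper adj_r2() and the data are hypothetical.

```r
# Adjusted R-squared from R-squared, sample size n and
# p estimated coefficients besides the intercept
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Figures reported for lm6 on this page: about 0.2945; summary(lm6)
# shows 0.2944 because it starts from the unrounded R-squared
adj_r2(0.3011, n = 534, p = 5)

# Check the formula against summary.lm() on simulated data (assumption)
set.seed(11)
n   <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + sim$x1 + rnorm(n)
s <- summary(lm(y ~ x1 + x2 + x3, data = sim))
c(by_hand = adj_r2(s$r.squared, n, p = 3), lm = s$adj.r.squared)
```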