multicolinearity
Differences
This shows you the differences between two versions of the page.
multicolinearity [2016/04/27 06:36] – hkimscil | multicolinearity [2018/12/26 02:49] (current) – [regression test with factors] hkimscil | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | {{keywords> | + | ====== |
- | ====== | + | required library: |
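- | Multicollinearity is said to exist when the correlations among variables are extremely high. For example, IQ score and math score are likely to be substantially [[Correlation|correlated]], because the two variables measure similar things (phenomena). If these two variables are used as independent variables in a test such as regression, | + | * corrplot | ||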
+ | * mctest | ||
+ | * omcdiag | ||
+ | * imcdiag | ||
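- | The relationship between an IQ score and its subscores (say, perception, spatial ability, mathematical ability, and verbal ability) is in fact one of identity, because the IQ score is the sum of all of these subscores. This situation is called singularity. | ||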
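The singularity case described above can be sketched in R. This is only an illustration on simulated data: the variable names (perception, spatial, math, verbal, iq, grade) and all numbers are assumptions made for the example, not data from this page. Because the total is an exact sum of its parts, lm() cannot estimate all coefficients and marks the redundant term as NA.

```r
# Simulated subscores; all names and numbers here are hypothetical.
set.seed(1)
n <- 100
perception <- rnorm(n, mean = 25, sd = 5)
spatial    <- rnorm(n, mean = 25, sd = 5)
math       <- rnorm(n, mean = 25, sd = 5)
verbal     <- rnorm(n, mean = 25, sd = 5)

# The total score is an exact linear combination of the subscores (singularity).
iq <- perception + spatial + math + verbal

# A made-up outcome that depends on two of the subscores plus noise.
grade <- 0.5 * math + 0.3 * verbal + rnorm(n)

# Entering the total together with all of its parts makes the
# design matrix rank deficient.
fit <- lm(grade ~ iq + perception + spatial + math + verbal)

# One coefficient is reported as NA because it cannot be estimated.
print(coef(fit))

# The rank counts the intercept plus the four independent predictors.
print(fit$rank)
```

Dropping either the total or one of the subscores removes the exact dependence, after which every remaining coefficient can be estimated.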
- | [[:Singularity]] \\ | + | < |
- | [[: | + | > cps <- read.csv(" |
+ | </ | ||
- | {{tag>multicolinearity singularity regression preassumption statistics | + | < |
+ | ' | ||
+ | $ education : int 8 9 12 12 12 13 10 12 16 12 ... | ||
+ | $ south : int 0 0 0 0 0 0 1 0 0 0 ... | ||
+ | $ sex : int 1 1 0 0 0 0 0 0 0 0 ... | ||
+ | $ experience: int 21 42 1 4 17 9 27 9 11 9 ... | ||
+ | $ union : int 0 0 0 0 0 1 0 0 0 0 ... | ||
+ | $ wage : num 5.1 4.95 6.67 4 7.5 ... | ||
+ | $ age : int 35 57 19 22 35 28 43 27 33 27 ... | ||
+ | $ race : int 2 3 3 3 3 3 3 3 3 3 ... | ||
+ | $ occupation: int 6 6 6 6 6 6 6 6 6 6 ... | ||
+ | $ sector | ||
+ | $ marr : int 1 1 0 0 1 0 0 0 1 0 ... | ||
+ | > head(cps) | ||
+ | education south sex experience union wage age race occupation sector marr | ||
+ | 1 | ||
+ | 2 | ||
+ | 3 12 | ||
+ | 4 12 | ||
+ | 5 12 | ||
+ | 6 13 | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > lm1 = lm(log(cps$wage) ~., data = cps) | ||
+ | > summary(lm1) | ||
+ | |||
+ | Call: | ||
+ | lm(formula = log(cps$wage) ~ ., data = cps) | ||
+ | |||
+ | Residuals: | ||
+ | | ||
+ | -2.16246 -0.29163 -0.00469 | ||
+ | |||
+ | Coefficients: | ||
+ | | ||
+ | (Intercept) | ||
+ | education | ||
+ | south | ||
+ | sex | ||
+ | experience | ||
+ | union 0.200483 | ||
+ | age | ||
+ | race | ||
+ | occupation | ||
+ | sector | ||
+ | marr | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | |||
+ | Residual standard error: 0.4398 on 523 degrees of freedom | ||
+ | Multiple R-squared: | ||
+ | F-statistic: | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | plot(lm1) | ||
+ | </ | ||
+ | {{lm1.plot1.png? | ||
+ | {{lm1.plot3.png? | ||
+ | |||
+ | |||
+ | <code> | ||
+ | > library(corrplot) | ||
+ | > cps.cor = cor(cps) | ||
+ | > corrplot.mixed(cps.cor, | ||
+ | </ | ||
+ | {{cps.corrplot.png? | ||
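Besides plotting the correlation matrix, the strongly correlated pairs can be listed directly with base R. The sketch below uses simulated data, not the cps file: the assumption that age is built from education and experience, and the 0.9 cutoff, are choices made only for this illustration.

```r
# Simulated predictors; the relation among them is an assumption for this sketch.
set.seed(7)
n <- 200
education  <- sample(6:18, n, replace = TRUE)
experience <- sample(0:45, n, replace = TRUE)
age        <- education + experience + 6
south      <- rbinom(n, 1, 0.3)
d <- data.frame(education, experience, age, south)

# Pairwise Pearson correlations.
r <- cor(d)

# Keep each pair once (upper triangle) and only those with |r| above 0.9.
idx <- which(abs(r) > 0.9 & upper.tri(r), arr.ind = TRUE)
high <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
print(high)
```

A pairwise screen like this only catches dependence between two variables. A predictor that is a combination of several others can go unnoticed, which is one reason to also look at diagnostics such as the VIF.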
+ | |||
+ | < | ||
+ | > install.packages(" | ||
+ | > library(mctest) | ||
+ | > omcdiag(cps[, | ||
+ | |||
+ | Call: | ||
+ | omcdiag(x = cps[, c(-6)], y = cps$wage) | ||
+ | |||
+ | |||
+ | Overall Multicollinearity Diagnostics | ||
+ | |||
+ | MC Results detection | ||
+ | Determinant |X' | ||
+ | Farrar Chi-Square: | ||
+ | Red Indicator: | ||
+ | Sum of Lambda Inverse: 10068.8439 | ||
+ | Theil' | ||
+ | Condition Number: | ||
+ | |||
+ | 1 --> COLLINEARITY is detected by the test | ||
+ | 0 --> COLLINEARITY is not detected by the test | ||
+ | |||
+ | > | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > imcdiag(cps[, | ||
+ | |||
+ | Call: | ||
+ | imcdiag(x = cps[, c(-6)], y = cps$wage) | ||
+ | |||
+ | |||
+ | All Individual Multicollinearity Diagnostics Result | ||
+ | |||
+ | | ||
+ | education | ||
+ | south | ||
+ | sex | ||
+ | experience 5184.0939 0.0002 301771.2445 340140.5368 0.0139 5302.4188 | ||
+ | union | ||
+ | age 4645.6650 0.0002 270422.7164 304806.1391 0.0147 4751.7005 | ||
+ | race 1.0371 0.9642 | ||
+ | occupation | ||
+ | sector | ||
+ | marr 1.0961 0.9123 | ||
+ | |||
+ | 1 --> COLLINEARITY is detected by the test | ||
+ | 0 --> COLLINEARITY is not detected by the test | ||
+ | |||
+ | education , south , experience , age , race , occupation , sector , marr , coefficient(s) are non-significant may be due to multicollinearity | ||
+ | |||
+ | R-square of y on all x: 0.2805 | ||
+ | |||
+ | * use method argument to check which regressors may be the reason of collinearity | ||
+ | =================================== | ||
+ | > | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > round(pcor(cps[, | ||
+ | | ||
+ | education | ||
+ | south -0.0318 | ||
+ | sex | ||
+ | experience | ||
+ | union -0.0075 -0.0975 -0.1201 | ||
+ | age | ||
+ | race 0.0172 -0.1112 | ||
+ | occupation | ||
+ | sector | ||
+ | marr | ||
+ | |||
+ | </ | ||
+ | |||
+ | < | ||
+ | > lm2 = lm(log(cps$wage) ~ . -age , data = cps) | ||
+ | > summary(lm2) | ||
+ | |||
+ | Call: | ||
+ | lm(formula = log(cps$wage) ~ . - age, data = cps) | ||
+ | |||
+ | Residuals: | ||
+ | | ||
+ | -2.16044 -0.29073 -0.00505 | ||
+ | |||
+ | Coefficients: | ||
+ | | ||
+ | (Intercept) | ||
+ | education | ||
+ | south | ||
+ | sex | ||
+ | experience | ||
+ | union 0.199987 | ||
+ | race | ||
+ | occupation | ||
+ | sector | ||
+ | marr | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | |||
+ | Residual standard error: 0.4397 on 524 degrees of freedom | ||
+ | Multiple R-squared: | ||
+ | F-statistic: | ||
+ | |||
+ | > summary(lm1) | ||
+ | |||
+ | Call: | ||
+ | lm(formula = log(cps$wage) ~ ., data = cps) | ||
+ | |||
+ | Residuals: | ||
+ | | ||
+ | -2.16246 -0.29163 -0.00469 | ||
+ | |||
+ | Coefficients: | ||
+ | | ||
+ | (Intercept) | ||
+ | education | ||
+ | south | ||
+ | sex | ||
+ | experience | ||
+ | union 0.200483 | ||
+ | age | ||
+ | race | ||
+ | occupation | ||
+ | sector | ||
+ | marr | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | |||
+ | Residual standard error: 0.4398 on 523 degrees of freedom | ||
+ | Multiple R-squared: | ||
+ | F-statistic: | ||
+ | |||
+ | > | ||
+ | > </ | ||
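The VIF column reported by imcdiag can be reproduced by hand: regress each predictor on all the others and take 1 / (1 - R^2). The following is a minimal sketch on simulated data, assuming age is almost a linear function of education and experience, which is consistent with the very large VIF values shown above but is not taken from the cps file. The helper name vif_by_hand is hypothetical.

```r
# Simulated predictors; the near-linear relation is an assumption for this sketch.
set.seed(42)
n <- 534
education  <- sample(6:18, n, replace = TRUE)
experience <- sample(0:45, n, replace = TRUE)
noise      <- sample(c(-1, 0, 1), n, replace = TRUE, prob = c(0.02, 0.96, 0.02))
age        <- education + experience + 6 + noise
union      <- rbinom(n, 1, 0.2)
X <- data.frame(education, experience, age, union)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors.
vif_by_hand <- function(df) {
  sapply(names(df), function(v) {
    f  <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

# With age included, the three related predictors get very large VIFs.
vif_all <- vif_by_hand(X)
print(round(vif_all, 2))

# After dropping age, the remaining VIFs fall back to about 1.
vif_drop <- vif_by_hand(X[, names(X) != "age"])
print(round(vif_drop, 2))
```

A common rule of thumb treats a VIF above 10 as a sign of problematic collinearity; the cutoff is a convention, not a formal test. The drop in VIF after removing age mirrors the move from lm1 to lm2.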
+ | |||
+ | ====== regression test with factors ====== | ||
+ | < | ||
+ | > cps$sex <- factor(cps$sex) | ||
+ | > cps$union <- factor(cps$union) | ||
+ | > cps$race <- factor(cps$race) | ||
+ | > cps$sector <- factor(cps$sector) | ||
+ | > cps$occupation <- factor(cps$occupation) | ||
+ | > cps$marr <- factor(cps$marr) | ||
+ | > str(cps) | ||
+ | ' | ||
+ | $ education : int 8 9 12 12 12 13 10 12 16 12 ... | ||
+ | $ south : int 0 0 0 0 0 0 1 0 0 0 ... | ||
+ | $ sex : Factor w/ 2 levels " | ||
+ | $ experience: int 21 42 1 4 17 9 27 9 11 9 ... | ||
+ | $ union : Factor w/ 2 levels " | ||
+ | $ wage : num 5.1 4.95 6.67 4 7.5 ... | ||
+ | $ age : int 35 57 19 22 35 28 43 27 33 27 ... | ||
+ | $ race : Factor w/ 3 levels " | ||
+ | $ occupation: Factor w/ 6 levels " | ||
+ | $ sector | ||
+ | $ marr : Factor w/ 2 levels " | ||
+ | </ | ||
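Converting an integer code to a factor changes how lm() enters it into the model: instead of one numeric slope, each non-reference level gets its own dummy column. A small sketch with a made-up data frame (the toy values are hypothetical) shows the design matrix before and after the conversion.

```r
# Toy data; the values are made up for illustration.
toy <- data.frame(
  wage = c(5.1, 4.95, 6.67, 4.0, 7.5, 13.07),
  race = c(2, 3, 3, 1, 3, 1)
)

# As a number, race contributes a single column (one slope).
mm_num <- model.matrix(wage ~ race, data = toy)
print(colnames(mm_num))

# As a factor, race is expanded into dummy columns,
# with the first level (1) as the reference category.
toy$race <- factor(toy$race)
mm_fac <- model.matrix(wage ~ race, data = toy)
print(colnames(mm_fac))
```

This is why a regression on factor-coded predictors reports terms such as race2 and race3 rather than a single race coefficient.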
+ | |||
+ | < | ||
+ | > lm4 = lm(log(cps$wage) ~ . -age, data = cps) | ||
+ | > summary(lm4) | ||
+ | |||
+ | Call: | ||
+ | lm(formula = log(cps$wage) ~ . - age, data = cps) | ||
+ | |||
+ | Residuals: | ||
+ | | ||
+ | -2.36103 -0.28080 | ||
+ | |||
+ | Coefficients: | ||
+ | | ||
+ | (Intercept) | ||
+ | education | ||
+ | south | ||
+ | sex1 -0.216934 | ||
+ | experience | ||
+ | union1 | ||
+ | race2 | ||
+ | race3 0.079851 | ||
+ | occupation2 -0.364444 | ||
+ | occupation3 -0.210295 | ||
+ | occupation4 -0.383882 | ||
+ | occupation5 -0.050664 | ||
+ | occupation6 -0.265348 | ||
+ | sector1 | ||
+ | sector2 | ||
+ | marr1 0.062211 | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | |||
+ | Residual standard error: 0.4278 on 518 degrees of freedom | ||
+ | Multiple R-squared: | ||
+ | F-statistic: | ||
+ | |||
+ | > | ||
+ | |||
+ | </ | ||
+ | |||
+ | < | ||
+ | > lm5 = lm(log(cps$wage) ~ . - age - race, data = cps) | ||
+ | > summary(lm5) | ||
+ | |||
+ | Call: | ||
+ | lm(formula = log(cps$wage) ~ . - age - race, data = cps) | ||
+ | |||
+ | Residuals: | ||
+ | | ||
+ | -2.34366 -0.28169 -0.00017 | ||
+ | |||
+ | Coefficients: | ||
+ | | ||
+ | (Intercept) | ||
+ | education | ||
+ | south | ||
+ | sex1 -0.213602 | ||
+ | experience | ||
+ | union1 | ||
+ | occupation2 -0.355381 | ||
+ | occupation3 -0.209820 | ||
+ | occupation4 -0.385680 | ||
+ | occupation5 -0.047694 | ||
+ | occupation6 -0.254277 | ||
+ | sector1 | ||
+ | sector2 | ||
+ | marr1 0.065464 | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | |||
+ | Residual standard error: 0.4283 on 520 degrees of freedom | ||
+ | Multiple R-squared: | ||
+ | F-statistic: | ||
+ | |||
+ | > | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > lm6 = lm(log(cps$wage) ~ . - age - race - occupation - marr - sector, data = cps) | ||
+ | > summary(lm6) | ||
+ | |||
+ | Call: | ||
+ | lm(formula = log(cps$wage) ~ . - age - race - occupation - marr - | ||
+ | sector, data = cps) | ||
+ | |||
+ | Residuals: | ||
+ | | ||
+ | -2.13809 -0.28681 -0.00078 | ||
+ | |||
+ | Coefficients: | ||
+ | | ||
+ | (Intercept) | ||
+ | education | ||
+ | south | ||
+ | sex1 -0.231978 | ||
+ | experience | ||
+ | union1 | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | |||
+ | Residual standard error: 0.4433 on 528 degrees of freedom | ||
+ | Multiple R-squared: | ||
+ | F-statistic: | ||
+ | |||
+ | > </ |
multicolinearity.1461708365.txt.gz · Last modified: 2016/04/27 06:36 by hkimscil