r:dummy_variables_with_significant_interaction
This is an old revision of the document!
Table of Contents
Regression with two categorical variables (IVs): Two way anova
college <- read.csv("http://commres.net/wiki/_media/r/college.csv") attach(college) str(college) head(college) salary <- salary / 1000 public<-factor(public, c(0,1), labels=c('Private', 'Public')) location<-factor(location, c(1,2,3,4), labels=c('S', 'MW','NE', 'W')) m1 <- lm(salary~public+location) m2 <- lm(salary~public*location) summary(m1) summary(m2) a.m2 <- aov(salary~public*location) summary(a.m2) interaction.plot(x.factor = location, trace.factor = public, response = salary, fun = median, ylab = "salary", xlab = "location", col=c("red", "blue"), lty = 1, lwd=2, trace.label="public")
output
#################### > college <- read.csv("http://commres.net/wiki/_media/r/college.csv") > attach(college) The following object is masked from acne.re (pos = 36): id The following object is masked from acne.re (pos = 37): id > str(college) 'data.frame': 85 obs. of 6 variables: $ id : int 1 2 3 4 5 6 7 8 9 10 ... $ name : chr "Massachusetts Institute of Technology (MIT)" "Harvard University" "Dartmouth College" "Princeton University" ... $ salary : num 119000 121000 123000 123000 110000 112000 111000 117000 111000 104000 ... $ cost : int 189300 189600 188400 188700 194200 181900 191300 187600 180400 184900 ... $ public : int 0 0 0 0 0 0 0 0 0 0 ... $ location: int 3 3 3 3 3 2 3 1 3 3 ... > head(college) id name salary cost public location 1 1 Massachusetts Institute of Technology (MIT) 119000 189300 0 3 2 2 Harvard University 121000 189600 0 3 3 3 Dartmouth College 123000 188400 0 3 4 4 Princeton University 123000 188700 0 3 5 5 Yale University 110000 194200 0 3 6 6 University of Notre Dame 112000 181900 0 2 > salary <- salary / 1000 > public<-factor(public, c(0,1), labels=c('Private', 'Public')) > location<-factor(location, c(1,2,3,4), labels=c('S', 'MW','NE', 'W')) > m1 <- lm(salary~public+location) > m2 <- lm(salary~public*location) > summary(m1) Call: lm(formula = salary ~ public + location) Residuals: Min 1Q Median 3Q Max -17.15 -4.75 -0.35 2.85 31.67 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 99.56 1.90 52.30 < 2e-16 *** publicPublic -7.30 1.83 -3.99 0.00014 *** locationMW -2.40 2.52 -0.95 0.34366 locationNE 8.79 2.39 3.68 0.00042 *** locationW -10.93 2.49 -4.39 3.4e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 7.86 on 80 degrees of freedom Multiple R-squared: 0.587, Adjusted R-squared: 0.566 F-statistic: 28.4 on 4 and 80 DF, p-value: 1.06e-14 > summary(m2) Call: lm(formula = salary ~ public * location) Residuals: Min 1Q Median 3Q Max -11.19 -4.88 -0.65 2.49 27.55 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 99.492 2.006 49.60 < 2e-16 *** publicPublic -7.142 3.172 -2.25 0.0272 * locationMW -1.982 2.975 -0.67 0.5074 locationNE 11.393 2.537 4.49 2.5e-05 *** locationW -17.554 3.172 -5.53 4.1e-07 *** publicPublic:locationMW -0.913 4.500 -0.20 0.8398 publicPublic:locationNE -12.843 4.704 -2.73 0.0078 ** publicPublic:locationW 10.650 4.451 2.39 0.0192 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 6.95 on 77 degrees of freedom Multiple R-squared: 0.689, Adjusted R-squared: 0.661 F-statistic: 24.4 on 7 and 77 DF, p-value: <2e-16 > > > a.m2 <- aov(salary~public*location) > summary(a.m2) Df Sum Sq Mean Sq F value Pr(>F) public 1 2970 2970 61.50 2.0e-11 *** location 3 4057 1352 28.01 2.4e-12 *** public:location 3 1225 408 8.46 6.3e-05 *** Residuals 77 3718 48 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > > interaction.plot(x.factor = location, + trace.factor = public, + response = salary, + fun = median, + ylab = "salary", + xlab = "location", + col=c("red", "blue"), + lty = 1, + lwd=2, + trace.label="public") >
Regression with a continous + a categorical variables: ANCOVA
summary(mod<-lm(salary~cost + location)) anova(mod)
> summary(mod<-lm(salary~cost + location)) Call: lm(formula = salary ~ cost + location) Residuals: Min 1Q Median 3Q Max -20.244 -4.933 -0.572 3.162 29.939 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 8.78e+01 3.10e+00 28.34 < 2e-16 *** cost 6.05e-05 1.73e-05 3.50 0.00076 *** locationMW -2.80e+00 2.57e+00 -1.09 0.27885 locationNE 9.23e+00 2.42e+00 3.81 0.00027 *** locationW -1.05e+01 2.57e+00 -4.10 0.00010 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 8.02 on 80 degrees of freedom Multiple R-squared: 0.571, Adjusted R-squared: 0.549 F-statistic: 26.6 on 4 and 80 DF, p-value: 4.96e-14 > anova(mod) Analysis of Variance Table Response: salary Df Sum Sq Mean Sq F value Pr(>F) cost 1 2757 2757 42.9 5.1e-09 *** location 3 4072 1357 21.1 3.6e-10 *** Residuals 80 5141 64 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 >
(Intercept) 8.78e+01 3.10e+00 28.34 < 2e-16 *** cost 6.05e-05 1.73e-05 3.50 0.00076 *** locationMW -2.80e+00 2.57e+00 -1.09 0.27885 locationNE 9.23e+00 2.42e+00 3.81 0.00027 *** locationW -1.05e+01 2.57e+00 -4.10 0.00010 ***
y hat ~ 87.8 + .00006*cost - 2.8 MW + 9.23 NE - 10.5 E
- S, MW, NE, W 중에서 S가 default
- y hat ~ 87.8 + .00006*cost
- MW:
- y hat ~ 87.8 - 2.8 + .00006*cost
- y hat ~ 85 + .00006*cost
- NE:
- y hat ~ 87.8 - 9.23 + .00006*cost
- y hat ~ 78.57 + .00006*cost
- E:
- y hat ~ 87.8 - 10.5 + .00006*cost
- y hat ~ 77.3 + .00006*cost
r/dummy_variables_with_significant_interaction.1685969932.txt.gz · Last modified: 2023/06/05 21:58 by hkimscil