Differences

This shows you the differences between two versions of the page.

--- interaction_effects_in_regression_analysis [2023/06/12 08:36] – [Analysis again] hkimscil
+++ interaction_effects_in_regression_analysis [2025/06/16 13:00] (current) – [E.g.2] hkimscil
@@ Line 96: / Line 96: @@
 ====== Two category variables ======
-<code>> set.seed(12)
+<code>
+> set.seed(12)
 > f1<-gl(n=2,k=30,labels=c("Low","High"))
 > f2<-as.factor(rep(c("A","B","C"),times=20))
@@ Line 272: / Line 273: @@
   - f1High:f2C : when nitrogen is High and temperature is also High, the outcome decreases by 1.16.
-<code>interact_plot(mod2, pred = "f1", modx = "f2")</code>
+<code>
-{{:r:interaction.effects.2.jpeg}}
+> interact_plot(mod2, pred = "f1", modx = "f2")
+</code>
+{{:pasted:20250616-072703.png?400}}
+<code>
+> interact_plot(mod2, pred = "f2", modx = "f1")
+</code>
+{{:pasted:20250616-072946.png?400}}
+:r:interaction.effects.2.jpeg
 ====== Two continuous variables ======
 <code># third case interaction between two continuous variables
@@ Line 379: / Line 387: @@
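   - (In the last equation above) x1:x2 = x1*x2 : each time the amount of nitrogen increases by 1, the effect of temperature increases by 1.5. For example, when nitrogen is 0 the slope between temperature and crop yield is about 2; when nitrogen increases by 1, that slope becomes 2 + 1.5 = 3.5.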
   - <WRAP box><code>
-x1=1,x2=1: 0.97 + *3.5 x1 + -1 x2
+# treating 0.97 as approximately 1
-x1=2,x2=2: 0.97 + *5.0 x1 + -1 x2
+x2=1: 0.97 + *3.5 x1 + -1 (1=x2)
-x1=3,x2=3: 0.97 + *6.5 x1 + -1 x2
++ 3.5 x1
-x1=4,x2=4: 0.97 + *8.0 x1 + -1 x2
+x2=2: 0.97 + *5.0 x1 + -1 (2=x2)
-x1=5,x2=5: 0.97 + *9.5 x1 + -1 x2</code>
+      -1 + 5.0 x1
+x2=3: 0.97 + *6.5 x1 + -1 (3=x2)
+      -2 + 6.5 x1
+x2=4: 0.97 + *8.0 x1 + -1 (4=x2)
+      -3 + 8.0 x1
+x2=5: 0.97 + *9.5 x1 + -1 (5=x2)
+      -4 + 9.5 x1
+</code>
 <code>
 *(1.995115 + 1.499595*x2):
@@ Line 405: / Line 420: @@
 ====== E.g.2  ======
 {{:r:states.rds}}
-Download the data file to c:/Rstatistics first. Then
+<code>
-do
+# states.data <- readRDS("c:/Rstatistics/dataSets/states.rds")
-<code>states.data <- readRDS("c:/Rstatistics/dataSets/states.rds") </code>
+states.data <- readRDS(url("http://commres.net/wiki/_media/r/states.rds"))
+</code>
 Or, read the above data file directly
@@ Line 455: / Line 471: @@
 </code>
 <code>
-> data.info <- data.frame(attributes(data)[c("names", "var.labels")])
+> data.info <- data.frame(attributes(states.data)[c("names", "var.labels")])
 > # attributes(data) reveals various attributes of the data file,
 > # which contains variable names and labels.
@@ Line 669: / Line 685: @@
 s.n.i <- summary(n.i)
-s.n12i
+s.n.12i
 # y hat ~ 1048 + -0.003917 x1 + -3.809 x2 + 0.000249 x1 x2
@@ Line 677: / Line 693: @@
 e.mm1 <- mean(expense)-sd(expense)
 e.m0 <- mean(expense)
-e.mp1 <- mean(expxense)+sd(expense)
+e.mp1 <- mean(expense)+sd(expense)
 e.mp2 <- mean(expense)+(2*sd(expense))
@@ Line 706: / Line 722: @@
 </code>
+<code>
+> n.1 <- lm(csat~expense)
+> n.2 <- lm(csat~percent)
+> n.12 <- lm(csat~expense+percent)
+> n.12i <- lm(csat~expense*percent)
+> n.1i <- lm(csat~expense+expense:percent)
+> n.2i <- lm(csat~percent+expense:percent)
+> n.2i.temp <- lm(csat~percent+percent:expense)
+> n.i <- lm(csat~expense:percent)
+>
+> s.n.1 <- summary(n.1)
+> s.n.2 <- summary(n.2)
+> s.n.12 <- summary(n.12)
+> s.n.12i <- summary(n.12i)
+> s.n.1i <- summary(n.1i)
+> s.n.2i <- summary(n.2i)
+> s.n.2i.temp <- summary(n.2i.temp)
+> s.n.i <- summary(n.i)
+>
+> s.n.12i
+Call:
+lm(formula = csat ~ expense * percent)
+Residuals:
+   Min     1Q Median     3Q    Max
+-65.36 -19.61  -3.05  17.53  76.18
+Coefficients:
+                 Estimate Std. Error t value Pr(>|t|)
+(Intercept)      1.05e+03   3.56e+01   29.41  < 2e-16 ***
+expense         -3.92e-03   7.76e-03   -0.51    0.616
+percent         -3.81e+00   7.04e-01   -5.41  2.1e-06 ***
+expense:percent  2.49e-04   1.31e-04    1.90    0.063 .
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+Residual standard error: 30.8 on 47 degrees of freedom
+Multiple R-squared:  0.801,	Adjusted R-squared:  0.788
+F-statistic: 63.1 on 3 and 47 DF,  p-value: <2e-16
+>
+> # y hat ~ 1048 + -0.003917 x1 + -3.809 x2 + 0.000249 x1 x2
+> # y hat ~ 1048 + -0.003917 x1 + (-3.809 + 0.000249 x1) x2
+> # viewing the model in terms of x2 (percent)
+>
+> e.mm1 <- mean(expense)-sd(expense)
+> e.m0 <- mean(expense)
+> e.mp1 <- mean(expense)+sd(expense)
+> e.mp2 <- mean(expense)+(2*sd(expense))
+>
+> # four cases of x1 (each held constant)
+> k <- c(e.mm1, e.m0, e.mp1, e.mp2)
+> ic <- 1048 + (-0.003917*k)
+> slp <- -3.809 + (0.000249*k)
+> ic
+[1] 1033 1027 1022 1017
+> slp
+[1] -2.854 -2.505 -2.156 -1.807
+>
+> # y hat ~ 1032.979 - 2.85 x2
+> # y hat ~ 1027.491 - 2.51 x2
+> # y hat ~ 1022.002 - 2.16 x2
+> # y hat ~ 1016.514 - 1.81 x2
+>
+> interact_plot(n.12i,
++               pred = "percent",
++               modx = "expense",
++               modx.values = k)
+> # or
+> mne <- min(expense)
+> mxe <- max(expense)
+> kk <- seq(mne, mxe, by = 2000)
+> interact_plot(n.12i,
++               pred = "percent",
++               modx = "expense",
++               modx.values = kk)
+>
+</code>
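 As the figure below shows, scores tend to fall as the percentage of students applying to college rises, but the strength of this tendency depends on how much the state spends on SAT education. In states that spend heavily, scores drop markedly less as the application rate rises than in states that spend little.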
 {{:pasted:20230607-004047.png}}
+===== When should an interaction effect be included in the analysis? =====
+Include it when the interaction effect is significant.
+When it is not significant, use the additive model (the model written with the + sign).
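The rule above can be sketched as a nested-model comparison. This is only an illustration under stated assumptions: it uses R's built-in state.x77 data rather than the states.rds file, the 0.05 cutoff is a conventional choice, and the names d, fit_add, fit_int, cmp, p_int, and chosen are hypothetical.

```r
# Assumption: built-in state.x77 stands in for the example data.
d <- as.data.frame(state.x77)

# Additive model (+) and interaction model (*) for the same outcome.
fit_add <- lm(Income ~ Illiteracy + Murder, data = d)
fit_int <- lm(Income ~ Illiteracy * Murder, data = d)

# F test for the single extra term (the interaction).
cmp <- anova(fit_add, fit_int)
print(cmp)

# p-value of the interaction term, taken from the comparison table.
p_int <- cmp[["Pr(>F)"]][2]

# Keep the interaction model only if the interaction is significant
# (0.05 is an assumed, conventional threshold).
chosen <- if (p_int < 0.05) fit_int else fit_add
print(formula(chosen))
```

Because only one term is added, the F test here gives the same p-value as the t test for the interaction coefficient in summary(fit_int).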
 ===== One categorical IV =====
@@ Line 816: / Line 917: @@
 [5] "Murder"     "HS Grad"    "Frost"      "Area"
 </code>
+In the case below, the interaction effect carries important meaning. In the additive model Murder does not play a significant role, but in the interaction model it can be interpreted as playing an important role in combination with Illiteracy.
-<code>fiti <- lm(Income ~ Illiteracy * Murder, data = as.data.frame(state.x77))
+<code>
+fit <- lm(Income ~ Illiteracy + Murder, data = as.data.frame(state.x77))
+fiti <- lm(Income ~ Illiteracy * Murder, data = as.data.frame(state.x77))
+summary(fit)
 summary(fiti)
+</code>
+<code>
+> fit <- lm(Income ~ Illiteracy + Murder, data = as.data.frame(state.x77))
+> fiti <- lm(Income ~ Illiteracy * Murder, data = as.data.frame(state.x77))
+> summary(fit)
+Call:
+lm(formula = Income ~ Illiteracy + Murder, data = as.data.frame(state.x77))
+Residuals:
+   Min     1Q Median     3Q    Max
+-880.9 -397.3  -51.3  333.1 1960.7
+Coefficients:
+            Estimate Std. Error t value Pr(>|t|)
+(Intercept)   4890.5      187.6   26.06   <2e-16 ***
+Illiteracy    -548.7      184.6   -2.97   0.0046 **
+Murder          25.4       30.5    0.83   0.4089
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+Residual standard error: 560 on 47 degrees of freedom
+Multiple R-squared:  0.203,	Adjusted R-squared:  0.169
+F-statistic: 5.98 on 2 and 47 DF,  p-value: 0.00486
+> summary(fiti)
 Call:
 lm(formula = Income ~ Illiteracy * Murder, data = as.data.frame(state.x77))
 Residuals:
-    Min      1Q  Median      3Q     Max
+   Min     1Q Median     3Q    Max
--955.20 -325.99   10.66  299.96 1892.12
+-955.2 -326.0   10.7  300.0 1892.1
 Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
-(Intercept)        3822.61     405.33   9.431 2.54e-12 ***
+(Intercept)         3822.6      405.3    9.43  2.5e-12 ***
-Illiteracy          617.34     434.85   1.420  0.16245
+Illiteracy           617.3      434.9    1.42   0.1624
-Murder              146.82      50.33   2.917  0.00544 **
+Murder               146.8       50.3    2.92   0.0054 **
-Illiteracy:Murder  -117.10      40.13  -2.918  0.00544 **
+Illiteracy:Murder   -117.1       40.1   -2.92   0.0054 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
-Residual standard error: 520.1 on 46 degrees of freedom
+Residual standard error: 520 on 46 degrees of freedom
-Multiple R-squared:  0.3273,	Adjusted R-squared:  0.2834
+Multiple R-squared:  0.327,	Adjusted R-squared:  0.283
-F-statistic: 7.461 on 3 and 46 DF,  p-value: 0.000359
+F-statistic: 7.46 on 3 and 46 DF,  p-value: 0.000359
 >
 </code>
 <code>> interact_plot(fiti, pred = "Illiteracy", modx = "Murder")</code>
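The lines drawn by such a plot can also be computed by hand as simple slopes. The following is a minimal sketch, assuming the moderator is evaluated at its mean and at one standard deviation below and above it (the same mean and SD approach used for expense earlier); the names d, b, m, and simple are hypothetical.

```r
# Refit the interaction model on the built-in data so the block is self-contained.
d <- as.data.frame(state.x77)
fiti <- lm(Income ~ Illiteracy * Murder, data = d)
b <- coef(fiti)

# Moderator values: mean - sd, mean, mean + sd of Murder (an assumed choice).
m <- mean(d$Murder) + c(-1, 0, 1) * sd(d$Murder)

# Income ~ (b0 + b2*Murder) + (b1 + b3*Murder) * Illiteracy
simple <- data.frame(
  Murder    = m,
  intercept = unname(b["(Intercept)"] + b["Murder"] * m),
  slope     = unname(b["Illiteracy"] + b["Illiteracy:Murder"] * m)
)
print(simple)
```

Since the interaction coefficient is negative, the slope for Illiteracy becomes more negative as Murder increases.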
@@ Line 848: / Line 978: @@
 {{:r:state.x77.points.jpeg?600}}
-<code>fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)
-interact_plot(fitiris, pred = "Petal.Width", modx = "Species")</code>
-{{:r:fitiris.jpeg?600}}
 ====== Eg. 4 ======