Differences

This shows you the differences between two versions of the page.

--- c:ms:2023:w07_anova_note [2023/04/18 22:14] – [Post hoc] hkimscil
+++ c:ms:2023:w07_anova_note [2023/06/04 21:38] (current) – [Post hoc test] hkimscil
@@ Line 125: / Line 125: @@
 </code>
-====== Output ======
+===== ANOVA in R: Output =====
 <code>
 > #
@@ Line 417: / Line 418: @@
 >
 </code>
-====== R square or Eta ======
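-SS total
-  * = the squared error when Y is predicted from the Y variable alone
-  * "from the Y variable alone" = predicting the Y values with the mean of Y
-  * introduced early in the course as the sum of squared errors
-SS between
-  * when information from the X variable (independent variable) is taken into account
-  * that is, by taking the independent variable into account
-  * the part of the SS total uncertainty that is removed
-  * or the <fc #ff0000>explanatory power</fc> that is gained
-SS error (SS within)
-  * even after the IV has been taken into account
-  * the error that remains to the end
-SS total = SS between + SS within
-SS between / SS total = the uncertainty removed once the IV kicks in = the explanatory power of the IV = <fc #ff0000>R square value</fc>
-In other words, the idea is: how much uncertainty does the IV remove?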
-====== Post hoc ======
+====== Post hoc test ======
 [[:post hoc test]]
 <code>
@@ Line 454: / Line 434: @@
 d.bc
-# se
+# mse (ms within) from the a.res.sum output
-# from the a.res.sum output
+# a.res.sum == summary(aov(values ~ group, data=comb3))
 a.res.sum
 # mse = 50
 mse <- 50
+# or the fancy way, from the comb3 data.frame
+# 15 is the df of each group (n - 1)
+sse.ch <- sum(tapply(comb3$values, comb3$group, var)*15)
+sse.ch
+mse.ch <- sse.ch/45
+mse.ch
 se <- sqrt(mse/length(A))
@@ Line 485: / Line 471: @@
 ptukey(t.bc, nmeans=3, df=45, lower.tail = F)
-TukeyHSD(a.res)
+TukeyHSD(a.res, conf.level=.95)
+</code>
+<code>
+plot(TukeyHSD(a.res, conf.level=.95), las = 2)
+</code>
+<code>
+pairwise.t.test(comb3$values, comb3$group, p.adj = "bonf")
+</code>
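The Bonferroni-adjusted comparisons requested with `pairwise.t.test` can be cross-checked by hand. The following is only a sketch built from the summary numbers reported on this page (group means 26, 24, 19; MSE = 50 on 45 df). It assumes 16 cases per group, which follows from the per-group df of 15, and it assumes the pooled-SD default of `pairwise.t.test`. The names `n`, `se.diff`, `p.raw`, and `p.bonf` are illustrative and do not appear in the original script.

```r
# Summary values taken from the ANOVA output on this page
m    <- c(a = 26, b = 24, c = 19)  # group means
mse  <- 50                         # MS within (Residuals Mean Sq)
df.w <- 45                         # residual df
n    <- 16                         # ASSUMED cases per group (df = 15 each)

# Pooled standard error of a difference between two group means
se.diff <- sqrt(mse * (1 / n + 1 / n))

# All pairwise mean differences
d <- c("a-b" = m[["a"]] - m[["b"]],
       "a-c" = m[["a"]] - m[["c"]],
       "b-c" = m[["b"]] - m[["c"]])

# Two-sided t tests on the pooled error term
t.val <- d / se.diff
p.raw <- 2 * pt(abs(t.val), df = df.w, lower.tail = FALSE)

# Bonferroni: multiply by the number of comparisons, cap at 1
p.bonf <- pmin(p.raw * length(d), 1)
round(p.bonf, 4)
```

If the assumptions hold, these values should agree with the table printed by `pairwise.t.test(comb3$values, comb3$group, p.adj = "bonf")`.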
+===== post hoc test: output =====
+<code>
+> m.a
+[1] 26
+> m.b
+[1] 24
+> m.c
+[1] 19
+>
+> d.ab <- m.a - m.b
+> d.ac <- m.a - m.c
+> d.bc <- m.b - m.c
+>
+> d.ab
+[1] 2
+> d.ac
+[1] 7
+> d.bc
+[1] 5
+>
+> # se
+> # from the a.res.sum output
+> a.res.sum
+            Df Sum Sq Mean Sq F value Pr(>F)
+group        2    416     208    4.16  0.022 *
+Residuals   45   2250      50
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+> # mse = 50
+> mse <- 50
+>
+> se <- sqrt(mse/length(A))
+>
+> # now t scores for two compared groups
+> t.ab <- d.ab / se
+> t.ac <- d.ac / se
+> t.bc <- d.bc / se
+>
+> t.ab
+[1] 1.131371
+> t.ac
+[1] 3.959798
+> t.bc
+[1] 2.828427
+>
+> # now find the critical value at the .05 level to compare the scores above against
+> # use the qtukey function
+> t.crit <- qtukey( .95, nmeans = 3, df = 45)
+> t.crit
+[1] 3.427507
+>
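+> # only t.ac exceeds the critical value
+> # so a and c are different groups
+> # i.e., the first-year and third-year students are different groups
+>
+> # or, going the other way as R usually reports it (p-values)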
+> ptukey(t.ab, nmeans=3, df=45, lower.tail = F)
+[1] 0.7049466
+> ptukey(t.ac, nmeans=3, df=45, lower.tail = F)
+[1] 0.02012498
+> ptukey(t.bc, nmeans=3, df=45, lower.tail = F)
+[1] 0.123877
+>
+> TukeyHSD(a.res, conf.level=.95)
+  Tukey multiple comparisons of means
+    95% family-wise confidence level
+Fit: aov(formula = values ~ group, data = comb3)
+$group
+    diff        lwr       upr     p adj
+b-a   -2  -8.059034  4.059034 0.7049466
+c-a   -7 -13.059034 -0.940966 0.0201250
+c-b   -5 -11.059034  1.059034 0.1238770
+</code>
+{{:c:ms:2023:pasted:20230418-223608.png}}
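The `lwr` and `upr` columns of the `TukeyHSD` table can be rebuilt from the same pieces used for the hand-computed scores. This is a hedged sketch from the summary values on this page, not the original script: `n = 16` per group is an assumption inferred from the per-group df of 15, and `k`, `q.crit`, `half.width`, and `tukey.tab` are illustrative names.

```r
# Summary values taken from the output on this page
m    <- c(a = 26, b = 24, c = 19)  # group means
mse  <- 50                         # MS within
df.w <- 45                         # residual df
k    <- 3                          # number of groups
n    <- 16                         # ASSUMED cases per group

# Standard error used for the studentized range statistic
se <- sqrt(mse / n)

# Critical value and the half-width of every 95% interval
q.crit     <- qtukey(.95, nmeans = k, df = df.w)
half.width <- q.crit * se

# Differences in the same order TukeyHSD reports them
d <- c("b-a" = m[["b"]] - m[["a"]],
       "c-a" = m[["c"]] - m[["a"]],
       "c-b" = m[["c"]] - m[["b"]])

tukey.tab <- cbind(
  diff  = d,
  lwr   = d - half.width,
  upr   = d + half.width,
  p.adj = ptukey(abs(d) / se, nmeans = k, df = df.w, lower.tail = FALSE)
)
tukey.tab
```

Because every group has the same size here, one half-width applies to all three intervals; with unequal group sizes each pair would need its own standard error.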
+====== R square or Eta square ======
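+SS total
+  * = the squared error when Y is predicted from the Y variable alone
+  * "from the Y variable alone" = predicting the Y values with the mean of Y
+  * introduced early in the course as the sum of squared errors
+SS between
+  * when information from the X variable (independent variable) is taken into account
+  * that is, by taking the independent variable into account
+  * the part of the SS total uncertainty that is removed
+  * or the <fc #ff0000>explanatory power</fc> that is gained
+SS error (SS within)
+  * even after the IV has been taken into account
+  * the error that remains to the end
+SS total = SS between + SS within
+SS between / SS total = the uncertainty removed once the IV kicks in = the explanatory power of the IV = <fc #ff0000>R square value</fc>
+In other words, the idea is: how much uncertainty does the IV remove?
+To check this: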
+<code>
+ss.tot
+ss.bet
+r.sq <- ss.bet / ss.tot
+r.sq
+# then . . . .
+lm.res <- lm(values ~ group, data = comb3)
+summary(lm.res)
+anova(lm.res)
+</code>
+===== R square: output =====
+<code>
+> ss.tot
+[1] 2666
+> ss.bet
+[1] 416
+> r.sq <- ss.bet / ss.tot
+> r.sq
+[1] 0.156039
+>
+> # then . . . .
+>
+> lm.res <- lm(values ~ group, data = comb3)
+> summary(lm.res)
+Call:
+lm(formula = values ~ group, data = comb3)
+Residuals:
+    Min      1Q  Median      3Q     Max
+-16.020  -2.783   1.476   4.892  12.148
+Coefficients:
+            Estimate Std. Error t value Pr(>|t|)
+(Intercept)   26.000      1.768   14.71   <2e-16 ***
+groupb        -2.000      2.500   -0.80   0.4279
+groupc        -7.000      2.500   -2.80   0.0075 **
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+Residual standard error: 7.071 on 45 degrees of freedom
+Multiple R-squared:  0.156,	Adjusted R-squared:  0.1185
+F-statistic:  4.16 on 2 and 45 DF,  p-value: 0.02199
+> anova(lm.res)
+Analysis of Variance Table
+Response: values
+          Df Sum Sq Mean Sq F value  Pr(>F)
+group      2    416     208    4.16 0.02199 *
+Residuals 45   2250      50
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+>
+>
 </code>
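The link between the sums of squares and the `summary(lm.res)` lines can also be traced by hand. The sketch below uses only the sums of squares and degrees of freedom printed in the ANOVA table; the names `ss.wit`, `f.from.r2`, `adj.r.sq`, and `p.val` are illustrative and not part of the original script.

```r
# Values taken from the ANOVA table on this page
ss.bet <- 416    # SS between (group)
ss.wit <- 2250   # SS within (Residuals)
df.bet <- 2
df.wit <- 45

ss.tot <- ss.bet + ss.wit          # SS total = SS between + SS within
r.sq   <- ss.bet / ss.tot          # share of SS total explained by the IV

# The F statistic, from mean squares and from R squared
f.from.ms <- (ss.bet / df.bet) / (ss.wit / df.wit)
f.from.r2 <- (r.sq / df.bet) / ((1 - r.sq) / df.wit)

# Adjusted R squared compares MS within against the total variance of Y
adj.r.sq <- 1 - (ss.wit / df.wit) / (ss.tot / (df.bet + df.wit))

# p-value of the overall F test
p.val <- pf(f.from.ms, df.bet, df.wit, lower.tail = FALSE)

c(r.sq = r.sq, F = f.from.ms, adj.r.sq = adj.r.sq, p = p.val)
```

Both routes to F give the same number, which is why the "Multiple R-squared" line and the F test in the `lm` summary carry the same information as the ANOVA table.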