Differences

This shows you the differences between two versions of the page.

--- anova [2020/05/20 15:25] – [Example] hkimscil
+++ anova [2022/09/30 09:02] (current) – [SS within] hkimscil
@@ Line 189: / Line 189: @@
 $$
-\text{SS} & = & \sum X_i^2 - \frac{(\sum {X_i)^2}}{n}
+\text{SS} = \sum X_i^2 - \frac{(\sum X_i)^2}{n}
 $$
 ==== SS within ====
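 The second question is how far the members of each group are scattered around their own group mean. This variation can be computed with the SS formula introduced earlier. Because three groups are being compared here, there are three SS values, and SS<sub>within</sub> is the sum of the SS values of the individual groups.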
-$$
+$$ SS_{within} = \sum SS_{\text{each group}} $$
-SS_{within}} = \sum {SS_{each group}
-$$
 Each of these SS values was already computed above, so the calculation gives:
 $$
-SS_{within} & = & 6 + 4 + 6 = 16
+SS_{within} = 6 + 4 + 6 = 16
 $$
@@ Line 375: / Line 373: @@
 ====== Post hoc test ======
 [[Post hoc test]]
-<code>> adata <- read.csv("https://datascienceplus.com/wp-content/uploads/2017/08/tyre.csv")
+<code>> adata <- read.csv("https://datascienceplus.com/wp-content/uploads/2017/08/tyre.csv", fileEncoding="UTF-8-BOM")
 > adata
         Brands  Mileage
@@ Line 474: / Line 472: @@
 ====== F and t value ======
 $$ F = t^{2}$$
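 For two groups this identity can be checked directly. The lines below are only a sketch under assumptions not stated in the original: two independent groups of sizes $n_1$ and $n_2$ with means $\bar{X}_1$ and $\bar{X}_2$, and $s_p^2$ denoting the pooled variance used by the equal-variance t-test. All of these symbols are introduced here for illustration.
 \begin{eqnarray*}
 SS_{between} & = & \frac{n_1 n_2}{n_1 + n_2} (\bar{X}_1 - \bar{X}_2)^2, \quad df_{between} = 1 \\
 MS_{within} & = & \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = s_p^2 \\
 F & = & \frac{MS_{between}}{MS_{within}} = \frac{(\bar{X}_1 - \bar{X}_2)^2}{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = t^2
 \end{eqnarray*}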
-<code>> td <- read.csv("D:/Users/Hyo/Cs-Kant/CS/Rdata/t-test.csv")
+<code>> td <- read.csv("D:/Users/Hyo/Cs-Kant/CS/Rdata/t-test.csv", fileEncoding="UTF-8-BOM")
 > head(td)
   gender tmobconv out in. mobpeo
@@ Line 518: / Line 516: @@
 ====== Example ======
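+Hypothesis. In a word-guessing game, the number of incorrect words will differ among three groups: one given the first letter as a hint, one given the last letter as a hint, and one given no hint.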
+[[anova/ex01]]
 |  |First Letter \\ Condition 1 \\ X<sub>1</sub>|Last Letter \\ Condition 2 \\ X<sub>2</sub>|No Letter \\ Condition 3 \\ X<sub>3</sub> |
 | | 15 | 21 | 28 |
@@ Line 568: / Line 570: @@
 \end{eqnarray*}
+|          | SS   | df    | MS  | F   |
+| between  | $e = \;$ 420   | $e' = \;$ 2 | 210  |  210/40 = 5.25  |
+| within   | $f = \;$ 1080  | $f' = \;$ 27 | 40  |   |
+| total    | $g = \;$ 1500  | $g' = \;$ 29 |   |   |
+F<sub>crit</sub>(2, 27) =  3.35
+F<sub>cal</sub> = 5.25
+Because F<sub>cal</sub> > F<sub>crit</sub>, the means of the three groups differ in a statistically significant way.
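+The table entries can be traced step by step. The symbols $k$ (number of groups) and $N$ (total number of scores) are introduced here for illustration; the df column implies $k = 3$ and $N = 30$.
+\begin{eqnarray*}
+df_{between} & = & k - 1 = 2, \quad df_{within} = N - k = 27, \quad df_{total} = N - 1 = 29 \\
+SS_{total} & = & SS_{between} + SS_{within} = 420 + 1080 = 1500 \\
+MS_{between} & = & \frac{SS_{between}}{df_{between}} = \frac{420}{2} = 210 \\
+MS_{within} & = & \frac{SS_{within}}{df_{within}} = \frac{1080}{27} = 40 \\
+F & = & \frac{MS_{between}}{MS_{within}} = \frac{210}{40} = 5.25
+\end{eqnarray*}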
 ====== Example 2 ======
-<code>x1 <- c(15, 20, 14, 13, 18, 16, 13, 12, 18, 41)
+<code>
+x1 <- c(15, 20, 14, 13, 18, 16, 13, 12, 18, 41)
 x2 <- c(21, 25, 29, 18, 26, 22, 26, 24, 28, 21)
 x3 <- c(28, 30, 32, 28, 26, 30, 25, 36, 20, 15)
 </code>
-<code>> data.frame(x1,x2,x3)
+<code>> xc <- data.frame(x1,x2,x3)
    x1 x2 x3
   15 21 28
@@ Line 623: / Line 635: @@
 <code>
-> colnames(xs[1]) <- "wrong"
+> colnames(xs)  <- c("wrong", "condition")
-> colnames(xs[2]) <- "cond"
 </code>
-<code># cf
+<code>
+# cf
 # lengthofelements <- length(x1)
 # varofvariable <- var(x1)</code>
 <code>
-df_x1
+df.total <- length(xs$wrong) - 1
-df_x2
+ss.total <- var(xs$wrong)*df.total
-df_x3
+var.total <- ss.total/df.total
+var.total.r <- var(xs$wrong)
-ss_x1
+df.x1 <- length(x1)-1
-ss_x2
+df.x2 <- length(x2)-1
-ss_x3
+df.x3 <- length(x3)-1
+ss.x1 <- var(x1)*df.x1
+ss.x2 <- var(x2)*df.x2
+ss.x3 <- var(x3)*df.x3
-df_bet
+ss.within <- ss.x1 + ss.x2 + ss.x3
-ss_bet
+df.within <- df.x1 + df.x2 + df.x3
+ss.between <- ss.total - ss.within
+df.between <- df.total - df.within
-df_tot
+ms.between <- ss.between/df.between
-ss_tot
+ms.within <- ss.within/df.within
+f.value <- ms.between/ms.within
-df_with
+ss.between
-ss_with
+df.between
-df_bet
+ss.within
-ss_bet
+df.within
-</code>
+ms.between
+ms.within
-<code>
+f.value
-df_tot <- length(xs$ind) - 1
+[1] 5.25
-ss_tot <- var(xs$values)*df_tot
-var_tot <- var(xs$values)
+f.crit <- qf(.95, df1=2, df2=27) ## the critical value of F(2, 27) at the p = .05 level is obtained with the qf function
+f.crit
+[1] 3.354131
+####################################
+## Because f.value is greater than f.crit,
+## we accept the hypothesis that the three
+## groups differ (that is, we reject the
+## null hypothesis of no difference).
+####################################
-df_x1 <- length(x1)-1
-df_x2 <- length(x2)-1
-df_x3 <- length(x3)-1
-ss_x1 <- var(x1)*df_x1
-ss_x2 <- var(x2)*df_x2
-ss_x3 <- var(x3)*df_x3
 </code>
 ====== E.G. 1 (R) ======