Differences


--- b:head_first_statistics:estimating_populations_and_samples [2024/11/05 13:01] – [Sampling distribution of sample mean] hkimscil
+++ b:head_first_statistics:estimating_populations_and_samples [2024/11/11 08:23] (current) – [Recap] hkimscil
@@ Line 119: / Line 119: @@
 Population: suppose 25% of the gumballs are red.
 If we draw a single gumball, what are the expected value and variance?
+<WRAP box>
 According to the Bernoulli distribution,
 when we draw one gumball, the expected value and variance of whether it is red are
@@ Line 130: / Line 130: @@
 In the situation above, the mean and variance over 100 independent trials,
 i.e., for $X \sim B(100, 1/4)$, are:
+</WRAP>
+<WRAP box>
+Alternatively, since this is a binomial distribution, for $X \sim B(n, p)$ we have $E(X) = np$ and $V(X) = npq$.
+</WRAP>
 \begin{eqnarray*}
 E(X) & = & n * p = 100 * 1/4 = 25 \\
@@ Line 136: / Line 142: @@
 \end{eqnarray*}
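As a quick check of the formulas above, the sketch below computes $E(X) = np$ and $V(X) = npq$ for $X \sim B(100, 1/4)$ and compares them with the same moments summed directly over the binomial pmf using R's dbinom(). Variable names such as pmf.mean are illustrative only, not from the textbook.

```r
# Sketch: E(X) = np and V(X) = npq for X ~ B(100, 1/4)
n <- 100
p <- 1/4
q <- 1 - p

# One gumball (Bernoulli trial): E = p, Var = pq
bern.mean <- p
bern.var  <- p * q

# Binomial formulas for n independent trials
binom.mean <- n * p
binom.var  <- n * p * q

# The same moments summed directly over the binomial pmf
x <- 0:n
pmf.mean <- sum(x * dbinom(x, n, p))
pmf.var  <- sum((x - pmf.mean)^2 * dbinom(x, n, p))

c(formula = binom.mean, pmf = pmf.mean)
c(formula = binom.var, pmf = pmf.var)
```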
-Here, with $n = 100$, the proportion ($\hat{P}$) in each trial is:
+As above, with $n = 100$, we compute the proportion ($\hat{P}$) for each trial. That is, given
-$X_{i} = $ Red color gumball
+\begin{eqnarray*}
+X_{i} & = & \text{the number of red gumballs in trial } i, \\
+n & = & 100
+\end{eqnarray*}
+the proportions under these conditions are:
 \begin{eqnarray*}
-n = 100, \\
 \hat{P_{1}} & = \frac{X_{1}}{n} = 0.34, (X_{1} = 34) \\
-\hat{P_{2}} & = \frac{X_{2}}{n} = 0.43, (X_{2} = 43) \\
+\hat{P_{2}} & = \frac{X_{2}}{n} = 0.23, (X_{2} = 23) \\
-\hat{P_{3}} & = \frac{X_{3}}{n} = 0.32, (X_{3} = 32) \\
+\hat{P_{3}} & = \frac{X_{3}}{n} = 0.22, (X_{3} = 22) \\
-\hat{P_{4}} & = \frac{X_{4}}{n} = 0.42, (X_{4} = 42) \\
+\hat{P_{4}} & = \frac{X_{4}}{n} = 0.21, (X_{4} = 21) \\
-\cdots \cdots \cdots \\
+& \cdots \cdots \cdots \cdots \cdots \\
-\hat{P_{k}} & = \frac{X_{k}}{n} = 0.24, (X_{1} = 24) \\
+\hat{P_{k}} & = \frac{X_{k}}{n} = 0.24, (X_{k} = 24) \\
 \end{eqnarray*}
-That is, when $X \sim B(n, p)$, the sample probability follows $P_{s} = \dfrac{X}{n}$ ($X$ = number of red gumballs drawn, $n$ = sample size).
+That is, when $X \sim B(n, p)$, the sample proportion is $P_{s} = \dfrac{X}{n}$ ($X$ = number of red gumballs drawn, $n$ = sample size).
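A minimal sketch of $P_{s} = X/n$ in R, using the red-gumball counts from the example above (34, 23, 22, 21, 24) with n = 100; the variable names are illustrative.

```r
n <- 100
# Red-gumball counts X_1, ..., X_4 and X_k from the example above
x.counts <- c(34, 23, 22, 21, 24)
# Sample proportion P_s = X / n for each sample
p.hat <- x.counts / n
p.hat
```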
 {{:b:head_first_statistics:pasted:20191126-073028.png}}
-If we keep repeating the sampling above, we get results like (1)~(6) (see the figure below).
+If we keep repeating the sampling above, we get results like those in the figure below.
 {{:b:head_first_statistics:pasted:20191126-073652.png}}
-If we keep sampling like this and compute its probability, i.e.,
+If we keep sampling this way and compute the expected value $E(P_{s})$ of the resulting proportions $P_{s}$:
-The proportion of red gumballs obtained by sampling n = 100 gumballs:
 \begin{eqnarray*}
@@ Line 165: / Line 172: @@
 \end{eqnarray*}
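The expected value $E(P_{s})$ can also be checked exactly, without simulation, by summing $(x/n) \cdot P(X = x)$ over the binomial pmf. A sketch, assuming the same n = 100 and p = 1/4:

```r
n <- 100
p <- 1/4
x <- 0:n
# E(P_s) = sum over x of (x / n) * P(X = x), with X ~ B(n, p)
e.ps <- sum((x / n) * dbinom(x, n, p))
e.ps  # equals p up to floating-point error
```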
-Below, the number of red gumballs obtained by randomly sampling 100 times from B(100, 1/4)
+Below is a simulation of the above:
+  * from the binomial distribution $X \sim B(100, 1/4)$ (n=100, p=1/4),
+  * draw k = 1000 random samples,
+  * and record the number of red gumballs in each.
 <code>
 > set.seed(101)
-> rbinom(100, 100, 1/4)
+> k <- 1000
-  [1] 24 18 27 27 22 23 26 23 26 25 30 27 28 32 24 26 29 22 24 18 27 33 22 27 31 29
+> n <- 100
- [27] 19 24 24 27 24 23 21 21 25 31 21 29 16 31 24 24 28 23 24 22 19 31 28 20 19 24
+> p <- 1/4
- [53] 27 28 24 28 27 25 27 26 29 29 26 36 29 27 16 23 30 32 22 32 26 29 29 22 18 22
+> q <- 1-p
- [79] 27 33 27 28 28 34 15 32 23 24 20 16 27 31 27 21 22 29 24 22 19 18
+# in order to clarify what we are doing
+# With X ~ B(n, p), sample 100 gumballs
+# and count the red ones:
+> rbinom(1,n,p) # i.e., there were 24 red ones
+[1] 24
+# Below, the same thing repeated 1000 (k) times
+> numbers.of.red.gumball <- rbinom(k, n, p)
+> numbers.of.red.gumball
+   [1] 18 27 27 22 23 26 23 26 25 30 27 28 32 24 26 29 22 24 18 27 33 22 27 31 29 19
+  [27] 24 24 27 24 23 21 21 25 31 21 29 16 31 24 24 28 23 24 22 19 31 28 20 19 24 27
+  [53] 28 24 28 27 25 27 26 29 29 26 36 29 27 16 23 30 32 22 32 26 29 29 22 18 22 27
+  [79] 33 27 28 28 34 15 32 23 24 20 16 27 31 27 21 22 29 24 22 19 18 20 17 24 30 27
+ [105] 23 19 17 28 37 20 18 26 30 30 34 30 25 23 26 24 20 19 25 22 29 25 25 27 19 27
+ [131] 23 22 23 26 25 25 32 25 27 32 22 32 23 30 21 25 27 17 24 21 24 26 33 20 22 26
+ [157] 28 25 30 33 27 30 26 23 39 23 31 18 26 27 34 25 28 31 35 28 29 32 27 31 28 25
+ [183] 22 23 15 22 20 26 21 22 16 23 22 31 24 27 31 21 24 26 26 22 22 34 19 30 22 28
+ [209] 25 24 29 25 25 16 27 23 25 32 18 22 25 25 24 24 21 32 20 28 29 22 23 22 25 21
+ [235] 27 22 24 29 24 22 30 22 21 17 25 23 21 27 22 22 25 22 29 24 26 32 28 20 22 22
+ [261] 27 26 22 24 31 18 27 29 28 17 27 33 23 33 25 32 26 23 19 21 20 23 15 19 23 26
+ [287] 27 28 23 24 35 27 30 23 25 24 31 23 20 22 22 26 21 22 26 28 26 23 21 13 29 27
+ [313] 21 34 28 24 19 26 27 25 23 27 25 19 29 18 28 21 27 28 28 22 22 20 20 25 27 17
+ [339] 16 27 32 23 18 28 31 29 21 27 27 30 21 25 20 25 26 30 26 21 15 29 22 21 16 25
+ [365] 25 27 26 27 28 21 27 24 25 24 39 24 28 33 20 26 24 27 20 31 27 27 20 21 31 25
+ [391] 22 22 30 34 27 23 21 25 20 24 29 19 30 27 33 22 29 30 22 29 26 24 18 26 36 26
+ [417] 23 24 22 32 33 16 24 28 24 25 29 31 28 28 29 26 24 25 28 27 24 31 25 31 33 26
+ [443] 26 24 33 28 20 23 22 23 22 30 25 25 23 27 27 23 24 28 24 28 23 22 26 30 26 27
+ [469] 21 23 23 27 26 23 25 30 25 24 22 28 18 23 18 16 27 26 18 25 27 22 20 19 27 25
+ [495] 31 27 22 21 24 24 26 23 23 29 27 23 25 20 21 21 27 25 22 29 28 21 21 24 27 24
+ [521] 28 19 14 32 27 22 24 35 26 28 28 26 25 25 19 26 24 20 19 28 25 25 24 21 30 27
+ [547] 30 20 22 26 31 26 20 20 27 25 26 18 30 20 29 16 38 26 22 29 22 30 26 19 27 24
+ [573] 29 29 25 19 23 24 24 23 25 31 18 24 33 27 25 27 29 28 24 23 24 28 20 24 30 24
+ [599] 21 20 25 24 24 30 22 26 23 25 21 21 24 27 18 20 22 30 25 23 27 26 23 23 28 18
+ [625] 29 27 25 32 26 15 22 24 21 34 23 23 18 29 23 27 28 23 37 20 17 25 11 21 28 22
+ [651] 28 25 22 25 21 18 20 27 30 24 28 23 30 31 24 23 37 19 27 32 25 27 28 29 22 26
+ [677] 26 20 22 25 24 19 27 21 32 27 31 29 24 24 29 29 25 22 34 23 18 33 18 23 24 26
+ [703] 18 20 23 30 28 26 34 17 33 30 32 30 22 28 19 19 23 23 20 23 21 31 30 20 24 23
+ [729] 23 28 26 34 27 33 31 20 25 12 25 20 20 25 27 24 29 26 22 30 26 28 28 27 23 18
+ [755] 28 22 21 27 22 26 21 22 27 24 19 27 29 37 30 27 25 30 19 22 22 28 32 22 33 26
+ [781] 20 31 23 24 24 26 24 30 17 21 20 22 20 17 24 22 24 23 23 24 23 16 16 17 23 27
+ [807] 29 26 16 21 34 19 25 25 28 32 17 22 26 23 23 24 22 22 14 30 25 33 26 25 31 28
+ [833] 30 21 19 17 19 21 16 21 26 21 29 27 31 32 19 22 24 25 25 24 23 30 21 22 19 20
+ [859] 21 20 21 28 19 26 28 26 29 28 26 21 31 32 31 22 23 25 27 26 22 27 30 24 25 23
+ [885] 27 25 24 24 30 29 26 32 29 23 24 20 26 26 22 22 19 23 33 18 27 26 28 18 26 24
+ [911] 24 26 27 17 26 23 27 25 32 20 22 23 25 25 24 28 20 19 22 20 22 24 17 19 22 17
+ [937] 19 27 27 28 29 18 24 30 26 34 26 24 25 24 29 28 29 23 24 21 24 23 23 29 19 29
+ [963] 30 33 25 30 32 23 30 27 17 20 21 24 36 21 26 30 26 25 22 21 38 21 24 21 25 21
+ [989] 32 20 29 24 19 21 32 26 27 18 21 20
 >
 </code>
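The simulated counts should have a mean close to $np = 25$ and a variance close to $npq = 18.75$. A sketch that checks this; note that it calls rbinom(k, n, p) right after set.seed(101), without the extra single draw above, so its values will not line up one-to-one with the printout.

```r
set.seed(101)
k <- 1000
n <- 100
p <- 1/4
q <- 1 - p

counts <- rbinom(k, n, p)   # k simulated counts of red gumballs
c(simulated = mean(counts), theory = n * p)
c(simulated = var(counts), theory = n * p * q)
```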
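-What is the mean of this sample?
+However, the textbook treats this binomial distribution in terms of proportions, so we convert the counts to proportions of red gumballs in the same way: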
 <code>
-> set.seed(101)
+> # Dividing by n (the total number of gumballs, 100)
-> mean(rbinom(100, 100, 1/4))
+> # gives the proportions
-[1] 25.28
+> proportions.of.rg <- numbers.of.red.gumball/n
->
+> ps.k <- proportions.of.rg
+> ps.k
+   [1] 0.18 0.27 0.27 0.22 0.23 0.26 0.23 0.26 0.25 0.30 0.27 0.28 0.32 0.24 0.26
+  [16] 0.29 0.22 0.24 0.18 0.27 0.33 0.22 0.27 0.31 0.29 0.19 0.24 0.24 0.27 0.24
+  [31] 0.23 0.21 0.21 0.25 0.31 0.21 0.29 0.16 0.31 0.24 0.24 0.28 0.23 0.24 0.22
+  [46] 0.19 0.31 0.28 0.20 0.19 0.24 0.27 0.28 0.24 0.28 0.27 0.25 0.27 0.26 0.29
+  [61] 0.29 0.26 0.36 0.29 0.27 0.16 0.23 0.30 0.32 0.22 0.32 0.26 0.29 0.29 0.22
+  [76] 0.18 0.22 0.27 0.33 0.27 0.28 0.28 0.34 0.15 0.32 0.23 0.24 0.20 0.16 0.27
+  [91] 0.31 0.27 0.21 0.22 0.29 0.24 0.22 0.19 0.18 0.20 0.17 0.24 0.30 0.27 0.23
+ [106] 0.19 0.17 0.28 0.37 0.20 0.18 0.26 0.30 0.30 0.34 0.30 0.25 0.23 0.26 0.24
+ [121] 0.20 0.19 0.25 0.22 0.29 0.25 0.25 0.27 0.19 0.27 0.23 0.22 0.23 0.26 0.25
+ [136] 0.25 0.32 0.25 0.27 0.32 0.22 0.32 0.23 0.30 0.21 0.25 0.27 0.17 0.24 0.21
+ [151] 0.24 0.26 0.33 0.20 0.22 0.26 0.28 0.25 0.30 0.33 0.27 0.30 0.26 0.23 0.39
+ [166] 0.23 0.31 0.18 0.26 0.27 0.34 0.25 0.28 0.31 0.35 0.28 0.29 0.32 0.27 0.31
+ [181] 0.28 0.25 0.22 0.23 0.15 0.22 0.20 0.26 0.21 0.22 0.16 0.23 0.22 0.31 0.24
+ [196] 0.27 0.31 0.21 0.24 0.26 0.26 0.22 0.22 0.34 0.19 0.30 0.22 0.28 0.25 0.24
+ [211] 0.29 0.25 0.25 0.16 0.27 0.23 0.25 0.32 0.18 0.22 0.25 0.25 0.24 0.24 0.21
+ [226] 0.32 0.20 0.28 0.29 0.22 0.23 0.22 0.25 0.21 0.27 0.22 0.24 0.29 0.24 0.22
+ [241] 0.30 0.22 0.21 0.17 0.25 0.23 0.21 0.27 0.22 0.22 0.25 0.22 0.29 0.24 0.26
+ [256] 0.32 0.28 0.20 0.22 0.22 0.27 0.26 0.22 0.24 0.31 0.18 0.27 0.29 0.28 0.17
+ [271] 0.27 0.33 0.23 0.33 0.25 0.32 0.26 0.23 0.19 0.21 0.20 0.23 0.15 0.19 0.23
+ [286] 0.26 0.27 0.28 0.23 0.24 0.35 0.27 0.30 0.23 0.25 0.24 0.31 0.23 0.20 0.22
+ [301] 0.22 0.26 0.21 0.22 0.26 0.28 0.26 0.23 0.21 0.13 0.29 0.27 0.21 0.34 0.28
+ [316] 0.24 0.19 0.26 0.27 0.25 0.23 0.27 0.25 0.19 0.29 0.18 0.28 0.21 0.27 0.28
+ [331] 0.28 0.22 0.22 0.20 0.20 0.25 0.27 0.17 0.16 0.27 0.32 0.23 0.18 0.28 0.31
+ [346] 0.29 0.21 0.27 0.27 0.30 0.21 0.25 0.20 0.25 0.26 0.30 0.26 0.21 0.15 0.29
+ [361] 0.22 0.21 0.16 0.25 0.25 0.27 0.26 0.27 0.28 0.21 0.27 0.24 0.25 0.24 0.39
+ [376] 0.24 0.28 0.33 0.20 0.26 0.24 0.27 0.20 0.31 0.27 0.27 0.20 0.21 0.31 0.25
+ [391] 0.22 0.22 0.30 0.34 0.27 0.23 0.21 0.25 0.20 0.24 0.29 0.19 0.30 0.27 0.33
+ [406] 0.22 0.29 0.30 0.22 0.29 0.26 0.24 0.18 0.26 0.36 0.26 0.23 0.24 0.22 0.32
+ [421] 0.33 0.16 0.24 0.28 0.24 0.25 0.29 0.31 0.28 0.28 0.29 0.26 0.24 0.25 0.28
+ [436] 0.27 0.24 0.31 0.25 0.31 0.33 0.26 0.26 0.24 0.33 0.28 0.20 0.23 0.22 0.23
+ [451] 0.22 0.30 0.25 0.25 0.23 0.27 0.27 0.23 0.24 0.28 0.24 0.28 0.23 0.22 0.26
+ [466] 0.30 0.26 0.27 0.21 0.23 0.23 0.27 0.26 0.23 0.25 0.30 0.25 0.24 0.22 0.28
+ [481] 0.18 0.23 0.18 0.16 0.27 0.26 0.18 0.25 0.27 0.22 0.20 0.19 0.27 0.25 0.31
+ [496] 0.27 0.22 0.21 0.24 0.24 0.26 0.23 0.23 0.29 0.27 0.23 0.25 0.20 0.21 0.21
+ [511] 0.27 0.25 0.22 0.29 0.28 0.21 0.21 0.24 0.27 0.24 0.28 0.19 0.14 0.32 0.27
+ [526] 0.22 0.24 0.35 0.26 0.28 0.28 0.26 0.25 0.25 0.19 0.26 0.24 0.20 0.19 0.28
+ [541] 0.25 0.25 0.24 0.21 0.30 0.27 0.30 0.20 0.22 0.26 0.31 0.26 0.20 0.20 0.27
+ [556] 0.25 0.26 0.18 0.30 0.20 0.29 0.16 0.38 0.26 0.22 0.29 0.22 0.30 0.26 0.19
+ [571] 0.27 0.24 0.29 0.29 0.25 0.19 0.23 0.24 0.24 0.23 0.25 0.31 0.18 0.24 0.33
+ [586] 0.27 0.25 0.27 0.29 0.28 0.24 0.23 0.24 0.28 0.20 0.24 0.30 0.24 0.21 0.20
+ [601] 0.25 0.24 0.24 0.30 0.22 0.26 0.23 0.25 0.21 0.21 0.24 0.27 0.18 0.20 0.22
+ [616] 0.30 0.25 0.23 0.27 0.26 0.23 0.23 0.28 0.18 0.29 0.27 0.25 0.32 0.26 0.15
+ [631] 0.22 0.24 0.21 0.34 0.23 0.23 0.18 0.29 0.23 0.27 0.28 0.23 0.37 0.20 0.17
+ [646] 0.25 0.11 0.21 0.28 0.22 0.28 0.25 0.22 0.25 0.21 0.18 0.20 0.27 0.30 0.24
+ [661] 0.28 0.23 0.30 0.31 0.24 0.23 0.37 0.19 0.27 0.32 0.25 0.27 0.28 0.29 0.22
+ [676] 0.26 0.26 0.20 0.22 0.25 0.24 0.19 0.27 0.21 0.32 0.27 0.31 0.29 0.24 0.24
+ [691] 0.29 0.29 0.25 0.22 0.34 0.23 0.18 0.33 0.18 0.23 0.24 0.26 0.18 0.20 0.23
+ [706] 0.30 0.28 0.26 0.34 0.17 0.33 0.30 0.32 0.30 0.22 0.28 0.19 0.19 0.23 0.23
+ [721] 0.20 0.23 0.21 0.31 0.30 0.20 0.24 0.23 0.23 0.28 0.26 0.34 0.27 0.33 0.31
+ [736] 0.20 0.25 0.12 0.25 0.20 0.20 0.25 0.27 0.24 0.29 0.26 0.22 0.30 0.26 0.28
+ [751] 0.28 0.27 0.23 0.18 0.28 0.22 0.21 0.27 0.22 0.26 0.21 0.22 0.27 0.24 0.19
+ [766] 0.27 0.29 0.37 0.30 0.27 0.25 0.30 0.19 0.22 0.22 0.28 0.32 0.22 0.33 0.26
+ [781] 0.20 0.31 0.23 0.24 0.24 0.26 0.24 0.30 0.17 0.21 0.20 0.22 0.20 0.17 0.24
+ [796] 0.22 0.24 0.23 0.23 0.24 0.23 0.16 0.16 0.17 0.23 0.27 0.29 0.26 0.16 0.21
+ [811] 0.34 0.19 0.25 0.25 0.28 0.32 0.17 0.22 0.26 0.23 0.23 0.24 0.22 0.22 0.14
+ [826] 0.30 0.25 0.33 0.26 0.25 0.31 0.28 0.30 0.21 0.19 0.17 0.19 0.21 0.16 0.21
+ [841] 0.26 0.21 0.29 0.27 0.31 0.32 0.19 0.22 0.24 0.25 0.25 0.24 0.23 0.30 0.21
+ [856] 0.22 0.19 0.20 0.21 0.20 0.21 0.28 0.19 0.26 0.28 0.26 0.29 0.28 0.26 0.21
+ [871] 0.31 0.32 0.31 0.22 0.23 0.25 0.27 0.26 0.22 0.27 0.30 0.24 0.25 0.23 0.27
+ [886] 0.25 0.24 0.24 0.30 0.29 0.26 0.32 0.29 0.23 0.24 0.20 0.26 0.26 0.22 0.22
+ [901] 0.19 0.23 0.33 0.18 0.27 0.26 0.28 0.18 0.26 0.24 0.24 0.26 0.27 0.17 0.26
+ [916] 0.23 0.27 0.25 0.32 0.20 0.22 0.23 0.25 0.25 0.24 0.28 0.20 0.19 0.22 0.20
+ [931] 0.22 0.24 0.17 0.19 0.22 0.17 0.19 0.27 0.27 0.28 0.29 0.18 0.24 0.30 0.26
+ [946] 0.34 0.26 0.24 0.25 0.24 0.29 0.28 0.29 0.23 0.24 0.21 0.24 0.23 0.23 0.29
+ [961] 0.19 0.29 0.30 0.33 0.25 0.30 0.32 0.23 0.30 0.27 0.17 0.20 0.21 0.24 0.36
+ [976] 0.21 0.26 0.30 0.26 0.25 0.22 0.21 0.38 0.21 0.24 0.21 0.25 0.21 0.32 0.20
+ [991] 0.29 0.24 0.19 0.21 0.32 0.26 0.27 0.18 0.21 0.20
+>
 </code>
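-However, the discussion above refers to the mean obtained when the number of samples is infinite rather than 100. Since infinity cannot actually be simulated, setting k to 100 million and computing the mean gives 25, as shown below.
+What the textbook describes is finding the expected value (mean) of these proportions: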
 <code>
-> set.seed(101)
+> mean.ps.k <- mean(ps.k)
-> mean(rbinom(100000000, 100, 1/4))
+> mean.ps.k
-[1] 25.0001
+[1] 0.24893
 >
 </code>
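How close the mean of the simulated proportions gets to p depends on k. A sketch that repeats the simulation for several values of k; the particular k values are arbitrary choices for illustration.

```r
set.seed(101)
n <- 100
p <- 1/4
k.values <- c(10, 100, 1000, 10000, 100000)

# For each k: draw k samples of n gumballs, convert the counts to
# proportions, and average the k proportions
mean.ps.by.k <- sapply(k.values, function(k) mean(rbinom(k, n, p) / n))

# The distance from p typically (not always) shrinks as k grows
data.frame(k = k.values, mean.ps = mean.ps.by.k,
           distance = abs(mean.ps.by.k - p))
```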
-Visualizing the above:
+Plotting the result as a histogram:
 <code>
-set.seed(101)
+hist(ps.k)
-k <- 10000
+</code>
-n <- 100
+The mean is close to 0.25 (the value of p). It matches the textbook's p exactly only in the limit as k grows without bound.
-p <- 1/4
+Below, k is 1,000,000 (one million) instead of 1000; the mean proportion is then essentially 0.25.
-q <- 1-p
+<code>
-numbers.of.red.gumball <- rbinom(k, n, p)
+> set.seed(101)
-head(numbers.of.red.gumball)
+> k <- 1000000
-proportions.of.rg <- numbers.of.red.gumball/n
+> n <- 100
-head(proportions.of.rg)
+> p <- 1/4
-mean(proportions.of.rg)
+> q <- 1-p
-hist(proportions.of.rg)
+> numbers.of.red.gumball <- rbinom(k, n, p)
+> # Dividing by n (the total number of gumballs, 100)
+> # gives the proportions
+> proportions.of.rg <- numbers.of.red.gumball/n
+> ps.k <- proportions.of.rg
+> mean.ps.k <- mean(ps.k)
+> mean.ps.k
+[1] 0.2500217
+>
 </code>
-{{:b:head_first_statistics:pasted:20241104-080847.png}}
+{{:b:head_first_statistics:pasted:20241106-081710.png}}
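The histogram can be compared with the normal distribution $N(p, \frac{pq}{n})$ summarized in the Recap, i.e. mean p and standard deviation $\sqrt{pq/n}$. A sketch, assuming a fresh simulation with k = 10000 (an arbitrary choice); the plot title and labels are illustrative.

```r
set.seed(101)
k <- 10000
n <- 100
p <- 1/4
q <- 1 - p
se <- sqrt(p * q / n)

ps <- rbinom(k, n, p) / n   # k simulated sample proportions

# Density-scaled histogram so it can share an axis with a density curve
h <- hist(ps, freq = FALSE, main = "Sample proportions vs N(p, pq/n)",
          xlab = "proportion of red gumballs")
# Normal curve with mean p and standard deviation sqrt(pq/n)
curve(dnorm(x, mean = p, sd = se), add = TRUE, col = 2, lwd = 2)
```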
 ^ references  ^
@@ Line 212: / Line 346: @@
 ===== What about variance =====
+Then what is the variance of this distribution, and what is its standard deviation?
 \begin{eqnarray*}
-Var(\text{probability of sample proportions}) & = & Var(P_{s}) \\
+\text{Variance of sample proportions} & = & Var(P_{s}) \\
 & = & Var\left(\frac{X}{n}\right) \\
 & = & \frac {Var(X)}{n^{2}} \\
 & = & \frac {npq}{n^{2}} \\
-& = & \frac {pq}{n}
+& = & \frac {pq}{n} \\
-\end{eqnarray*}
-\begin{eqnarray*}
 \text{Standard deviation of sample proportions} & = & \sqrt{\frac{pq}{n}} \\
 & = & \text{Standard error of sample proportions}
 \end{eqnarray*}
+The standard deviation of the sample proportions above is specifically called the standard error.
-Putting this together, the expected value and variance of the sample proportions are as follows (see figure).
+In summary, the expected value and variance of the sample proportions are (see figure):
 $$E(P_{s}) = p \qquad\qquad\qquad Var(P_{s}) = \displaystyle \frac{pq}{n}$$
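A sketch of how the standard error $\sqrt{pq/n}$ behaves for a few sample sizes (the n values are illustrative), plus an exact check of $Var(P_{s}) = pq/n$ for n = 100 summed over the binomial pmf:

```r
p <- 1/4
q <- 1 - p

# Standard error sqrt(pq / n) for a few sample sizes
n.values <- c(25, 100, 400)
se.by.n <- sqrt(p * q / n.values)
data.frame(n = n.values, se = se.by.n)

# Exact Var(P_s) for n = 100, summed over the binomial pmf
n <- 100
x <- 0:n
var.ps.exact <- sum((x / n - p)^2 * dbinom(x, n, p))
c(exact = var.ps.exact, formula = p * q / n)
```

Because n sits under a square root, quadrupling the sample size halves the standard error.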
@@ Line 233: / Line 366: @@
 continuity correction: $$\pm \frac{1}{2n}$$
+Continuing the simulation in R:
+<code>
+> # variance?
+> var.cal <- var(ps.k)
+> var.value <- (p*q)/n
+> var.cal
+[1] 0.001869001
+> var.value
+[1] 0.001875
+>
+> # standard deviation
+> sd.cal <- sqrt(var.cal)
+> sd.value <- sqrt(var.value)
+> sd.cal
+[1] 0.04323195
+> sd.value
+[1] 0.04330127
+> se <- sd.value
+> # The standard deviation of the sample
+> # proportions is called
+> # the standard error
+>
+</code>
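+Since se is a kind of standard deviation, it has the same properties (the 68-95-99.7% rule). So, knowing that the proportion of red gumballs is 1/4, if we sample n = 100 gumballs (once), the probability that the proportion of red gumballs falls within 2*se above or below p (0.25) is about 95%. Computing this from the values above: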
+<code>
+# In the histogram above, the mean is theoretically
+p
+# and the standard deviation is
+se
+# The interval mean +- 2*se covers about 95%.
+se2 <- se * 2
+# i.e., the interval below
+lower <- p-se2
+upper <- p+se2
+lower
+upper
+hist(ps.k)
+abline(v=lower, col=2, lwd=2)
+abline(v=upper, col=2, lwd=2)
+</code>
+That is, in the graph below,
+{{:b:head_first_statistics:pasted:20241106-084520.png}}
+the probability that the proportion of red gumballs falls between lower: 0.1633975 (16.33975%) and upper: 0.3366025 (33.66025%) is about 95%.
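Because $P_{s}$ can only take the values 0, 0.01, ..., 1, the probability of landing in [lower, upper] is not exactly 95%. A sketch that computes it exactly from the binomial pmf and also by simulation (k = 100000 is an arbitrary choice):

```r
n <- 100
p <- 1/4
q <- 1 - p
se <- sqrt(p * q / n)
lower <- p - 2 * se
upper <- p + 2 * se

# Exact probability that P_s = X / n falls in [lower, upper], X ~ B(n, p)
x <- 0:n
inside <- (x / n >= lower) & (x / n <= upper)
exact.coverage <- sum(dbinom(x[inside], n, p))

# Simulated coverage: fraction of k sample proportions inside the interval
set.seed(101)
k <- 100000
ps <- rbinom(k, n, p) / n
sim.coverage <- mean(ps >= lower & ps <= upper)

c(exact = exact.coverage, simulated = sim.coverage)
```

Here the interval contains the counts 17 through 33, so the exact value is $P(17 \le X \le 33)$.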
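+So, what if the question is: what is the probability that 30% or more of the gumballs are red?
+Since the normal approximation derived from $X \sim B(100, 1/4)$ is
+$P_{s} \sim N(p, \frac{pq}{n})$ (mean p, standard deviation se), this asks for $P(P_{s} \geq 0.3)$.
+With the continuity correction ($0.3 - \frac{1}{2n} = 0.295$), the answer is 1 - pnorm(0.295, p, se):
+> 1 - pnorm(0.295, p, se)
+[1] 0.1493488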
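To see how good the normal approximation is, the sketch below compares it with the exact binomial probability $P(X \geq 30)$ from pbinom(), with and without the continuity correction:

```r
n <- 100
p <- 1/4
q <- 1 - p
se <- sqrt(p * q / n)

# Exact binomial: P(X >= 30) = P(X > 29)
exact <- pbinom(29, n, p, lower.tail = FALSE)

# Normal approximation with continuity correction: 0.30 - 1/(2n) = 0.295
approx.cc <- 1 - pnorm(0.30 - 1 / (2 * n), p, se)

# Normal approximation without the correction, for comparison
approx.no.cc <- 1 - pnorm(0.30, p, se)

round(c(exact = exact, with.cc = approx.cc, without.cc = approx.no.cc), 4)
```

In this case the corrected approximation lands very close to the exact value, while the uncorrected one is noticeably smaller.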
 ===== Exercise =====
@@ Line 434: / Line 622: @@
 </code>
+====== Recap ======
+Distribution of **Sample** <fc #ff0000>**P**</fc>roportion<fc #ff0000>**s**</fc>, <fc #ff0000>$Ps$</fc>,
+when repeatedly sampling n entities from a population whose proportion is p (approximately normal for large enough n).
+\begin{eqnarray*}
+Ps & \sim & N(p,  \frac{pq}{n}) \\
+\text{hence, } \\
+\text{standard deviation of} \\
+\text{sample proportions} & = & \sqrt{\frac{pq}{n}}
+\end{eqnarray*}
+Distribution of **Sample** <fc #ff0000>Means, $\overline{X}$</fc>
+when repeatedly drawing samples of size n from a population with mean $\mu$ and variance $\sigma^2$ (approximately normal for large enough n).
+\begin{eqnarray*}
+\overline{X} & \sim & N(\mu,  \frac{\sigma^2}{n}) \\
+\text{hence, } \\
+\text{standard deviation of} \\
+\text{sample means} & = &  \sqrt{\frac{\sigma^2}{n}} \\
+& = &  \frac{\sigma}{\sqrt{n}}
+\end{eqnarray*}
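A simulation sketch of the sample-mean result, assuming a hypothetical exponential population with rate 1 (so $\mu = 1$ and $\sigma = 1$), chosen because it is clearly skewed; n = 30 and k = 10000 are arbitrary choices.

```r
set.seed(101)
k <- 10000      # number of repeated samples
n <- 30         # size of each sample
mu <- 1         # mean of an Exponential(rate = 1) population
sigma <- 1      # its standard deviation

# Draw k samples of size n and keep each sample's mean
sample.means <- replicate(k, mean(rexp(n, rate = 1)))

c(simulated.mean = mean(sample.means), theory = mu)
c(simulated.sd = sd(sample.means), theory = sigma / sqrt(n))
```

The standard deviation of the simulated means should come out close to $\sigma/\sqrt{n} \approx 0.183$, even though the population itself is not normal.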