Differences

This shows you the differences between two versions of the page.

--- b:head_first_statistics:estimating_populations_and_samples [2024/11/06 08:35] – [What about variance] hkimscil
+++ b:head_first_statistics:estimating_populations_and_samples [2024/11/11 08:23] (current) – [Recap] hkimscil
@@ Line 393: / Line 393: @@
 <code>
-> sd.value
+# In the histogram above, the theoretical mean is
-[1] 0.04330127
+p
-> se <- sd.value
+# and the standard deviation is
-> se2 <- se*2
+se
-> se2
-[1] 0.08660254
+# We know that the interval mean +- 2*sd.cal covers about 95% of the distribution.
-> p-se2
+se2 <- se * 2
-[1] 0.1633975
+# That is, the interval below:
-> p+se2
+lower <- p-se2
-[1] 0.3366025
+upper <- p+se2
->
+lower
+upper
+hist(ps.k)
+abline(v=lower, col=2, lwd=2)
+abline(v=upper, col=2, lwd=2)
 </code>
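+That is, in the graph below,
+{{:b:head_first_statistics:pasted:20241106-084520.png}}
+the proportion of red gumballs falls between lower: 0.1633975 (16.33975%) and upper: 0.3366025 (33.66025%) with a probability of about 95%.
+Now suppose the question is: what is the probability that 30% or more of the gumballs are red?
+From X ~ B(100, 1/4) we derive the normal approximation for the sample proportion,
+X ~ N(p, se) (mean p, standard deviation se), and the question asks for P(X >= 0.3),
+so, with a continuity correction (0.3 - 0.5/100 = 0.295), the answer is 1-pnorm(0.295, p, se).
+1-pnorm(0.295, p, se)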
+[1] 0.1493488
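+As a sketch of what that call computes, taking p = 0.25 (from B(100, 1/4)) and se = 0.04330127, and writing $Z$ for a standard normal variable and $\Phi$ for its cumulative distribution function (notation introduced only for this illustration):
+\begin{eqnarray*}
+z & = & \frac{0.295 - p}{se} = \frac{0.295 - 0.25}{0.04330127} \approx 1.0392 \\
+P(Z \geq z) & = & 1 - \Phi(1.0392) \approx 0.1493
+\end{eqnarray*}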
 ===== Exercise =====
 <WRAP info 60%>
@@ Line 605: / Line 622: @@
 </code>
+====== Recap ======
+Distribution of **Sample** <fc #ff0000>**P**</fc>roportion<fc #ff0000>**s**</fc>, <fc #ff0000>$Ps$</fc>,
+when repeatedly sampling n entities from a population whose proportion is p (with q = 1 - p).
+\begin{eqnarray*}
+Ps & \sim & N(p,  \frac{pq}{n}) \\
+\text{hence, } \\
+\text{standard deviation of} \\
+\text{sample proportions} & = & \sqrt{\frac{pq}{n}}
+\end{eqnarray*}
+Distribution of **Sample** <fc #ff0000>Means, $\overline{X}$</fc>
+when repeatedly drawing samples of size n from a population whose mean is $\mu$ and variance is $\sigma^2$.
+\begin{eqnarray*}
+\overline{X} & \sim & N(\mu,  \frac{\sigma^2}{n}) \\
+\text{hence, } \\
+\text{standard deviation of} \\
+\text{sample means} & = &  \sqrt{\frac{\sigma^2}{n}} \\
+& = &  \frac{\sigma}{\sqrt{n}}
+\end{eqnarray*}
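+As a quick check of the sample-proportion formula against the gumball example, assuming p = 1/4 and n = 100 as in X ~ B(100, 1/4):
+\begin{eqnarray*}
+\sqrt{\frac{pq}{n}} & = & \sqrt{\frac{0.25 \times 0.75}{100}} = \sqrt{0.001875} \approx 0.0433
+\end{eqnarray*}
+which agrees with the standard error of about 0.0433 used in the gumball calculation.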