Differences

This shows you the differences between two versions of the page.

--- binomial_distribution [2025/10/11 06:28] – hkimscil
+++ binomial_distribution [2025/10/11 08:26] (current) – [e.g.,] hkimscil
@@ Line 7: / Line 7: @@
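   * The probability of getting any one question right is 1/4; of getting it wrong, 3/4.
   * The binomial distribution here is the probability distribution of the number of correct answers over 3 questions (3 trials).
+  * In the geometric distribution, by contrast, we do not tally right and wrong answers per question: the answers keep being wrong until one is finally right, and that first success ends the experiment.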
 {{:b:head_first_statistics:pasted:20191030-035316.png}}
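+For contrast, a minimal R sketch of the geometric case, assuming the same p = 1/4 as above. R's dgeom(x, prob) counts the number of failures before the first success, so the probability that the first correct answer comes on trial k is dgeom(k - 1, p). The names k and first.success are introduced here only for illustration.
+<code>
+p <- 1/4
+k <- 1:3                           # trial on which the first success occurs
+first.success <- dgeom(k - 1, p)   # k - 1 failures, then one success
+first.success
+(1 - p)^(k - 1) * p                # same values from the formula
+</code>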
@@ Line 55: / Line 56: @@
-====== Binomial Distribution ======
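-  - If the probability that a particular event A occurs in a single trial is p,
-  - then the probability distribution of the number of times A occurs in n (independent) trials
-  - is called the binomial probability distribution.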
 \begin{eqnarray*}
-{n \choose x} = \displaystyle \frac {n!}{x!(n-x)!}  \\
+P(X=2) & = & {{3} \choose {2}} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{3-2} \\
+& = & 0.0694
 \end{eqnarray*}
-**The number of successes in n independent Bernoulli trials has a binomial distribution.**
+<code>
+> dbinom(2, 3, 1/6)
+[1] 0.06944444
+>
+</code>
-This can be viewed as n independent Bernoulli trials.
+====== Expectation and Variance of ======
-  * There are n independent trials
+Toss a fair coin once. What is the distribution of the number of heads?
-  * Each trial can result in one of two possible outcomes, labelled success and failure.
+  * A single trial
-    * success can be a bad thing -- tire blow-up.
+  * The trial can be one of two possible outcomes -- success and failure
-  * P(success) = p,
+  * P(success) = p
   * P(failure) = 1-p
-In general, the binomial distribution is computed as follows.
+X = 0, 1 (failure and success)
+$P(X=x) = p^{x}(1-p)^{1-x}$ or
+$P(x) = p^{x}(1-p)^{1-x}$
-\begin{align*}
+Note.
-P(X=x) & = _{n}C_{x} \cdot p^{x} \cdot (1-p)^{n-x}, \;\; \text{for} \;\; x = 0, 1, 2, . . ., n. \\
+| x     | 0          | 1  |
-\text{or } & \\
+| p(x)  | q = (1-p)  | p  |
-P(X=x) & = {{n} \choose {x}} \cdot p^{x} \cdot (1-p)^{n-x}, \;\; \text{for} \;\; x = 0, 1, 2, . . ., n. \\
-\end{align*}
-A balanced die is rolled 3 times. What is the probability that a 5 comes up exactly twice?
+When x = 0 (failure), $P(X = 0) = p^{0}(1-p)^{1-0} = (1-p)$ = Probability of failure
+When x = 1 (success), $P(X = 1) = p^{1}(1-p)^{0} = p $ = Probability of success
-p = 1/6
-n = 3
+This is called Bernoulli distribution.
-x = 2
+  * Bernoulli distribution expands to binomial distribution, geometric distribution, etc.
+  * Binomial distribution = The distribution of the number of successes in n independent Bernoulli trials.
+  * Geometric distribution = The distribution of the number of trials needed to get the first success in independent Bernoulli trials.
+$$X \sim B(1,p)$$
 \begin{eqnarray*}
-P(X=2) & = & {{3} \choose {2}} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{3-2} \\
+E(X) & = & \sum_{x} x * p(x) \\
-& = & 0.0694
+& = & (0*q) + (1*p) \\
+& = & p
+\end{eqnarray*}
+\begin{eqnarray*}
+Var(X) & = & E((X - E(X))^{2}) \\
+& = & \sum_{x}(x-E(X))^2p(x)   \ldots \ldots \ldots E(X) = p \\
+& = & (0 - p)^{2}*q + (1 - p)^{2}*p  \\
+& = & (0^2 - 2p \cdot 0 + p^2)*q + (1-2p+p^2)*p \\
+& = & p^2*(1-p) + (1-2p+p^2)*p \\
+& = & p^2 - p^3 + p - 2p^2 + p^3 \\
+& = & p - p^2 \\
+& = & p(1-p) \\
+& = & pq
 \end{eqnarray*}
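+Both results can be checked numerically from the definitions. A minimal sketch, assuming p = 0.25 purely for illustration (the derivation holds for any p); the names px, e.x and v.x are hypothetical.
+<code>
+p <- 0.25                        # assumed value for illustration
+q <- 1 - p
+x <- 0:1                         # failure, success
+px <- c(q, p)                    # P(X = 0), P(X = 1)
+e.x <- sum(x * px)               # E(X) from the definition
+v.x <- sum((x - e.x)^2 * px)     # Var(X) from the definition
+e.x                              # equals p
+v.x                              # equals p * q
+</code>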
+For generalization,
+$$X \sim B(n,p)$$
+\begin{eqnarray*}
+E(X) & = & E(X_{1}) + E(X_{2}) + ... + E(X_{n}) \\
+& = & n * E(X_{i}) \\
+& = & n * p
+\end{eqnarray*}
+\begin{eqnarray*}
+Var(X) & = & Var(X_{1}) + Var(X_{2}) + ... + Var(X_{n}) \\
+& = & n * Var(X_{i}) \\
+& = & n * p * q
+\end{eqnarray*}
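+These formulas can be checked against the definitions by summing over the whole distribution. A minimal sketch, assuming n = 5 and p = 0.25 (values chosen for illustration). Note that the variance step relies on the trials being independent.
+<code>
+n <- 5
+p <- 0.25
+q <- 1 - p
+x <- 0:n
+px <- dbinom(x, n, p)            # P(X = x) for x = 0, ..., n
+e.x <- sum(x * px)               # E(X) from the definition
+v.x <- sum((x - e.x)^2 * px)     # Var(X) from the definition
+c(e.x, n * p)                    # the two values agree
+c(v.x, n * p * q)                # the two values agree
+</code>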
+===== Proof of Binomial Expected Value and Variance =====
+[[:Mean and Variance of Binomial Distribution|Mathematical proof of the expected value and variance of the binomial distribution]]
+====== e.g., ======
+<WRAP box>
+In the latest round of Who Wants To Win A Swivel Chair, there are 5 questions. The probability of
+getting a successful outcome in a single trial is 0.25.
+  - What’s the probability of getting exactly two questions right?
+  - What’s the probability of getting exactly three questions right?
+  - What’s the probability of getting two or three questions right?
+  - What’s the probability of getting no questions right?
+  - What are the expectation and variance?
+</WRAP>
+Ans 1.
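+Written out by hand with n = 5, x = 2 and p = 0.25, the binomial formula gives:
+\begin{eqnarray*}
+P(X=2) & = & {{5} \choose {2}} (0.25)^{2} (0.75)^{5-2} \\
+& = & 10 \times 0.0625 \times 0.421875 \\
+& = & 0.2637
+\end{eqnarray*}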
 <code>
-> dbinom(2, 3, 1/6)
+p <- .25
-[1] 0.06944444
+q <- 1-p
+r <- 2
+n <-5
+# combinations of 5,2
+c <- choose(n,r)
+ans1 <- c*(p^r)*(q^(n-r))
+ans1    # or
+choose(n, r)*(p^r)*(q^(n-r))
+dbinom(r, n, p)
+</code>
+<code>
+> p <- .25
+> q <- 1-p
+> r <- 2
+> n <-5
+> # combinations of 5,2
+> c <- choose(n,r)
+> ans1 <- c*(p^r)*(q^(n-r))
+> ans1
+[1] 0.2636719
+>
+> choose(n, r)*(p^r)*(q^(n-r))
+[1] 0.2636719
+>
+> dbinom(r, n, p)
+[1] 0.2636719
+>
 >
 </code>
@@ Line 101: / Line 182: @@
-\begin{eqnarray*}
-X \sim B(n, p) \\
-\end{eqnarray*}
+Ans 2.
+<code>
+p <- .25
+q <- 1-p
+r <- 3
+n <-5
+# combinations of 5,3
+c <- choose(n,r)
+ans2 <- c*(p^r)*(q^(n-r))
+ans2
+choose(n, r)*(p^r)*(q^(n-r))
+dbinom(r, n, p)
+</code>
+<code>
+> p <- .25
+> q <- 1-p
+> r <- 3
+> n <-5
+> # combinations of 5,3
+> c <- choose(n,r)
+> ans2 <- c*(p^r)*(q^(n-r))
+> ans2
+[1] 0.08789062
+>
+> choose(n,r)*(p^r)*(q^(n-r))
+[1] 0.08789062
+>
+> dbinom(r, n, p)
+[1] 0.08789063
+>
+>
+</code>
+Ans 3. Important
+<code>
+ans1 + ans2
+dbinom(2, 5, .25) + dbinom(3, 5, .25)
+dbinom(2:3, 5, .25)
+sum(dbinom(2:3, 5, .25))
+pbinom(3, 5, .25) - pbinom(1, 5, .25)
+</code>
+<code>
+> ans1 + ans2
+[1] 0.3515625
+> dbinom(2, 5, .25) + dbinom(3, 5, .25)
+[1] 0.3515625
+> dbinom(2:3, 5, .25)
+[1] 0.26367187 0.08789063
+> sum(dbinom(2:3, 5, .25))
+[1] 0.3515625
+> pbinom(3, 5, .25) - pbinom(1, 5, .25)
+[1] 0.3515625
+>
+</code>
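+The pbinom line works because pbinom returns the cumulative probability $P(X \le x)$, so the subtraction removes the outcomes 0 and 1:
+\begin{eqnarray*}
+P(2 \le X \le 3) & = & P(X \le 3) - P(X \le 1) \\
+& = & 0.984375 - 0.6328125 \\
+& = & 0.3515625
+\end{eqnarray*}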
+Ans 4.
+<code>
+p <- .25
+q <- 1-p
+r <- 0
+n <-5
+# combinations of 5,0
+c <- choose(n,r)
+ans4 <- c*(p^r)*(q^(n-r))
+ans4
+</code>
+<code>> p <- .25
+> q <- 1-p
+> r <- 0
+> n <-5
+> # combinations of 5,0
+> c <- choose(n,r)
+> ans4 <- c*(p^r)*(q^(n-r))
+> ans4
+[1] 0.2373047
+> </code>
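+Because choose(n, 0) = 1 and p^0 = 1, this answer reduces to q^n. A short sketch of that shortcut with the same values; the name p.none is hypothetical.
+<code>
+p <- .25
+q <- 1 - p
+n <- 5
+p.none <- q^n        # probability that all n answers are wrong
+p.none
+dbinom(0, n, p)      # same value from dbinom
+</code>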
+Ans 5.
+<code>
+p <- .25
+q <- 1-p
+n <- 5
+exp.x <- n*p
+exp.x
+</code>
+<code>> p <- .25
+> q <- 1-p
+> n <- 5
+> exp.x <- n*p
+> exp.x
+[1] 1.25</code>
+<code>
+p <- .25
+q <- 1-p
+n <- 5
+var.x <- n*p*q
+var.x
+</code>
+<code>> p <- .25
+> q <- 1-p
+> n <- 5
+> var.x <- n*p*q
+> var.x
+[1] 0.9375
+> </code>
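+Q. The probability of getting a single question right is 1/4. Given six questions in total, what is the probability of getting between 0 and 5 of them right? Compute it using dbinom.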
+<code>
+p <- 1/4
+q <- 1-p
+n <- 6
+pbinom(5, n, p)
+1 - dbinom(6, n, p)
+sum(dbinom(0:5, n, p))
+</code>
+<code>
+> p <- 1/4
+> q <- 1-p
+> n <- 6
+> pbinom(5, n, p)
+[1] 0.9997559
+> 1 - dbinom(6, n, p)
+[1] 0.9997559
+</code>
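+The 1 - dbinom(6, n, p) form uses the complement rule: with six questions, the only outcome outside 0 to 5 is getting all six right.
+\begin{eqnarray*}
+P(X \le 5) & = & 1 - P(X = 6) \\
+& = & 1 - \left(\frac{1}{4}\right)^{6} \\
+& = & 1 - \frac{1}{4096} \\
+& = & 0.9997559
+\end{eqnarray*}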
+Important . . . .
+<code>
+# http://commres.net/wiki/mean_and_variance_of_binomial_distribution
+# ##################################################################
+#
+p <- 1/4
+q <- 1 - p
+n <- 5
+r <- 0
+all.dens <- dbinom(0:n, n, p)
+all.dens
+sum(all.dens)
+choose(5,0)*p^0*(q^(5-0))
+choose(5,1)*p^1*(q^(5-1))
+choose(5,2)*p^2*(q^(5-2))
+choose(5,3)*p^3*(q^(5-3))
+choose(5,4)*p^4*(q^(5-4))
+choose(5,5)*p^5*(q^(5-5))
+all.dens
+choose(5,0)*p^0*(q^(5-0)) +
+  choose(5,1)*p^1*(q^(5-1)) +
+  choose(5,2)*p^2*(q^(5-2)) +
+  choose(5,3)*p^3*(q^(5-3)) +
+  choose(5,4)*p^4*(q^(5-4)) +
+  choose(5,5)*p^5*(q^(5-5))
+sum(all.dens)
+#
+(p+q)^n
+# note that for any n, (p+q)^n = 1, since p + q = 1
+</code>
+<code>
+> # http://commres.net/wiki/mean_and_variance_of_binomial_distribution
+> # ##################################################################
+> #
+> p <- 1/4
+> q <- 1 - p
+> n <- 5
+> r <- 0
+> all.dens <- dbinom(0:n, n, p)
+> all.dens
+[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
+[5] 0.0146484375 0.0009765625
+> sum(all.dens)
+[1] 1
+>
+> choose(5,0)*p^0*(q^(5-0))
+[1] 0.2373047
+> choose(5,1)*p^1*(q^(5-1))
+[1] 0.3955078
+> choose(5,2)*p^2*(q^(5-2))
+[1] 0.2636719
+> choose(5,3)*p^3*(q^(5-3))
+[1] 0.08789062
+> choose(5,4)*p^4*(q^(5-4))
+[1] 0.01464844
+> choose(5,5)*p^5*(q^(5-5))
+[1] 0.0009765625
+> all.dens
+[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
+[5] 0.0146484375 0.0009765625
+>
+> choose(5,0)*p^0*(q^(5-0)) +
++   choose(5,1)*p^1*(q^(5-1)) +
++   choose(5,2)*p^2*(q^(5-2)) +
++   choose(5,3)*p^3*(q^(5-3)) +
++   choose(5,4)*p^4*(q^(5-4)) +
++   choose(5,5)*p^5*(q^(5-5))
+[1] 1
+> sum(all.dens)
+[1] 1
+> #
+> (p+q)^n
+[1] 1
+> # note that for any n, (p+q)^n = 1, since p + q = 1
+>
+</code>
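+A sketch of the same check for several values of n; the sizes below are arbitrary choices for illustration, and the names sizes and totals are hypothetical. By the binomial theorem the probabilities sum to (p+q)^n, which is 1 for every n because p + q = 1.
+<code>
+p <- 1/4
+sizes <- c(1, 5, 10, 50)         # arbitrary n values
+# total probability over x = 0, ..., n for each n
+totals <- sapply(sizes, function(n) sum(dbinom(0:n, n, p)))
+totals                           # each is 1, up to floating-point rounding
+</code>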