
--- binomial_distribution [2020/11/27 19:42] – hkimscil
+++ binomial_distribution [2025/10/11 08:26] (current) – [e.g.,] hkimscil
@@ Line 1: / Line 1: @@
-====== Binomial Distribution ======
+====== Binomial Distributions ======
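-  - If the probability that event A occurs in a single trial is p,
-  - then the probability distribution of the number of times A occurs in n (independent) trials
-  - is called the binomial probability distribution.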
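+  - If the probability that event A occurs in a single trial is p,
+  - then the probability distribution of the number of times A occurs in n (independent) trials
+  - is called the **binomial probability distribution**.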
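+In the example below,
+  * the probability of answering any one question correctly is 1/4, and of answering it incorrectly is 3/4;
+  * we want the probability distribution of the number of correct answers when 3 questions are answered (3 trials).
+  * In the geometric distribution, by contrast, there is no fixed number of questions: you keep answering (and missing) until you get one right, and that first correct answer ends the process.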
+{{:b:head_first_statistics:pasted:20191030-035316.png}}
+{{:b:head_first_statistics:pasted:20191030-035452.png}}
+^ x  ^ P(X=x)                    ^ Power of 0.75  ^ Power of 0.25  ^
+| 0  | 0.75 * 0.75 * 0.75        | 3  | 0  |
+| 1  | 3 * (0.75 * 0.75 * 0.25)  | 2  | 1  |
+| 2  | 3 * (0.75 * 0.25 * 0.25)  | 1  | 2  |
+| 3  | 0.25 * 0.25 * 0.25        | 0  | 3  |
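The four rows of the table can be checked in R. This is a minimal sketch using only base R (`choose()`, `dbinom()`); the variable names are illustrative. It multiplies the number of orderings by $0.25^{r} \cdot 0.75^{3-r}$ for each r and compares the result with R's built-in binomial pmf.

<code r>
# Three questions (n = 3), each answered correctly with probability 0.25
n <- 3
p <- 0.25
q <- 1 - p
r <- 0:n                          # possible numbers of correct answers: 0, 1, 2, 3

# (number of orderings) x p^r x q^(n - r), one value per row of the table
probs <- choose(n, r) * p^r * q^(n - r)
data.frame(x = r, orderings = choose(n, r), prob = probs)

# R's built-in binomial pmf gives the same values
all.equal(probs, dbinom(r, n, p))
sum(probs)                        # the four probabilities add up to 1
</code>

The probabilities come out as 0.421875, 0.421875, 0.140625 and 0.015625 for x = 0, 1, 2, 3.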
+{{:b:head_first_statistics:pasted:20191030-040346.png}}
+$$P(X = r) = {\huge ?} \cdot 0.25^{r} \cdot 0.75^{3-r}$$
+$$P(X = r) = {\huge {}_{3}C_{r}} \cdot 0.25^{r} \cdot 0.75^{3-r}$$
+Since $_{n}C_{r}$ is the number of ways to choose r items out of n (ignoring order), there are $_{3}C_{1} = 3$ ways to get exactly one of the three questions right.
+Probability of getting exactly one question right:
 \begin{eqnarray*}
-{n \choose x} = \displaystyle \frac {n!}{x!(n-x)!}  \\
+P(X = 1) & = & {}_{3}C_{1} \cdot 0.25^{1} \cdot 0.75^{3-1} \\
-\end{eqnarray*}
+& = & \frac{3!}{1! \cdot (3-1)!} \cdot 0.25 \cdot 0.75^2 \\
+& = & 3 \cdot 0.25 \cdot 0.5625 \\
+& = & 0.421875
+\end{eqnarray*}
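The same arithmetic in R, as a quick check of the derivation above. This sketch assumes base R only; `ways` and `prob_one` are illustrative names. `factorial()` mirrors the $\frac{3!}{1!(3-1)!}$ step, while `choose()` and `dbinom()` are base R shortcuts for the same quantities.

<code r>
# P(X = 1) for three questions with success probability 0.25
ways <- factorial(3) / (factorial(1) * factorial(3 - 1))   # 3C1 = 3
prob_one <- ways * 0.25^1 * 0.75^(3 - 1)
prob_one                          # 0.421875
choose(3, 1)                      # base R's nCr, also 3
dbinom(1, size = 3, prob = 0.25)  # same probability from the built-in pmf
</code>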
-**The number of successes in n independent Bernoulli trials has a binomial distribution.**
+$$P(X = r) = {}_{n}C_{r} \cdot 0.25^{r} \cdot 0.75^{n-r}$$
+$$P(X = r) = {}_{n}C_{r} \cdot p^{r} \cdot q^{n-r}, \;\;\; q = 1 - p$$
-This can be viewed as n independent Bernoulli trials.
+  - You're running a series of independent trials (n trials in total).
-  * There are n independent trials
+  - Each trial results in either a success or a failure, and the probability of success (and therefore of failure) is the same for every trial.
-  * Each trial can result in one of two possible outcomes, labelled success and failure.
+  - The number of trials is fixed and finite (exactly n, not an open-ended sequence). This is what distinguishes it from the geometric distribution.
-    * A "success" can be a bad thing, e.g., a tire blowing out.
-  * P(success) = p,
-  * P(failure) = 1-p
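One way to see that the number of successes in n independent trials of this kind follows a binomial distribution is to simulate it. The sketch below uses base R only; the seed, the 100,000 repetitions, and the names `successes`, `observed` and `theoretical` are arbitrary choices for illustration. It runs three trials with success probability 0.25 many times and compares the observed proportions with `dbinom()`.

<code r>
set.seed(42)                  # arbitrary seed for reproducibility
n <- 3                        # trials per experiment
p <- 0.25                     # success probability, the same on every trial
reps <- 100000                # number of simulated experiments

# each experiment: n independent trials, each a success with probability p;
# record how many successes occurred
successes <- replicate(reps, sum(runif(n) < p))

# observed proportion of 0, 1, 2, 3 successes vs. the binomial pmf
observed <- table(factor(successes, levels = 0:n)) / reps
theoretical <- dbinom(0:n, n, p)
round(rbind(observed = as.numeric(observed), theoretical = theoretical), 4)
</code>

The two rows agree to roughly two decimal places; the remaining differences are simulation noise.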
-In general, the binomial distribution is computed as follows.
+If X denotes the number of successes in n trials, the probability of exactly r successes is given by the following formula.
-\begin{align*}
+\begin{eqnarray*}
-P(X=x) & = _{n}C_{x} \cdot p^{x} \cdot (1-p)^{n-x}, \;\; \text{for} \;\; x = 0, 1, 2, . . ., n. \\
+P(X = r) & = & _{n}C_{r} \cdot p^{r} \cdot q^{n-r} \;\;\; \text{Where,} \\
-\text{or } & \\
+_{n}C_{r} & = & \frac {n!}{r!(n-r)!}
-P(X=x) & = {{n} \choose {x}} \cdot p^{x} \cdot (1-p)^{n-x}, \;\; \text{for} \;\; x = 0, 1, 2, . . ., n. \\
+\end{eqnarray*}
-\end{align*}
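+p = probability of success on each trial
+q = 1 - p = probability of failure on each trial
+n = number of trials
+r = number of successes (e.g., correct answers) whose probability we want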
+$$X \sim B(n,p)$$
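The formula translates directly into a small R function. `binom_pmf` is a hypothetical helper written for illustration, not a base R function; base R's `dbinom(r, n, p)` computes the same quantity.

<code r>
# Hypothetical helper: P(X = r) for X ~ B(n, p), written from the formula
binom_pmf <- function(r, n, p) {
  q <- 1 - p                                               # probability of failure
  ways <- factorial(n) / (factorial(r) * factorial(n - r)) # nCr
  ways * p^r * q^(n - r)
}

binom_pmf(2, 5, 0.25)          # two successes in five trials: 0.2636719
dbinom(2, 5, 0.25)             # built-in equivalent
all.equal(binom_pmf(0:5, 5, 0.25), dbinom(0:5, 5, 0.25))
</code>

For large n, `choose(n, r)` is preferable to the factorial ratio, which overflows once n! exceeds double precision (around n = 171).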
-A balanced die is rolled 3 times. What is the probability that a 5 comes up exactly twice?
-p = 1/6
-n = 3
-x = 2
 \begin{eqnarray*}
@@ Line 42: / Line 68: @@
 </code>
+====== Expectation and Variance of Binomial Distributions ======
+Toss a fair coin once. What is the distribution of the number of heads?
+  * A single trial
+  * The trial has one of two possible outcomes: success or failure
+  * P(success) = p
+  * P(failure) = 1-p
+X = 0 (failure) or 1 (success)
+$P(X=x) = p^{x}(1-p)^{1-x}$ or
+$P(x) = p^{x}(1-p)^{1-x}$
+For reference:
+| x     | 0          | 1  |
+| p(x)  | q = (1-p)  | p  |
+When x = 0 (failure), $P(X = 0) = p^{0}(1-p)^{1-0} = (1-p)$ = Probability of failure
+When x = 1 (success), $P(X = 1) = p^{1}(1-p)^{0} = p $ = Probability of success
+This is called the Bernoulli distribution.
+  * The Bernoulli distribution is the building block of other distributions, such as the binomial and geometric distributions.
+  * Binomial distribution = the distribution of the number of successes in n independent Bernoulli trials.
+  * Geometric distribution = the distribution of the number of trials needed to get the first success in a sequence of independent Bernoulli trials.
+$$X \sim B(1,p)$$
 \begin{eqnarray*}
-X \sim B(n, p) \\
+E(X) & = & \sum_{x} x \cdot p(x) \\
+& = & (0 \cdot q) + (1 \cdot p) \\
+& = & p
+\end{eqnarray*}
+\begin{eqnarray*}
+Var(X) & = & E((X - E(X))^{2}) \\
+& = & \sum_{x}(x-E(X))^{2} \, p(x) \;\;\; (E(X) = p) \\
+& = & (0 - p)^{2} \cdot q + (1 - p)^{2} \cdot p \\
+& = & (0^{2} - 2 \cdot p \cdot 0 + p^{2}) \cdot q + (1 - 2p + p^{2}) \cdot p \\
+& = & p^{2}(1-p) + (1 - 2p + p^{2}) \cdot p \\
+& = & p^{2} - p^{3} + p - 2p^{2} + p^{3} \\
+& = & p - p^{2} \\
+& = & p(1-p) \\
+& = & pq
 \end{eqnarray*}
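A numeric check of the two Bernoulli results, as a sketch in base R: with p = 0.25 chosen only as an example value, summing $x \cdot p(x)$ and $(x - E(X))^{2} p(x)$ over x = 0, 1 should reproduce p and pq. The names `ex` and `varx` are illustrative.

<code r>
p <- 0.25                      # example success probability (any 0 < p < 1 works)
q <- 1 - p
x  <- c(0, 1)                  # failure, success
px <- c(q, p)                  # P(X = 0), P(X = 1)

ex   <- sum(x * px)            # E(X) = sum of x * p(x)
varx <- sum((x - ex)^2 * px)   # Var(X) = E((X - E(X))^2)
c(E = ex, p = p)               # both 0.25
c(Var = varx, pq = p * q)      # both 0.1875
</code>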
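+Generalizing to n trials: a binomial variable is the sum of n independent Bernoulli variables, $X = X_{1} + X_{2} + \cdots + X_{n}$ with each $X_{i} \sim B(1,p)$. Expectations always add, and because the $X_{i}$ are independent, their variances add as well.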
+$$X \sim B(n,p)$$
+\begin{eqnarray*}
+E(X) & = & E(X_{1}) + E(X_{2}) + \cdots + E(X_{n}) \\
+& = & n \cdot E(X_{i}) \\
+& = & np
+\end{eqnarray*}
+\begin{eqnarray*}
+Var(X) & = & Var(X_{1}) + Var(X_{2}) + \cdots + Var(X_{n}) \\
+& = & n \cdot Var(X_{i}) \\
+& = & npq
+\end{eqnarray*}
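The shortcut formulas can be checked against the definitions by summing over the whole pmf. A minimal sketch in base R, assuming n = 5 and p = 0.25 (the values used in the example section below); `ex` and `varx` are illustrative names.

<code r>
n <- 5
p <- 0.25
q <- 1 - p
x  <- 0:n
px <- dbinom(x, n, p)               # full distribution of X ~ B(5, 0.25)

ex   <- sum(x * px)                 # E(X) computed directly from the pmf
varx <- sum((x - ex)^2 * px)        # Var(X) computed directly from the pmf
c(direct = ex,   formula = n * p)       # both 1.25
c(direct = varx, formula = n * p * q)   # both 0.9375
</code>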
+===== Proof of Binomial Expected Value and Variance =====
+[[:Mean and Variance of Binomial Distribution|Mathematical proof of the expected value and variance of the binomial distribution]]
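For orientation, the expectation can also be obtained directly from the pmf. This is a sketch of one standard argument, using the identity $r \cdot {}_{n}C_{r} = n \cdot {}_{n-1}C_{r-1}$ and the binomial theorem; it is not necessarily the route taken on the linked page. The $r = 0$ term is zero and drops out, and the index is shifted with $k = r - 1$.

\begin{eqnarray*}
E(X) & = & \sum_{r=0}^{n} r \cdot {}_{n}C_{r} \cdot p^{r} q^{n-r} \\
& = & \sum_{r=1}^{n} n \cdot {}_{n-1}C_{r-1} \cdot p^{r} q^{n-r} \\
& = & np \sum_{k=0}^{n-1} {}_{n-1}C_{k} \cdot p^{k} q^{(n-1)-k} \\
& = & np \, (p+q)^{n-1} \\
& = & np
\end{eqnarray*}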
+====== e.g., ======
+<WRAP box>
+In the latest round of Who Wants To Win A Swivel Chair, there are 5 questions. The probability of
+getting a successful outcome in a single trial is 0.25
+  - What’s the probability of getting exactly two questions right?
+  - What’s the probability of getting exactly three questions right?
+  - What’s the probability of getting two or three questions right?
+  - What’s the probability of getting no questions right?
+  - What are the expectation and variance?
+</WRAP>
+Ans 1.
+<code>
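+p <- .25    # probability of success (answering one question correctly)
+q <- 1-p    # probability of failure
+r <- 2      # number of successes we want
+n <-5       # number of trials (questions)
+# combinations of 5,2
+c <- choose(n,r)
+ans1 <- c*(p^r)*(q^(n-r))
+ans1
+# the same in a single expression
+choose(n, r)*(p^r)*(q^(n-r))
+# R's built-in binomial pmf: dbinom(x, size, prob)
+dbinom(r, n, p)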
+</code>
+<code>
+> p <- .25
+> q <- 1-p
+> r <- 2
+> n <-5
+> # combinations of 5,2
+> c <- choose(n,r)
+> ans1 <- c*(p^r)*(q^(n-r))
+> ans1
+[1] 0.2636719
+>
+> choose(n, r)*(p^r)*(q^(n-r))
+[1] 0.2636719
+>
+> dbinom(r, n, p)
+[1] 0.2636719
+>
+>
+</code>
+Ans 2.
+<code>
+p <- .25
+q <- 1-p
+r <- 3      # now three successes
+n <-5
+# combinations of 5,3
+c <- choose(n,r)
+ans2 <- c*(p^r)*(q^(n-r))
+ans2
+# the same in a single expression, then with the built-in pmf
+choose(n, r)*(p^r)*(q^(n-r))
+dbinom(r, n, p)
+</code>
+<code>
+> p <- .25
+> q <- 1-p
+> r <- 3
+> n <-5
+> # combinations of 5,3
+> c <- choose(n,r)
+> ans2 <- c*(p^r)*(q^(n-r))
+> ans2
+[1] 0.08789062
+>
+> choose(n,r)*(p^r)*(q^(n-r))
+[1] 0.08789062
+>
+> dbinom(r, n, p)
+[1] 0.08789063
+>
+>
+</code>
+Ans 3. (Important)
+<code>
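+ans1 + ans2                              # P(X=2) + P(X=3), reusing the answers above
+dbinom(2, 5, .25) + dbinom(3, 5, .25)    # the same with the built-in pmf
+dbinom(2:3, 5, .25)                      # vectorized: both probabilities at once
+sum(dbinom(2:3, 5, .25))                 # and their sum
+pbinom(3, 5, .25) - pbinom(1, 5, .25)    # P(X <= 3) - P(X <= 1) = P(2 <= X <= 3)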
+</code>
+<code>
+> ans1 + ans2
+[1] 0.3515625
+> dbinom(2, 5, .25) + dbinom(3, 5, .25)
+[1] 0.3515625
+> dbinom(2:3, 5, .25)
+[1] 0.26367187 0.08789063
+> sum(dbinom(2:3, 5, .25))
+[1] 0.3515625
+> pbinom(3, 5, .25) - pbinom(1, 5, .25)
+[1] 0.3515625
+>
+</code>
+Ans 4.
+<code>
+p <- .25
+q <- 1-p
+r <- 0
+n <-5
+# combinations of 5,0
+c <- choose(n,r)
+ans4 <- c*(p^r)*(q^(n-r))
+ans4
+</code>
+<code>> p <- .25
+> q <- 1-p
+> r <- 0
+> n <-5
+> # combinations of 5,0
+> c <- choose(n,r)
+> ans4 <- c*(p^r)*(q^(n-r))
+> ans4
+[1] 0.2373047
+> </code>
+Ans 5.
+<code>
+p <- .25
+q <- 1-p
+n <- 5
+exp.x <- n*p
+exp.x
+</code>
+<code>> p <- .25
+> q <- 1-p
+> n <- 5
+> exp.x <- n*p
+> exp.x
+[1] 1.25</code>
+<code>
+p <- .25
+q <- 1-p
+n <- 5
+var.x <- n*p*q
+var.x
+</code>
+<code>> p <- .25
+> q <- 1-p
+> n <- 5
+> var.x <- n*p*q
+> var.x
+[1] 0.9375
+> </code>
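As a rough cross-check of Ans 5, one can simulate many rounds of the five-question game with `rbinom()` and compare the sample mean and variance with np and npq. The seed and the number of simulated rounds (100,000) are arbitrary choices; the simulated values will be close to, but not exactly equal to, 1.25 and 0.9375.

<code r>
set.seed(1)                                # arbitrary seed
n <- 5
p <- 0.25
q <- 1 - p
x <- rbinom(100000, size = n, prob = p)    # 100,000 simulated scores out of 5

mean(x)                                    # close to n * p = 1.25
var(x)                                     # close to n * p * q = 0.9375
</code>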
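+Q. The probability of answering a question correctly is 1/4. If there are six questions in total, what is the probability of getting between 0 and 5 questions right? Compute it with dbinom.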
+<code>
+p <- 1/4
+q <- 1-p
+n <- 6
+pbinom(5, n, p)          # P(X <= 5)
+1 - dbinom(6, n, p)      # complement of getting all six right
+sum(dbinom(0:5, n, p))   # P(X=0) + ... + P(X=5) with dbinom
+</code>
+<code>
+> p <- 1/4
+> q <- 1-p
+> n <- 6
+> pbinom(5, n, p)
+[1] 0.9997559
+> 1 - dbinom(6, n, p)
+[1] 0.9997559
+</code>
+Important: the probabilities of all n + 1 possible outcomes add up to 1.
+<code>
+# http://commres.net/wiki/mean_and_variance_of_binomial_distribution
+# ##################################################################
+#
+p <- 1/4
+q <- 1 - p
+n <- 5
+r <- 0
+all.dens <- dbinom(0:n, n, p)
+all.dens
+sum(all.dens)
+choose(5,0)*p^0*(q^(5-0))
+choose(5,1)*p^1*(q^(5-1))
+choose(5,2)*p^2*(q^(5-2))
+choose(5,3)*p^3*(q^(5-3))
+choose(5,4)*p^4*(q^(5-4))
+choose(5,5)*p^5*(q^(5-5))
+all.dens
+choose(5,0)*p^0*(q^(5-0)) +
+  choose(5,1)*p^1*(q^(5-1)) +
+  choose(5,2)*p^2*(q^(5-2)) +
+  choose(5,3)*p^3*(q^(5-3)) +
+  choose(5,4)*p^4*(q^(5-4)) +
+  choose(5,5)*p^5*(q^(5-5))
+sum(all.dens)
+#
+(p+q)^n
+# note that for any n, (p+q)^n = 1^n = 1
+</code>
+<code>
+> # http://commres.net/wiki/mean_and_variance_of_binomial_distribution
+> # ##################################################################
+> #
+> p <- 1/4
+> q <- 1 - p
+> n <- 5
+> r <- 0
+> all.dens <- dbinom(0:n, n, p)
+> all.dens
+[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
+[5] 0.0146484375 0.0009765625
+> sum(all.dens)
+[1] 1
+>
+> choose(5,0)*p^0*(q^(5-0))
+[1] 0.2373047
+> choose(5,1)*p^1*(q^(5-1))
+[1] 0.3955078
+> choose(5,2)*p^2*(q^(5-2))
+[1] 0.2636719
+> choose(5,3)*p^3*(q^(5-3))
+[1] 0.08789062
+> choose(5,4)*p^4*(q^(5-4))
+[1] 0.01464844
+> choose(5,5)*p^5*(q^(5-5))
+[1] 0.0009765625
+> all.dens
+[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
+[5] 0.0146484375 0.0009765625
+>
+> choose(5,0)*p^0*(q^(5-0)) +
++   choose(5,1)*p^1*(q^(5-1)) +
++   choose(5,2)*p^2*(q^(5-2)) +
++   choose(5,3)*p^3*(q^(5-3)) +
++   choose(5,4)*p^4*(q^(5-4)) +
++   choose(5,5)*p^5*(q^(5-5))
+[1] 1
+> sum(all.dens)
+[1] 1
+> #
+> (p+q)^n
+[1] 1
+> # note that n = whatever, (p+q)^n = 1
+>
+</code>
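The last comment in the code, that $(p+q)^n = 1$ for any n, is the binomial theorem at work: expanding $(p+q)^n$ produces exactly the n + 1 terms computed above, so the probabilities of a binomial distribution always sum to 1.

\begin{eqnarray*}
\sum_{r=0}^{n} {}_{n}C_{r} \cdot p^{r} \cdot q^{n-r} & = & (p+q)^{n} \\
& = & 1^{n} \;\;\; (q = 1 - p) \\
& = & 1
\end{eqnarray*}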