b:head_first_statistics:using_the_normal_distribution
Differences
| b:head_first_statistics:using_the_normal_distribution [2022/10/27 22:14] – [Exercise] hkimscil | b:head_first_statistics:using_the_normal_distribution [2025/10/29 11:12] (current) – [All aboard the Love Train] hkimscil | ||
|---|---|---|---|
| Line 88: | Line 88: | ||
| ===== So how do we find normal probabilities? | ===== So how do we find normal probabilities? | ||
| + | For a Normal distribution with mean 0 and standard deviation 1, the probabilities have already been tabulated, as in the PDF file below | ||
| + | (if you are not using R). [[https:// | ||
| + | A distribution whose mean and standard deviation are not 0 and 1 is first converted so that they become 0 and 1, and then the probability is looked up (standardization). | ||
| + | |||
| + | |||
| {{: | {{: | ||
| Line 132: | Line 137: | ||
| z & = & \displaystyle \frac {x - \mu}{\sigma} \\ | z & = & \displaystyle \frac {x - \mu}{\sigma} \\ | ||
| & = & \frac {64-71} {4.5} \\ | & = & \frac {64-71} {4.5} \\ | ||
| - | & = & 1.56 | + | & = & - 1.56 |
| \end{eqnarray*} | \end{eqnarray*} | ||
| - | Therefore, take the standard score 1.56 and look up the area above 1.56 in the standard score table. | + | Therefore, take the standard score -1.56 and look up the area above -1.56 in the standard score table. |
| + | |||
| + | <code> | ||
| + | > 1 - pnorm(-1.56) | ||
| + | [1] 0.9406201 | ||
| + | > pnorm(-1.56, lower.tail = F) | ||
| + | [1] 0.9406201 | ||
| + | > pnorm(-1.56, 0, 1, lower.tail = F) | ||
| + | [1] 0.9406201 | ||
| + | > pnorm(64, 71, sqrt(20.25), lower.tail = F) | ||
| + | [1] 0.9400931 | ||
| + | > | ||
| + | </code> | ||
| + | |||
| + | Note: the x-axis is no longer discrete here, so you cannot just add up values of a function like dnorm() (it can be done, but it is not simple). | ||
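| + | As a rough illustration of this note (an added sketch): with a continuous variable the probability is an area under the density curve, so you integrate dnorm() rather than summing it. | ||
| + | <code> | ||
| + | integrate(dnorm, lower = -1.56, upper = Inf)  # area under the standard normal above -1.56 | ||
| + | 1 - pnorm(-1.56)                              # same value, about 0.9406 | ||
| + | </code> | ||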
| - | <code> | ||
| - | > scale(a) | ||
| - | # ... standardized values of a, rows [1,] to [100,] omitted ... | ||
| - | attr(,"scaled:center") | ||
| - | [1] 50.5 | ||
| - | attr(,"scaled:scale") | ||
| - | [1] 29.01149 | ||
| - | > aa <- scale(a) | ||
| - | > mean(aa) | ||
| - | [1] 0 | ||
| - | > sd(aa) | ||
| - | [1] 1 | ||
| - | > </code> | ||
| ==== exercise ==== | ==== exercise ==== | ||
| <WRAP box> | <WRAP box> | ||
| - | 1. N(10, 4), value 6 | + | - N(10, 4), value 6 |
| - | 2. N(6.3, 9), value 0.3 | + | - N(6.3, 9), value 0.3 |
| - | 3. N(2, 4). If the standard score is 0.5, what’s the value? | + | - N(2, 4). If the standard score is 0.5, what’s the value? |
| - | 4. The standard score of value 20 is 2. If the variance is 16, what’s the mean? | + | - The standard score of value 20 is 2. If the variance is 16, what’s the mean? |
| + | </WRAP> | ||
| + | <WRAP box> | ||
| + | <code> | ||
| + | * 1 | ||
| + | pnorm(6, 10, sqrt(4), lower.tail = F) | ||
| + | * 2 | ||
| + | pnorm(0.3, 6.3, sqrt(9), lower.tail = F) | ||
| + | * 3 | ||
| + | 0.5 = (v - 2)/sqrt(4) | ||
| + | v-2 = 1 | ||
| + | v = 3 | ||
| + | * 4 | ||
| + | z = (v - mean) / sd | ||
| + | 2 = (20 - mean) / sqrt(16) | ||
| + | mean = 12 | ||
| + | </code> | ||
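| + | A quick numeric check of answers 3 and 4 (added sketch): | ||
| + | <code> | ||
| + | 0.5 * sqrt(4) + 2   # exercise 3: value with standard score 0.5 in N(2, 4) -> 3 | ||
| + | 20 - 2 * sqrt(16)   # exercise 4: mean when value 20 has standard score 2 and variance 16 -> 12 | ||
| + | </code> | ||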
| </WRAP> | </WRAP> | ||
| Line 288: | Line 210: | ||
| ===== Exercise ===== | ===== Exercise ===== | ||
| Julie with 5" heels = 64 + 5 = 69 | Julie with 5" heels = 64 + 5 = 69 | ||
| + | Remember X ~ N(71, 20.25) | ||
| + | mean = 71 | ||
| + | variance = 20.25 | ||
| + | sd = 4.5 | ||
| + | z = (69 - 71)/4.5 | ||
| z score = -0.44 | z score = -0.44 | ||
| Line 296: | Line 223: | ||
| \end{eqnarray*} | \end{eqnarray*} | ||
| - | <code> | + | <code> |
| + | > 1-pnorm(-0.44) | ||
| [1] 0.6700314 | [1] 0.6700314 | ||
| > | > | ||
| + | > pnorm(69, 71, sqrt(20.25), lower.tail = F) | ||
| + | [1] 0.6716394 | ||
| + | > | ||
| + | > z <- (69 - 71)/ sqrt(20.25) | ||
| + | > z | ||
| + | [1] -0.4444444 | ||
| + | > pnorm(z, lower.tail = F) | ||
| + | [1] 0.6716394 | ||
| + | > | ||
| + | |||
| </code> | </code> | ||
| Line 359: | Line 297: | ||
| <code> | <code> | ||
| + | Mean <- 100 | ||
| + | Sd <- 10 | ||
| - | x <- seq(-4,4, length=100) | + | # X grid for non-standard normal distribution |
| - | y <- dnorm(x) | + | x <- seq(-4, 4, length = 100) * Sd + Mean |
| - | plot(x,y, type="l") | + | |
| + | # Density function | ||
| + | f <- dnorm(x, Mean, Sd) | ||
| + | |||
| + | plot(x, f, type = "l") | ||
| + | abline(v = Mean) # Vertical line on the mean | ||
| </code> | </code> | ||
| - | {{: | ||
| - | < | ||
| - | # Children's IQ scores are normally distributed with a | ||
| - | # mean of 100 and a standard deviation of 15. What | ||
| - | # proportion of children are expected to have an IQ between | ||
| - | # 80 and 120? | ||
| - | mean=100; sd=15 | + | {{: |
| - | lb=80; ub=120 | + | |
| - | x <- seq(-4, | + | <code> |
| - | hx <- dnorm(x,mean,sd) | + | # mean: mean of the Normal variable |
| + | # sd: standard deviation of the Normal variable | ||
| + | # lb: lower bound of the area | ||
| + | # ub: upper bound of the area | ||
| + | # acolor: color of the area | ||
| + | # ...: additional arguments to be passed to lines function | ||
| - | plot(x, hx, type=" | + | normal_area <- function(mean = 0, sd = 1, lb, ub, acolor = "lightgray", ...) { |
| - | main=" | + | x <- seq(mean - 3 * sd, mean + 3 * sd, length = 100) |
| + | |||
| + | if (missing(lb)) { | ||
| + | lb <- min(x) | ||
| + | } | ||
| + | if (missing(ub)) { | ||
| + | ub <- max(x) | ||
| + | } | ||
| - | i <- x >= lb & x <= ub | + | x2 <- seq(lb, ub, length = 100) |
| - | lines(x, hx) | + | plot(x, dnorm(x, mean, sd), type = "n", ylab = "") |
| - | polygon(c(lb, | + | |
| + | y <- dnorm(x2, mean, sd) | ||
| + | polygon(c(lb, x2, ub), c(0, y, 0), col = acolor) | ||
| + | lines(x, dnorm(x, mean, sd), type = "l", ...) | ||
| + | } | ||
| + | </code> | ||
| - | area <- pnorm(ub, mean, sd) - pnorm(lb, mean, sd) | + | <code> |
| - | result | + | normal_area(mean = 0, sd = 1, lb = -1, ub = 2, lwd = 2) |
| - | | + | </code> |
| - | mtext(result,3) | + | {{: |
| - | axis(1, at=seq(40, 160, 20), pos=0) | + | <code> |
| + | pnorm(2) | ||
| + | pnorm(-1) | ||
| + | pnorm(2)-pnorm(-1) | ||
| + | ar <- round(pnorm(2)-pnorm(-1),3) | ||
| + | </code> | ||
| + | <code> | ||
| + | > pnorm(2) | ||
| + | [1] 0.9772499 | ||
| + | > pnorm(-1) | ||
| + | [1] 0.1586553 | ||
| + | > pnorm(2)-pnorm(-1) | ||
| + | [1] 0.8185946 | ||
| + | > ar <- round(pnorm(2)-pnorm(-1),3) | ||
| + | > | ||
| + | </code> | ||
| + | <code> | ||
| + | m.s <- 100 | ||
| + | sd.s <- 15 | ||
| + | lb <- 80 | ||
| + | ub <- 110 | ||
| + | normal_area(mean = m.s, sd = sd.s, lb = lb, ub = ub, lwd = 2) | ||
| + | ar <- round(pnorm(ub, m.s, sd.s)-pnorm(lb, m.s, sd.s),3) | ||
| + | text(m.s, .01, ar) | ||
| + | </code> | ||
| + | {{: | ||
| + | <code> | ||
| + | m.s <- 100 | ||
| + | sd.s <- 15 | ||
| + | lb <- m.s - sd.s | ||
| + | ub <- m.s + sd.s | ||
| + | normal_area(mean = m.s, sd = sd.s, lb = lb, ub = ub, lwd = 2) | ||
| + | ar <- round(pnorm(ub, m.s, sd.s)-pnorm(lb, m.s, sd.s),3) | ||
| + | text(m.s, .01, ar) | ||
| </code> | </code> | ||
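| + | The shaded band above is mean ± 1 standard deviation, roughly 68% of the area; a quick check of the 68-95-99.7 rule (added sketch): | ||
| + | <code> | ||
| + | pnorm(1) - pnorm(-1)   # about 0.683 | ||
| + | pnorm(2) - pnorm(-2)   # about 0.954 | ||
| + | pnorm(3) - pnorm(-3)   # about 0.997 | ||
| + | </code> | ||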
| - | {{: | ||
| </ | </ | ||
| ===== Headline ===== | ===== Headline ===== | ||
| Line 536: | Line 522: | ||
| </ | </ | ||
| - | < | + | < |
| pnorm in r: the cumulative percentage (probability) corresponding to a standard score (<fc # | pnorm in r: the cumulative percentage (probability) corresponding to a standard score (<fc # | ||
| < | < | ||
| Line 560: | Line 546: | ||
| </ | </ | ||
| - | [{{ : | + | {{: |
| - | + | ||
| - | </ | + | |
| Therefore, | Therefore, | ||
| $$P(X + Y < 380) = 0.9082409 $$ | $$P(X + Y < 380) = 0.9082409 $$ | ||
| + | </ | ||
| ===== exercise ===== | ===== exercise ===== | ||
| - | < | + | < |
| Julie’s matchmaker is at it again. What's the **probability that a man will be at least 5 inches taller than a woman**? In Statsville, the height of men in inches is distributed as N(71, 20.25), and the height of women in inches is distributed as N(64, 16). | Julie’s matchmaker is at it again. What's the **probability that a man will be at least 5 inches taller than a woman**? In Statsville, the height of men in inches is distributed as N(71, 20.25), and the height of women in inches is distributed as N(64, 16). | ||
| </ | </ | ||
| Line 578: | Line 562: | ||
| **probability that a man will be at least 5 inches taller than a woman**? = " | **probability that a man will be at least 5 inches taller than a woman**? = " | ||
| + | |||
| \begin{align*} | \begin{align*} | ||
| P(X > F + 5) & = P(X - F > 5) | P(X > F + 5) & = P(X - F > 5) | ||
| Line 615: | Line 600: | ||
| ===== Linear Transform ===== | ===== Linear Transform ===== | ||
| - | <WRAP alert 60%> | + | <WRAP alert> |
| A four-seat roller coaster car has a weight limit of 800 lbs. The mean weight of people in Statsville is 180 lbs and the variance is 625. What is the probability that the combined weight of four riders is less than 800 lbs? | A four-seat roller coaster car has a weight limit of 800 lbs. The mean weight of people in Statsville is 180 lbs and the variance is 625. What is the probability that the combined weight of four riders is less than 800 lbs? | ||
| </WRAP> | </WRAP> | ||
| Line 627: | Line 612: | ||
| {{: | {{: | ||
| + | Remember: | ||
| + | E(aX + b) = a E(X) + b | ||
| + | V(aX + b) = a^2 V(X)   (the constant b adds no variance) | ||
| + | |||
| + | |||
| ===== Independent Observation | ===== Independent Observation | ||
| Rather than transforming the weight of each adult, what we really need to figure out is <fc # | Rather than transforming the weight of each adult, what we really need to figure out is <fc # | ||
| Line 640: | Line 630: | ||
| {{: | {{: | ||
| - | < | + | < |
| Q: So what’s the difference between linear transforms and independent observations? | Q: So what’s the difference between linear transforms and independent observations? | ||
| A: Linear transforms affect the underlying values in your probability distribution. As an example, if you have a length of rope of a particular length, then applying a linear transform affects the length of the rope. Independent observations have to do with the quantity of things you’re dealing with. As an example, if you have n independent observations of a piece of rope, then you’re talking about n pieces of rope. In general, __if the quantity changes__, you’re dealing with **independent observations**. __If the underlying values change__, then you’re dealing with a **transform**. | A: Linear transforms affect the underlying values in your probability distribution. As an example, if you have a length of rope of a particular length, then applying a linear transform affects the length of the rope. Independent observations have to do with the quantity of things you’re dealing with. As an example, if you have n independent observations of a piece of rope, then you’re talking about n pieces of rope. In general, __if the quantity changes__, you’re dealing with **independent observations**. __If the underlying values change__, then you’re dealing with a **transform**. | ||
| Line 668: | Line 658: | ||
| [1] 0.9452007 | [1] 0.9452007 | ||
| # or | # or | ||
| - | > pnorm(800, 720, sqrt(2500), lower.tail = TRUE) | + | > pnorm(800, 720, sqrt(2500), |
| + | + lower.tail = TRUE) | ||
| [1] 0.9452007 | [1] 0.9452007 | ||
| </code> | </code> | ||
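| + | For contrast (added sketch): treating the total as the linear transform 4X instead of X1 + X2 + X3 + X4 would inflate the variance to 16 * 625 = 10000 and give a noticeably smaller probability. | ||
| + | <code> | ||
| + | > pnorm(800, 720, sqrt(16 * 625))   # 4X (linear transform) | ||
| + | [1] 0.7881446 | ||
| + | > pnorm(800, 720, sqrt(4 * 625))    # X1 + X2 + X3 + X4 (independent observations) | ||
| + | [1] 0.9452007 | ||
| + | </code> | ||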
| Line 678: | Line 669: | ||
| Before going further: | Before going further: | ||
| - | < | + | < |
| So what’s the probability of getting 30 or more questions right out of 40? That will help us determine whether to keep playing, or walk away. | So what’s the probability of getting 30 or more questions right out of 40? That will help us determine whether to keep playing, or walk away. | ||
| </ | </ | ||
| - | < | + | < |
| There are 40 questions, which means there are 40 trials. | There are 40 questions, which means there are 40 trials. | ||
| Line 697: | Line 688: | ||
| </ | </ | ||
| - | < | + | < |
| <code> | <code> | ||
| > pbinom(29, 40, 0.25, lower.tail = FALSE) | > pbinom(29, 40, 0.25, lower.tail = FALSE) | ||
| [1] 4.630881e-11 | [1] 4.630881e-11 | ||
| + | > dbinom(30:40, 40, 0.25) | ||
| + | [1] 4.140329e-11 4.451967e-12 4.173719e-13 3.372702e-14 2.314599e-15 | ||
| + | [6] 1.322628e-16 6.123279e-18 2.206587e-19 5.806808e-21 9.926167e-23 | ||
| + | [11] 8.271806e-25 | ||
| + | > 1 - dbinom(0:29, 40, 0.25) | ||
| + | [1] 0.9999899 0.9998659 0.9991284 0.9963200 0.9886534 0.9727683 | ||
| + | [7] 0.9470494 0.9142704 0.8821219 0.8602926 0.8556357 0.8687597 | ||
| + | [13] 0.8942786 0.9240975 0.9512055 0.9718076 0.9853165 0.9930901 | ||
| + | [19] 0.9970569 0.9988641 0.9996024 0.9998738 0.9999637 0.9999905 | ||
| + | [25] 0.9999978 0.9999995 0.9999999 1.0000000 1.0000000 1.0000000 | ||
| + | > sum(dbinom(30:40, 40, 0.25)) | ||
| + | [1] 4.630881e-11 | ||
| + | > 1 - sum(dbinom(0:29, 40, 0.25)) | ||
| + | [1] 4.630896e-11 | ||
| + | > | ||
| + | |||
| </code> | </code> | ||
| Line 812: | Line 819: | ||
| - | <WRAP help 60%> | + | <WRAP help> |
| Before we use the normal distribution for the full 40 questions for Who Wants To Win A Swivel Chair, let’s tackle a simpler problem to make sure it works. Let’s try finding the probability that we get 5 or fewer questions correct out of 12, where there are only two possible choices for each question. | Before we use the normal distribution for the full 40 questions for Who Wants To Win A Swivel Chair, let’s tackle a simpler problem to make sure it works. Let’s try finding the probability that we get 5 or fewer questions correct out of 12, where there are only two possible choices for each question. | ||
| Line 822: | Line 829: | ||
| {{: | {{: | ||
| - | < | + | < |
| - | Finding this with R, | + | Trying the above in R: |
| <code> | <code> | ||
| - | pbinom(5, 12, 1/2) | + | > dbinom(0, 12, 1/2) + dbinom(1, 12, 1/2) + dbinom(2, 12, 1/2) + |
| + | + dbinom(3, 12, 1/2) + dbinom(4, 12, 1/2) + dbinom(5, 12, 1/2) | ||
| + | [1] 0.387207 | ||
| </code> | </code> | ||
| + | But R offers a simpler way: | ||
| <code> | <code> | ||
| > pbinom(5, 12, 1/2) | > pbinom(5, 12, 1/2) | ||
| Line 833: | Line 842: | ||
| </code> | </code> | ||
| + | And even if you compute it term by term with dbinom as above, you would write it as below: | ||
| + | <code> | ||
| + | > sum(dbinom(c(0:5), 12, 1/2)) | ||
| + | [1] 0.387207 | ||
| + | > | ||
| + | </code> | ||
| </ | </ | ||
| Line 871: | Line 886: | ||
| > pnorm(-0.29) | > pnorm(-0.29) | ||
| [1] 0.3859081 | [1] 0.3859081 | ||
| + | |||
| + | # the below is the same as the above | ||
| + | > n <- 12 | ||
| + | > p <- 1/2 | ||
| + | > q <- 1-p | ||
| + | > pnorm(5.5, n*p, sqrt(n*p*q)) | ||
| + | [1] 0.386415 | ||
| + | > | ||
| </code> | </code> | ||
| This value is close to the 0.387 obtained above. | This value is close to the 0.387 obtained above. | ||
| - | < | + | < |
| * In particular circumstances you can **use the normal distribution to approximate the binomial**. If X ~ B(n, p) and np > 5 and nq > 5 then you can approximate X using X ~ N(np, npq) | * In particular circumstances you can **use the normal distribution to approximate the binomial**. If X ~ B(n, p) and np > 5 and nq > 5 then you can approximate X using X ~ N(np, npq) | ||
| * If you’re approximating the binomial distribution with the normal distribution, | * If you’re approximating the binomial distribution with the normal distribution, | ||
| Line 882: | Line 905: | ||
| {{: | {{: | ||
| - | < | + | < |
| Q: Does it really save time to approximate the binomial distribution with the normal? | Q: Does it really save time to approximate the binomial distribution with the normal? | ||
| Line 905: | Line 928: | ||
| ===== Pool Puzzle ===== | ===== Pool Puzzle ===== | ||
| <wrap # | <wrap # | ||
| - | < | + | < |
| - | X < 3 | + | X < 3 <wrap spoiler> X < 2.5 </wrap> |
| - | X > 3 | + | X > 3 <wrap spoiler> X > 3.5 </wrap> |
| - | X <_ 3 | + | X <_ 3 <wrap spoiler> X < 3.5 </wrap> |
| - | X >_ 3 | + | X >_ 3 <wrap spoiler> X > 2.5 </wrap> |
| - | 3 <_ X < 10 ---- | + | 3 <_ X < 10 <wrap spoiler> 2.5 < X < 9.5 </wrap> |
| - | X = 0 | + | X = 0 <wrap spoiler> -0.5 < X < 0.5 </wrap> |
| - | 3 <_ X <_ 10 | + | 3 <_ X <_ 10 <wrap spoiler> 2.5 < X < 10.5 </wrap> |
| - | 3 < X <_ 10 | + | 3 < X <_ 10 <wrap spoiler> 3.5 < X < 10.5 </wrap> |
| - | X > 0 | + | X > 0 <wrap spoiler> X > 0.5 </wrap> |
| - | 3 < X < 10 | + | 3 < X < 10 <wrap spoiler> 3.5 < X < 9.5 </wrap> |
| </ | </ | ||
| ===== exercise ===== | ===== exercise ===== | ||
| - | <WRAP help 60%> | + | <WRAP help> |
| What’s the probability of you winning the jackpot on today’s edition of Who Wants to Win a Swivel Chair? See if you can find the probability of getting at least 30 questions correct out of 40, where each question has a choice of 4 possible answers. | What’s the probability of you winning the jackpot on today’s edition of Who Wants to Win a Swivel Chair? See if you can find the probability of getting at least 30 questions correct out of 40, where each question has a choice of 4 possible answers. | ||
| </WRAP> | </WRAP> | ||
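| + | One way to set this up in R (an added sketch; it compares the exact binomial with the normal approximation N(np, npq) plus a continuity correction): | ||
| + | <code> | ||
| + | n <- 40; p <- 1/4; q <- 1 - p | ||
| + | n * p; n * q                                             # 10 and 30, both > 5 | ||
| + | pbinom(29, n, p, lower.tail = FALSE)                     # exact P(X >= 30), about 4.6e-11 | ||
| + | pnorm(29.5, n * p, sqrt(n * p * q), lower.tail = FALSE)  # approximation: also vanishingly small | ||
| + | </code> | ||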
| Line 952: | Line 975: | ||
| {{: | {{: | ||
| - | When $\lambda > 15$, the Poisson distribution, | + | <fc #ff0000>When $\lambda > 15$,</fc> |
| e.g.) | e.g.) | ||
| Line 966: | Line 989: | ||
| {{: | {{: | ||
| - | <WRAP help 60%> | + | <WRAP help> |
| Dexter’s found some statistics on the Internet about the model of roller coaster he’s been trying out, and according to one site, you can expect the ride to break down 40 times a year. | Dexter’s found some statistics on the Internet about the model of roller coaster he’s been trying out, and according to one site, you can expect the ride to break down 40 times a year. | ||
| Line 1001: | Line 1024: | ||
| $0.9654916 \sim 0.9656205$ | $0.9654916 \sim 0.9656205$ | ||
| + | |||
| + | Using ppois in R: | ||
| + | <code> | ||
| + | > ppois(51, 40) | ||
| + | [1] 0.9612598 | ||
| + | > | ||
| + |||
| + | </code> | ||
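| + | The normal approximation N(40, 40) with a continuity correction lands close to the exact Poisson value, and appears to be where the 0.965... figures above come from (added sketch): | ||
| + | <code> | ||
| + | pnorm(51.5, 40, sqrt(40))   # about 0.9655 | ||
| + | ppois(51, 40)               # exact, about 0.9613 | ||
| + | </code> | ||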
| ===== Check up ===== | ===== Check up ===== | ||