Differences

This shows you the differences between two versions of the page.

--- b:head_first_statistics:variability_and_spread [2020/09/21 13:52] – hkimscil
+++ b:head_first_statistics:variability_and_spread [2023/09/13 08:59] (current) – [Variability and Spread] hkimscil
@@ Line 59: / Line 59: @@
 >
 >
 > sapply(data,sd)
 [1] 1.825742 1.563472 7.362065
@@ Line 87: / Line 88: @@
 The problem of outliers (extreme values)
-''a <- c(1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5}
+<code>
-b <- c(1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5, 10}''
+a <- c(1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5)
+b <- c(1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5, 10)
+</code>
 range(a) vs. range(b)
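A minimal R sketch of the comparison above, using the two vectors from this section. The single extreme value 10 in `b` is enough to more than double the width of the range.

```r
# a and b are the vectors from the text; b is a plus one extreme value (10)
a <- c(1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5)
b <- c(a, 10)

range(a)         # smallest and largest value of a
range(b)         # smallest and largest value of b
diff(range(a))   # width of the range of a
diff(range(b))   # width of the range of b
```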
@@ Line 112: / Line 115: @@
 </code>
 ====== Percentile ======
+<WRAP info>
+How to find percentile
+  - First of all, line all your values up in ascending order.
+  - To find the position of the kth percentile out of n numbers, start off by calculating $ k(\frac{n}{100}) $.
+  - If this gives you an integer, then your percentile is halfway between the value at position $ k(\frac{n}{100})$ and the next number along. Take the average of the numbers at these two positions to give you your percentile.
+  - If $ k(\frac{n}{100})$ is not an integer, then round it up. This then gives you the position of the percentile.
+</WRAP>
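The steps in the box can be sketched as a small R function. `hf_percentile` is a hypothetical helper name, not a base R function, and the handling of k = 100 (where there is no "next number along") is an assumption of this sketch.

```r
# hf_percentile: hypothetical helper implementing the rule described above
hf_percentile <- function(x, k) {
  stopifnot(k > 0, k <= 100)
  x <- sort(x)                 # step 1: ascending order
  n <- length(x)
  pos <- k * n / 100           # step 2: position k(n/100)
  if (isTRUE(all.equal(pos, round(pos)))) {
    pos <- round(pos)
    # assumption: for k = 100 there is no next value, so return the largest one
    if (pos >= n) return(x[n])
    # step 3: integer position, so average this value and the next one
    return((x[pos] + x[pos + 1]) / 2)
  }
  x[ceiling(pos)]              # step 4: otherwise round the position up
}

hf_percentile(1:125, 10)   # position 12.5, so the 13th value is used
hf_percentile(1:10, 20)    # position 2, so the 2nd and 3rd values are averaged
```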
+<code>
+> k <- c(1:125)
+> length(k)
+[1] 125
+> k
+  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
+ [21]  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
+ [41]  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
+ [61]  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
+ [81]  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
+[101] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
+[121] 121 122 123 124 125
+>
+</code>
+To find the 10th percentile:
+$ 10 * (125 / 100) = 12.5 $
+This is not an integer, so round it up to 13: the 13th number is the 10th percentile (13).
+<code>
+> k <- c(1:10)
+> length(k)
+[1] 10
+> k
+ [1]  1  2  3  4  5  6  7  8  9 10
+</code>
+To find the 20th percentile:
+$ 20 * (10 /100) = 2 $, which is an integer,
+so the percentile is the average of the values at the 2nd and 3rd positions, which is 2.5.
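For comparison, base R's `quantile()` supports several percentile definitions through its `type` argument. `type = 2` (inverse empirical CDF with averaging at discontinuities) appears to correspond to the rule used in this section, while the default `type = 7` interpolates between values and can give different results. A small check on the two examples:

```r
k125 <- 1:125
k10  <- 1:10

# type = 2: averages at exact positions, otherwise rounds the position up
quantile(k125, probs = 0.10, type = 2)
quantile(k10,  probs = 0.20, type = 2)

# default (type = 7) for comparison; the values need not match the rule above
quantile(k125, probs = 0.10)
quantile(k10,  probs = 0.20)
```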
+====== Boxplot ======
+<code>
+# j <- c(6,7,7,8,9,10,10,11,11,13)
+j <- c(7,9,9,10,10,10,10,11,11,13)
+# m <- c(3,3,6,7,7,10,10,10,11,13,30)
+m <- c(3,3,6,7,8,9,9,10,11,13,30)
+median(j)
+median(m)
+</code>
+[{{hf.boxplot.ex.jpg}}]
+<code>
+boxplot(j)
+boxplot(m)
+</code>
+<code>
+boxplot(j, m)
+boxplot(j, m, horizontal = TRUE)
+</code>
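The numbers behind the two boxplots can be inspected without drawing them. This is a sketch using base R's `fivenum()` and `boxplot.stats()` on the same `j` and `m`; with the default whisker rule (1.5 times the hinge spread), the value 30 in `m` is flagged as an outlier.

```r
j <- c(7,9,9,10,10,10,10,11,11,13)
m <- c(3,3,6,7,8,9,9,10,11,13,30)

fivenum(j)              # min, lower hinge, median, upper hinge, max
fivenum(m)
boxplot.stats(j)$out    # points beyond the whiskers of j
boxplot.stats(m)$out    # points beyond the whiskers of m
```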
 ====== Variance ======
@@ Line 130: / Line 197: @@
   * calculation of variance (an easy way) see [[:variance#variance_cal|variance calculation]]
     * $ \displaystyle \frac{\sum(X_{i}^2)}{N} - \mu^2$
+    * [{{variance.cal.jpg?600}}]
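A quick numerical check of the shortcut (the mean of the squared values minus the squared mean) against the definition of the population variance. The data vector is illustrative and not taken from the page. Note that base R's `var()` divides by n - 1, so it returns the sample variance rather than this population value.

```r
x  <- c(1, 2, 3, 4, 5)   # illustrative data, not from the text
mu <- mean(x)

var_def      <- mean((x - mu)^2)   # definition: mean squared deviation from mu
var_shortcut <- mean(x^2) - mu^2   # shortcut: mean of squares minus squared mean

var_def
var_shortcut
var(x)    # sample variance (divides by n - 1), shown for contrast
```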
 [[:standard deviation]]
+====== Standard score ======
 [[:standard score]]
+$ z = \large\frac {x-\mu}{\sigma} $
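A minimal sketch of the formula in R, assuming the data are treated as a whole population (so $\sigma$ divides by N). The data vector is illustrative. Base R's `scale()` would standardize with `sd()`, which divides by n - 1, so its output differs slightly.

```r
x     <- c(1, 2, 3, 4, 5)            # illustrative data, not from the text
mu    <- mean(x)
sigma <- sqrt(mean((x - mu)^2))      # population standard deviation

z <- (x - mu) / sigma                # standard score of every value
z
mean(z)            # standard scores have mean 0
sqrt(mean(z^2))    # and population standard deviation 1
```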