User Tools

Site Tools


logistic_regression

This is an old revision of the document!


Logistic Regression

Data preparation

  • Extract the .RData file NSDUH_2019.RData from the .zip file.
  • Download the R script files NSDUH_2019 Process.R and NSDUH_2019 MI Simulation.R from RMPH Resources.
  • Run the R script file NSDUH_2019 Process.R to process the raw data and create the following teaching datasets:
    • nsduh2019_rmph.RData
    • nsduh2019_adult_sub_rmph.RData
  • Place these .Rdata files in your “Data” folder.
  • Run the R script file NSDUH_2019 MI Simulation.R to process the raw data and create the following teaching datasets:
    • nsduh_mar_rmph.RData
  • Place this .Rdata file in your “Data” folder.
getwd()
# 보통 C:/Users/Username/Documents/ 
setwd("~/rData")

Odds

Odds (승산): 한 사건이 일어날 확률과 그 반대의 확률 (일어나지 않을 확률) 간의 비율 한 사건이 일어날 확률과 다름에 주의하라.
\begin{align} \displaystyle \frac {p} {1-p} \end{align}

  • 한 사건이 일어날 확률이 75%라고 하면
  • 그 사건이 일어나지 않을 확률은 25%이므로
  • 그 사건의 승산은 (odds) $ 75\%/25\% = 3:1 $
  • 3대1 (혹은 3) 이라고 읽는다
  • 이에 반해서 그 사건이 일어날 확률은 (애초에) 75% 라고 했음
  • 내가 파티에 가서 입구에서 당첨번호를 받았다
  • 당첨번호를 받은 사람은 나를 제외하고 4명이 더 있다
  • 내가 상품에 당첨이 될 확률은 $1/5=20\%$ 이다
  • 그러나 내가 상품에 당첨이 될 odds는 (승산은?)
  • 1 대 4 이다 $(1:4, (25\%))$
  • 한 사건의 probability 가 0.5보다 크다면, 그 사건이 일어날 승산은 (odds) 1보다 크다
  • 한 사건의 prob가 0.5보다 크다면 == 한사건의 일어날 odds가 내게 유리하다면
    • $odds = \frac {p}{1-p} = \frac {0.6}{1-0.6} = 1.5 $
  • 반대로 0.5보다 작으면 승산은 1보다 작게 된다.
  • $odds = \frac {p}{1-p} = \frac {0.4}{1-0.4} = 0.667 $
  • 위의 설명은
    • odds의 분포는 1을 중심으로 0-1에는 내게 불리한 odds가 나타나고
    • 1-무한대 에서는 유리한 odds가 나타나게 된다.
  • 양쪽이 언발란스한데, 이것을 없애는 방법에 log를 붙이는 것이 있다
  • odds, 1/6 와 odds 6/1에 log를 붙이면
  •   
    > log(1/6)
    [1] -1.791759
    > log(6)
    [1] 1.791759
    > 

Odds ratio

Odds ratio (승비): Odds ratio는 두 odds 간의 비율을 말한다.

  • 질병 X에 걸리 확률이 남자는 $35\%$ 이고 여자는 $25\%$라고 하면
  • 남자가 질병에 걸릴 승산은 $\frac {.35} {(1 - .35)} = .54$ 이고
  • 여자가 질병에 걸릴 승산은 $\frac {.25} {(1 - .25)} - .33$ 이다.
  • 그리고 odds ratio는 $ \frac {.54} {.33} = 1.64$ 이다.
  • wald test
##########
# see youtube 
# https://youtu.be/8nm0G-1uJzA
n.mut <- 23+117
n.norm <- 6+210
p.cancer.mut <- 23/(23+117)
p.cancer.norm <- 6/(6+210)

set.seed(1011)
c <- runif(n.mut, 0, 1)
# 0 = not cancer, 1 = cancer among mutant gene
mutant <- ifelse(c>=p.cancer.mut, 0, 1)

c <- runif(n.norm, 0, 1)
# 0 = not cancer, 1 = cancer among normal gene
normal <- ifelse(c>=p.cancer.norm, 0, 1)

# 0 = mutant; 1 = normal
gene <- c(rep(0, length(mutant)), rep(1, length(normal)))
# 0 = not cancer; 1 = cancer
cancer <- c(mutant, normal)

df <- as.data.frame(cbind(gene, cancer))
df
df$gene <- factor(df$gene, levels = c(0,1), labels = c("mutant", "norm"))
df$cancer <- factor(df$cancer, levels = c(0,1), labels = c("nocancer", "cancer"))
df
tab <- table(df)
tab
tab[1,2]
tab[1,1]

# p.c.m = p.cancer.mut the above
p.cancer.mutant <- tab[1,2]/(tab[1,1]+tab[1,2])
p.nocancer.mutant <- tab[1,1]/(tab[1,1]+tab[1,2])
p.cancer.mutant
1-p.cancer.mutant
p.nocancer.mutant

p.cancer.norm <-  tab[2,2]/(tab[2,1]+tab[2,2])
p.nocancer.norm <- 1-p.cancer.norm
p.cancer.norm
p.nocancer.norm

odds(p.cancer.mutant)
odds(p.cancer.norm)
odds.ratio(p.cancer.mutant, p.cancer.norm)
> ##########
> # see youtube 
> # https://youtu.be/8nm0G-1uJzA
> n.mut <- 23+117
> n.norm <- 6+210
> p.cancer.mut <- 23/(23+117)
> p.cancer.norm <- 6/(6+210)
> 
> set.seed(1011)
> c <- runif(n.mut, 0, 1)
> # 0 = not cancer, 1 = cancer among mutant gene
> mutant <- ifelse(c>=p.cancer.mut, 0, 1)
> 
> c <- runif(n.norm, 0, 1)
> # 0 = not cancer, 1 = cancer among normal gene
> normal <- ifelse(c>=p.cancer.norm, 0, 1)
> 
> # 0 = mutant; 1 = normal
> gene <- c(rep(0, length(mutant)), rep(1, length(normal)))
> # 0 = not cancer; 1 = cancer
> cancer <- c(mutant, normal)
> 
> df <- as.data.frame(cbind(gene, cancer))
> df
    gene cancer
1      0      0
2      0      1
3      0      0
4      0      0
5      0      0
6      0      0
7      0      0
8      0      1
9      0      0
10     0      0
11     0      0
12     0      0
13     0      0
14     0      0
15     0      0
16     0      0
17     0      0
18     0      0
19     0      0
20     0      0
21     0      0
22     0      0
23     0      0
24     0      0
25     0      0
26     0      0
27     0      0
28     0      0
29     0      1
30     0      1
31     0      0
32     0      0
33     0      0
34     0      0
35     0      0
36     0      0
37     0      1
38     0      0
39     0      0
40     0      0
41     0      0
42     0      0
43     0      0
44     0      0
45     0      0
46     0      1
47     0      0
48     0      0
49     0      0
50     0      0
51     0      1
52     0      0
53     0      0
54     0      0
55     0      0
56     0      0
57     0      0
58     0      0
59     0      1
60     0      1
61     0      0
62     0      0
63     0      0
64     0      0
65     0      0
66     0      0
67     0      0
68     0      0
69     0      0
70     0      0
71     0      0
72     0      0
73     0      0
74     0      0
75     0      0
76     0      0
77     0      0
78     0      0
79     0      0
80     0      0
81     0      0
82     0      0
83     0      0
84     0      0
85     0      0
86     0      0
87     0      0
88     0      1
89     0      0
90     0      0
91     0      1
92     0      1
93     0      0
94     0      0
95     0      0
96     0      0
97     0      0
98     0      0
99     0      0
100    0      0
101    0      1
102    0      0
103    0      0
104    0      0
105    0      0
106    0      0
107    0      1
108    0      0
109    0      0
110    0      0
111    0      0
112    0      0
113    0      0
114    0      1
115    0      1
116    0      0
117    0      0
118    0      0
119    0      0
120    0      0
121    0      0
122    0      0
123    0      0
124    0      0
125    0      0
126    0      0
127    0      0
128    0      0
129    0      0
130    0      0
131    0      1
132    0      1
133    0      1
134    0      0
135    0      0
136    0      0
137    0      0
138    0      0
139    0      0
140    0      0
141    1      0
142    1      0
143    1      0
144    1      0
145    1      0
146    1      0
147    1      0
148    1      0
149    1      0
150    1      0
151    1      0
152    1      0
153    1      0
154    1      0
155    1      0
156    1      0
157    1      0
158    1      0
159    1      0
160    1      0
161    1      0
162    1      0
163    1      0
164    1      0
165    1      0
166    1      0
167    1      0
168    1      0
169    1      0
170    1      0
171    1      0
172    1      0
173    1      0
174    1      0
175    1      0
176    1      0
177    1      0
178    1      0
179    1      0
180    1      0
181    1      0
182    1      0
183    1      0
184    1      0
185    1      0
186    1      0
187    1      0
188    1      0
189    1      0
190    1      0
191    1      0
192    1      0
193    1      0
194    1      0
195    1      0
196    1      0
197    1      0
198    1      0
199    1      0
200    1      0
201    1      0
202    1      0
203    1      1
204    1      0
205    1      0
206    1      0
207    1      0
208    1      0
209    1      0
210    1      0
211    1      0
212    1      0
213    1      0
214    1      0
215    1      0
216    1      0
217    1      0
218    1      0
219    1      0
220    1      0
221    1      0
222    1      0
223    1      0
224    1      0
225    1      0
226    1      1
227    1      0
228    1      0
229    1      1
230    1      0
231    1      0
232    1      0
233    1      0
234    1      0
235    1      0
236    1      0
237    1      0
238    1      0
239    1      0
240    1      0
241    1      0
242    1      1
243    1      0
244    1      0
245    1      0
246    1      0
247    1      0
248    1      0
249    1      0
250    1      0
251    1      0
252    1      0
253    1      0
254    1      0
255    1      0
256    1      0
257    1      0
258    1      0
259    1      0
260    1      0
261    1      0
262    1      0
263    1      0
264    1      0
265    1      0
266    1      0
267    1      0
268    1      0
269    1      0
270    1      0
271    1      0
272    1      0
273    1      0
274    1      0
275    1      0
276    1      0
277    1      0
278    1      0
279    1      0
280    1      0
281    1      0
282    1      0
283    1      0
284    1      0
285    1      0
286    1      0
287    1      0
288    1      0
289    1      0
290    1      0
291    1      0
292    1      0
293    1      0
294    1      0
295    1      0
296    1      0
297    1      0
298    1      0
299    1      0
300    1      0
301    1      0
302    1      0
303    1      0
304    1      0
305    1      1
306    1      0
307    1      0
308    1      0
309    1      0
310    1      0
311    1      0
312    1      0
313    1      0
314    1      0
315    1      0
316    1      0
317    1      0
318    1      0
319    1      0
320    1      0
321    1      0
322    1      0
323    1      0
324    1      0
325    1      0
326    1      0
327    1      0
328    1      1
329    1      0
330    1      0
331    1      0
332    1      0
333    1      0
334    1      0
335    1      0
336    1      0
337    1      0
338    1      0
339    1      0
340    1      0
341    1      0
342    1      0
343    1      0
344    1      0
345    1      0
346    1      0
347    1      0
348    1      0
349    1      0
350    1      0
351    1      0
352    1      0
353    1      0
354    1      0
355    1      0
356    1      0
> df$gene <- factor(df$gene, levels = c(0,1), labels = c("mutant", "norm"))
> df$cancer <- factor(df$cancer, levels = c(0,1), labels = c("nocancer", "cancer"))
> df
      gene   cancer
1   mutant nocancer
2   mutant   cancer
3   mutant nocancer
4   mutant nocancer
5   mutant nocancer
6   mutant nocancer
7   mutant nocancer
8   mutant   cancer
9   mutant nocancer
10  mutant nocancer
11  mutant nocancer
12  mutant nocancer
13  mutant nocancer
14  mutant nocancer
15  mutant nocancer
16  mutant nocancer
17  mutant nocancer
18  mutant nocancer
19  mutant nocancer
20  mutant nocancer
21  mutant nocancer
22  mutant nocancer
23  mutant nocancer
24  mutant nocancer
25  mutant nocancer
26  mutant nocancer
27  mutant nocancer
28  mutant nocancer
29  mutant   cancer
30  mutant   cancer
31  mutant nocancer
32  mutant nocancer
33  mutant nocancer
34  mutant nocancer
35  mutant nocancer
36  mutant nocancer
37  mutant   cancer
38  mutant nocancer
39  mutant nocancer
40  mutant nocancer
41  mutant nocancer
42  mutant nocancer
43  mutant nocancer
44  mutant nocancer
45  mutant nocancer
46  mutant   cancer
47  mutant nocancer
48  mutant nocancer
49  mutant nocancer
50  mutant nocancer
51  mutant   cancer
52  mutant nocancer
53  mutant nocancer
54  mutant nocancer
55  mutant nocancer
56  mutant nocancer
57  mutant nocancer
58  mutant nocancer
59  mutant   cancer
60  mutant   cancer
61  mutant nocancer
62  mutant nocancer
63  mutant nocancer
64  mutant nocancer
65  mutant nocancer
66  mutant nocancer
67  mutant nocancer
68  mutant nocancer
69  mutant nocancer
70  mutant nocancer
71  mutant nocancer
72  mutant nocancer
73  mutant nocancer
74  mutant nocancer
75  mutant nocancer
76  mutant nocancer
77  mutant nocancer
78  mutant nocancer
79  mutant nocancer
80  mutant nocancer
81  mutant nocancer
82  mutant nocancer
83  mutant nocancer
84  mutant nocancer
85  mutant nocancer
86  mutant nocancer
87  mutant nocancer
88  mutant   cancer
89  mutant nocancer
90  mutant nocancer
91  mutant   cancer
92  mutant   cancer
93  mutant nocancer
94  mutant nocancer
95  mutant nocancer
96  mutant nocancer
97  mutant nocancer
98  mutant nocancer
99  mutant nocancer
100 mutant nocancer
101 mutant   cancer
102 mutant nocancer
103 mutant nocancer
104 mutant nocancer
105 mutant nocancer
106 mutant nocancer
107 mutant   cancer
108 mutant nocancer
109 mutant nocancer
110 mutant nocancer
111 mutant nocancer
112 mutant nocancer
113 mutant nocancer
114 mutant   cancer
115 mutant   cancer
116 mutant nocancer
117 mutant nocancer
118 mutant nocancer
119 mutant nocancer
120 mutant nocancer
121 mutant nocancer
122 mutant nocancer
123 mutant nocancer
124 mutant nocancer
125 mutant nocancer
126 mutant nocancer
127 mutant nocancer
128 mutant nocancer
129 mutant nocancer
130 mutant nocancer
131 mutant   cancer
132 mutant   cancer
133 mutant   cancer
134 mutant nocancer
135 mutant nocancer
136 mutant nocancer
137 mutant nocancer
138 mutant nocancer
139 mutant nocancer
140 mutant nocancer
141   norm nocancer
142   norm nocancer
143   norm nocancer
144   norm nocancer
145   norm nocancer
146   norm nocancer
147   norm nocancer
148   norm nocancer
149   norm nocancer
150   norm nocancer
151   norm nocancer
152   norm nocancer
153   norm nocancer
154   norm nocancer
155   norm nocancer
156   norm nocancer
157   norm nocancer
158   norm nocancer
159   norm nocancer
160   norm nocancer
161   norm nocancer
162   norm nocancer
163   norm nocancer
164   norm nocancer
165   norm nocancer
166   norm nocancer
167   norm nocancer
168   norm nocancer
169   norm nocancer
170   norm nocancer
171   norm nocancer
172   norm nocancer
173   norm nocancer
174   norm nocancer
175   norm nocancer
176   norm nocancer
177   norm nocancer
178   norm nocancer
179   norm nocancer
180   norm nocancer
181   norm nocancer
182   norm nocancer
183   norm nocancer
184   norm nocancer
185   norm nocancer
186   norm nocancer
187   norm nocancer
188   norm nocancer
189   norm nocancer
190   norm nocancer
191   norm nocancer
192   norm nocancer
193   norm nocancer
194   norm nocancer
195   norm nocancer
196   norm nocancer
197   norm nocancer
198   norm nocancer
199   norm nocancer
200   norm nocancer
201   norm nocancer
202   norm nocancer
203   norm   cancer
204   norm nocancer
205   norm nocancer
206   norm nocancer
207   norm nocancer
208   norm nocancer
209   norm nocancer
210   norm nocancer
211   norm nocancer
212   norm nocancer
213   norm nocancer
214   norm nocancer
215   norm nocancer
216   norm nocancer
217   norm nocancer
218   norm nocancer
219   norm nocancer
220   norm nocancer
221   norm nocancer
222   norm nocancer
223   norm nocancer
224   norm nocancer
225   norm nocancer
226   norm   cancer
227   norm nocancer
228   norm nocancer
229   norm   cancer
230   norm nocancer
231   norm nocancer
232   norm nocancer
233   norm nocancer
234   norm nocancer
235   norm nocancer
236   norm nocancer
237   norm nocancer
238   norm nocancer
239   norm nocancer
240   norm nocancer
241   norm nocancer
242   norm   cancer
243   norm nocancer
244   norm nocancer
245   norm nocancer
246   norm nocancer
247   norm nocancer
248   norm nocancer
249   norm nocancer
250   norm nocancer
251   norm nocancer
252   norm nocancer
253   norm nocancer
254   norm nocancer
255   norm nocancer
256   norm nocancer
257   norm nocancer
258   norm nocancer
259   norm nocancer
260   norm nocancer
261   norm nocancer
262   norm nocancer
263   norm nocancer
264   norm nocancer
265   norm nocancer
266   norm nocancer
267   norm nocancer
268   norm nocancer
269   norm nocancer
270   norm nocancer
271   norm nocancer
272   norm nocancer
273   norm nocancer
274   norm nocancer
275   norm nocancer
276   norm nocancer
277   norm nocancer
278   norm nocancer
279   norm nocancer
280   norm nocancer
281   norm nocancer
282   norm nocancer
283   norm nocancer
284   norm nocancer
285   norm nocancer
286   norm nocancer
287   norm nocancer
288   norm nocancer
289   norm nocancer
290   norm nocancer
291   norm nocancer
292   norm nocancer
293   norm nocancer
294   norm nocancer
295   norm nocancer
296   norm nocancer
297   norm nocancer
298   norm nocancer
299   norm nocancer
300   norm nocancer
301   norm nocancer
302   norm nocancer
303   norm nocancer
304   norm nocancer
305   norm   cancer
306   norm nocancer
307   norm nocancer
308   norm nocancer
309   norm nocancer
310   norm nocancer
311   norm nocancer
312   norm nocancer
313   norm nocancer
314   norm nocancer
315   norm nocancer
316   norm nocancer
317   norm nocancer
318   norm nocancer
319   norm nocancer
320   norm nocancer
321   norm nocancer
322   norm nocancer
323   norm nocancer
324   norm nocancer
325   norm nocancer
326   norm nocancer
327   norm nocancer
328   norm   cancer
329   norm nocancer
330   norm nocancer
331   norm nocancer
332   norm nocancer
333   norm nocancer
334   norm nocancer
335   norm nocancer
336   norm nocancer
337   norm nocancer
338   norm nocancer
339   norm nocancer
340   norm nocancer
341   norm nocancer
342   norm nocancer
343   norm nocancer
344   norm nocancer
345   norm nocancer
346   norm nocancer
347   norm nocancer
348   norm nocancer
349   norm nocancer
350   norm nocancer
351   norm nocancer
352   norm nocancer
353   norm nocancer
354   norm nocancer
355   norm nocancer
356   norm nocancer
> tab <- table(df)
> tab
        cancer
gene     nocancer cancer
  mutant      121     19
  norm        210      6
> tab[1,2]
[1] 19
> tab[1,1]
[1] 121
> 
> # p.c.m = p.cancer.mut the above
> p.cancer.mutant <- tab[1,2]/(tab[1,1]+tab[1,2])
> p.nocancer.mutant <- tab[1,1]/(tab[1,1]+tab[1,2])
> p.cancer.mutant
[1] 0.1357143
> 1-p.cancer.mutant
[1] 0.8642857
> p.nocancer.mutant
[1] 0.8642857
> 
> p.cancer.norm <-  tab[2,2]/(tab[2,1]+tab[2,2])
> p.nocancer.norm <- 1-p.cancer.norm
> p.cancer.norm
[1] 0.02777778
> p.nocancer.norm
[1] 0.9722222
> 
> odds(p.cancer.mutant)
[1] 0.1570248
> odds(p.cancer.norm)
[1] 0.02857143
> odds.ratio(p.cancer.mutant, p.cancer.norm)
[1] 5.495868
> 

Logit 성질

여기서
\begin{align*} ln(x) & = y \\ log_e {x} & = y \\ x & = e^{y} \\ \end{align*}

위에서

  • $ \text{if } \;\;\; x = 1, $
    • $ e^{y} = 1 $ 이므로
    • $ y = 0 $
    • $ \therefore \;\; log_{e}(1) = 0 $
  • $\text{if } \;\;\; x = 0 $
    • $ 0 = e^{y} $ 이므로
    • y 는 $ - \infty $
    • 왜냐하면, $ e^{-\infty} = \frac {1}{e^{\infty}} = \frac {1}{\infty} = 0 $ 혹은 $0$ 에 수렴하기 때문
    • $ \therefore \;\; log_{e}(0) = - \infty $
  • $\text{if } \;\;\; x = \infty $
    • $ \infty = e^{y} $ 이므로
    • $ y = \infty $ 어야 함
    • $ \therefore \;\; log_{e}(\infty) = + \infty $
  • Odds 는 확률 $0.5$를 기준으로 $0-1$ 과 $1-\infty$ 범위를 갖는다고 하였는데
  • 이 Odds에 log를 씌우면 그 범위는
  • $-\infty$ 에서 $\infty $가 되어서 a+bX에 맞춰서 해석을 할 수 있게 된다.
> load("nsduh2019_adult_sub_rmph.RData")
> # Shorter name
> nsduh <- nsduh_adult_sub
> tab <- table(nsduh$demog_sex, nsduh$mj_lifetime)
> tab
        
          No Yes
  Male   206 260
  Female 285 249
> 
# Marijuana experience (me) in lifetime
	NO	YES	
Male	206	260	466
Female	285	249	534
	491	509	1000

P(me among males) = 260 / 466 = 0.5579399
P(me among females) = 249 / 534 = 0.4662921
Odds for males = 260 / 206 = 1.262136
Odds for females = 249 / 285 = 0.8736842
Odds ratio between males and females = (260 / 206) / (249 / 285) = 1.262136 / 0.8736842 = 1.444613
odds       <- function(p)      p/(1-p)
odds.ratio <- function(p1, p2) odds(p1)/odds(p2)
logit      <- function(p)      log(p/(1-p))
ilogit     <- function(x)      exp(x)/(1+exp(x))
# exp() is the exponential function
pm <- tab[1,2]/(tab[1,1]+tab[1,2])
pf <- tab[2,2]/(tab[2,1]+tab[2,2]) 
om <- odds(pm)
of <- odds(pf)
ormf <- odds.ratio(pm,pf)
pm
pf
om
of
ormf

> pm
[1] 0.5579399
> pf
[1] 0.4662921
> om
[1] 1.262136
> of
[1] 0.8736842
> ormf
[1] 1.444613
> 

x1 <- logit(pm)
x2 <- logit(pf)
x1
x2
ilogit(x1)
ilogit(x2)

> x1 <- logit(pm)
> x2 <- logit(pf)
> x1
[1] 0.2328055
> x2
[1] -0.1350363
> ilogit(x1)
[1] 0.5579399
> ilogit(x2)
[1] 0.4662921
> 
> 

Logitistic Regression Analysis

\begin{align*} \displaystyle ln \left( {\frac{p}{(1-p)}} \right) = a + bX \end{align*}

  • p = 변인 X가 A일 확률
  • 1-p = 변인 X가 A가 아닐 확률
  • ln 은 e를 밑으로 하는 log 를 말한다
  • $ln \left( {\frac{p}{(1-p)}} \right) $ 을 $\text{logit(p)}$ 로 부른다

\begin{align} ln \left( {\frac{p}{1-p}} \right) & = a + bX \nonumber \\ \frac{p}{1-p} & = e^{a+bX} \nonumber \\ p & = e^{a+bX} * (1-p) \nonumber \\ p & = e^{a+bX} - p * \left(e^{a+bX} \right) \nonumber \\ p + p * \left(e^{a+bX} \right) & = e^{a+bX} \nonumber \\ p * \left(1 + e^{a+bX} \right) & = e^{a+bX} \nonumber \\ p & = \frac {e^{a+bX}} { \left(1 + e^{a+bX} \right)} \\ \end{align}

  • 위에서 계수 b값이 충분히 커서 X 가 커지면 p 값은 1로 수렴하고
  • b값이 충분히 작아서 X가 아주 작아지면 p 값은 0에 가까이 간다
  • 즉 ln(p/(1-p))는 직선의 관계를 갖지만 (a+bX)
  • p값은 0에서 1사이의 값을 갖게 된다.
  • p의 그래프는 아래와 같은 곡선이다.
install.packages("sigmoid")
library(sigmoid)
library(ggplot2)
input <- seq(-5, 5, 0.01)
df = data.frame(input, logistic(input), Gompertz(input))
ggplot( df, aes(input, logistic(input)) ) + 
  geom_line(color="red")

Binary Logistic Regression

독립변인이 종류일 때에
IVs: categorical or numerical variables
DV: categorical variable

\begin{align} ln \left( {\frac{p}{(1-p)}} \right) & = a + bX \\ \end{align}

  • $p$ = probability of an event happening
  • $(1-p)$ = probability of an event NOT happening
  • $p/(1-p)$ : odds of the event
  • $ln (p/(1-p))$ : natural logarithm of the odds : log-odds or logit
  • to get the odds from the above logit, we use the inverse of natural logarithm
  • from $ln (p/(1-p)) = x$ OR $Log(odds) = x$
  • to $p/(1-p) = odds = e^x$
  • to convert log odds to probability $p$,
  • we use inverse login function $e^x/(1 + e^x) = 0.4662912$

intercept (절편) 해석

  • 위의 식 [2]에서 $X=0$ 일 경우 (현재 binary IV에 대해서 이야기하고 있음에 주의),
  • $ln(p/(1-p) = a$ 에서
    • (모든) 독립변인이 0의 값을 가질 때 (독립변인이 0이 되는 경우 = 기준 category일 경우),
    • 절편 값 $a$ 는 로짓값을 (log-odds 값 $ln(p/(1-p)$) 갖게 된다.
    • 따라서 $ p = \displaystyle \frac {e^a}{1+e^a}$
    • 아래 아웃풋에서 $a = -0.13504$ 이므로
    • $ p = \displaystyle \frac {e^{-0.13504}}{1+e^{-0.13504}} = 0.4662912$
    • 위는 정의된 평션으로 (ilogit) 구해도 된다 ilogit(a) = 0.4662912
    • 마지막으로 이 값은 우리가 이미 table에서 구한 probability of female yes 값과 (PF.yes) 같다.
odds       <- function(p)      p/(1-p)
odds.ratio <- function(p1, p2) odds(p1)/odds(p2)
# log odds를 구하는 function
logit      <- function(p)      log(p/(1-p))
# probability를 구하는 function 
# p = e^x/(1+e^x)
ilogit     <- function(x)      exp(x)/(1+exp(x))
# 절편 해석 
e^-0.13504/(1 + e^-0.13504) # 위의 절편에 대한 p 값 계산 
# p = e^x/1+e^x
summary(fit.ex6.2)$coefficient # coefficient 값들 출력
summary(fit.ex6.2)$coefficient[1,1] # 절편 값 출력
# or 
coef(fit.ex6.2)["(Intercept)"]
# coef(fit.ex6.2)[1]
p <- ilogit(summary(fit.ex6.2)$coefficient[1,1]) 
p 
# 이 값은 PF.yes 값과 같다
PF.yes

coefficient (계수) 해석

  • $X = 1$ 일 경우 $ln(odds) = a + bX = a + b $
  • 아래 아웃 풋에서
  • a = (Intercept) = -0.13504
  • b = demog_sexMale = 0.36784
  • 따라서 $a + b = -0.13504 + 0.36784 = 0.2328 $
  • 즉, $ln(odds) = 0.2328 $ 이고
  • $ odds = \displaystyle \frac {p_{\text{ of male yes}}}{p-1} = e^{0.2328} = 1.262129$ 이것은 X가 1일 경우이다.
  • $ p = e^{0.2328} / (1 + e^{0.2328}) = 0.5579386 $ 그리고 X는 1일 경우의 prob = 0.558 정도이다.
  • or ilogit(0.2328) = 0.5579386
  • coefficient값 (0.36784) 은 아래처럼 구할 수도 있다
  • summary(fit.ex6.2)$coefficient[2,1]
  • $e^{b}$ 값은 male vs female 의 yes에 대한 odds ratio 를 말한다
  • why?
    • om/of = 1.444613 or
    • odds.ratio(pm, pf) = 1.444613
    • 즉, $log(om/of) = b$
    • $log(1.444613) = b$
> log(1.444613)
[1] 0.3678415 # 이는 계수 값 b값이다. 

> b <- summary(fit.ex6.2)$coefficient[2,1]
> b
[1] 0.3678417
> e^b
[1] 1.444613
> 
  • 이 값은 앞서 tab에서 구한 odds ratio 이다 (male odds / female odds = om/of).
  • X = 0 (female)에서 X = 1 (male) 로 바뀔때의 odds ratio는 1.444613으로
  • 남자의 마리화나 경험이 여성에 비해 44.5% 증가한다고 해석

glm in R

nsduh <- nsduh %>% 
  mutate(demog_sex = relevel(demog_sex, ref = "Female"))

fit.ex6.2 <- glm(mj_lifetime ~ demog_sex,
                 family = binomial, data = nsduh)
summary(fit.ex6.2)
# install.packages("dplyr")
# library(dplyr)
> nsduh <- nsduh %>% 
+   mutate(demog_sex = relevel(demog_sex, ref = "Female"))
> fit.ex6.2 <- glm(mj_lifetime ~ demog_sex,
+                  family = binomial, data = nsduh)
> summary(fit.ex6.2)

Call:
glm(formula = mj_lifetime ~ demog_sex, family = binomial, data = nsduh)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)   
(Intercept)   -0.13504    0.08675  -1.557  0.11954   
demog_sexMale  0.36784    0.12738   2.888  0.00388 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1386.0  on 999  degrees of freedom
Residual deviance: 1377.6  on 998  degrees of freedom
AIC: 1381.6

Number of Fisher Scoring iterations: 3

> 

CI of b coefficient

fit.ex6.2에서

# 위의 b값에 대한 CI을 구하기 위해서 confint 펑션을 사용한다
confint(fit.ex6.2)
# b값에 대한 것만 알고 싶으므로 (drop = F는 
confint(fit.ex6.2)[2, , drop=F]
# 그리고 이 값들의 실제 odds ratio값을 보려면 
exp(confint(fit.ex6.2)[2, , drop=F])
> # 위의 b값에 대한 CI을 구하기 위해서 confint 펑션을 사용한다
> confint(fit.ex6.2)
Waiting for profiling to be done...
                   2.5 %     97.5 %
(Intercept)   -0.3054835 0.03475989
demog_sexMale  0.1185985 0.61808060
> # b값에 대한 것만 알고 싶으므로 
> confint(fit.ex6.2)[2, , drop=F]
Waiting for profiling to be done...
                  2.5 %    97.5 %
demog_sexMale 0.1185985 0.6180806
> # 그리고 이 값들의 실제 odds ratio값을 보려면 
> exp(confint(fit.ex6.2)[2, , drop=F])
Waiting for profiling to be done...
                 2.5 %   97.5 %
demog_sexMale 1.125918 1.855363
> 

coefficient값에 대한 테스트

일반 regression에서 b값은 t-test를 했지만 여기서는 z-test를 (Wald test) 한다. 이는 IV가 종류이거나 숫자일 때 모두 마찬가지이다.

# install.packages("car")
# library(car)
# coefficient probability test
car::Anova(fit.ex6.2, type = 3, test.statistic = "Wald")
> # coefficient probability test
> car::Anova(fit.ex6.2, type = 3, test.statistic = "Wald")
Analysis of Deviance Table (Type III tests)

Response: mj_lifetime
            Df  Chisq Pr(>Chisq)   
(Intercept)  1 2.4233    0.11954   
demog_sex    1 8.3394    0.00388 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 

마리화나의 사용경험에서 남성이 여성보다 큰 승산이 있다고 판단되었다 (Odds ratio (OR) = 1.44; 95% CI = 1.13, 1.86; p = .004). 남성은 여성보다 약 44% 더 사용경험을 할 승산을 보였다 (OR = 1.44).

X: numeric variable

########################################
########################################
########################################
# numeric IV
fit.ex6.3 <- glm(mj_lifetime ~ alc_agefirst, family = binomial, data = nsduh)
round(summary(fit.ex6.3)$coef, 4)

ilogit(coef(fit.ex6.3)["(Intercept)"])
# 0.9952 = prob of starting marijuana when age is 0
# when the age is zero (intercept이므로)

# age = 0 에서 추정하는 것은 이상함 
summary(nsduh$alc_agefirst)

# 위의 아웃풋에서 Mean값이 약 17이므로 17을 
# 기준으로 하여 다시 보면
# install.packages("dplyr") # for %>% function
# library(dplyr)
# install.packages("tidyverse") # for mutate function
# library(tidyverse)


nsduh <- nsduh %>% 
  mutate(calc_agefirst = alc_agefirst - 17)
fit.ex6.3.centered <- glm(mj_lifetime ~ calc_agefirst,
                          family = binomial, data = nsduh)
fit.ex6.3.centered
ilogit(coef(fit.ex6.3.centered)["(Intercept)"])

# b coefficient 
# 17살일 때를 기준으로 한살씩 증가할 때마다의 
# 마리화나 경험/비경험의 Odds ratio는  -0.2835
# 이를 수치화하면 
exp(coef(fit.ex6.3.centered)["calc_agefirst"])

# 이는 17이후에 한살씩 알콜처음 경험을 늦추면 
# 마리화나 경험율 대 미경험 odds ratio가 
# 0.247 낮아진다고 할 수 있다 (0.7531 증가는)
# 1-0.7531로 보는 것

# 그리고 이에 대한 CI를 보면 아래와 같고
confint(fit.ex6.3.centered)[2, , drop = F]
# 이를 승비로 (odds ratio) 보면 
exp(confint(fit.ex6.3.centered)[2, , drop = F])
> ########################################
> ########################################
> ########################################
> # numeric IV
> fit.ex6.3 <- glm(mj_lifetime ~ alc_agefirst, family = binomial, data = nsduh)
> round(summary(fit.ex6.3)$coef, 4)
             Estimate Std. Error  z value Pr(>|z|)
(Intercept)    5.3407     0.4747  11.2510        0
alc_agefirst  -0.2835     0.0267 -10.6181        0
> 
> ilogit(coef(fit.ex6.3)["(Intercept)"])
(Intercept) 
  0.9952302 
> # 0.9952 = prob of female starting marijuana 
> # when the age is zero (intercept이므로)
> 
> # age = 0 에서 추정하는 것은 이상함 
> summary(nsduh$alc_agefirst)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   3.00   15.00   17.00   17.49   19.00   45.00     157 
> 
> # 위의 아웃풋에서 Mean값이 약 17이므로 17을 
> # 기준으로 하여 다시 보면
> 
> nsduh <- nsduh %>% 
+   mutate(calc_agefirst = alc_agefirst - 17)
> fit.ex6.3.centered <- glm(mj_lifetime ~ calc_agefirst,
+                           family = binomial, data = nsduh)
> fit.ex6.3.centered

Call:  glm(formula = mj_lifetime ~ calc_agefirst, family = binomial, 
    data = nsduh)

Coefficients:
  (Intercept)  calc_agefirst  
       0.5207        -0.2835  

Degrees of Freedom: 842 Total (i.e. Null);  841 Residual
  (157 observations deleted due to missingness)
Null Deviance:	    1141 
Residual Deviance: 968.4 	AIC: 972.4
> ilogit(coef(fit.ex6.3.centered)["(Intercept)"])
(Intercept) 
  0.6273001 
> 
> # b coefficient 
> # 17살일 때를 기준으로 한살씩 증가할 때마다의 
> # 마리화나 경험/비경험의 Odds ratio는  -0.2835
> # 이를 수치화하면 
> exp(coef(fit.ex6.3.centered)["calc_agefirst"])
calc_agefirst 
    0.7531198 
> 
> # 이는 17이후에 한살씩 알콜처음 경험을 늦추면 
> # 마리화나 경험율 대 미경험 odds ratio가 
> # 0.247로 낮아진다고 할 수 있다 (0.7531 증가는)
> # 1-0.7531로 보는 것
> 
> # 그리고 이에 대한 CI를 보면 아래와 같고
> confint(fit.ex6.3.centered)[2, , drop = F]
Waiting for profiling to be done...
                   2.5 %    97.5 %
calc_agefirst -0.3375819 -0.232863
> # 이를 승비로 (odds ratio) 보면 
> exp(confint(fit.ex6.3.centered)[2, , drop = F])
Waiting for profiling to be done...
                  2.5 %    97.5 %
calc_agefirst 0.7134935 0.7922621
> 

해석: 처음 알콜경험한 나이와 마리화나 처음경험과는 음의 상관관계를 보였다 (OR = 0.753; 95% CI = 0.713, 0.792; p < .001). 개인의 알콜경험 나이가 한 살씩 많아질 때마다 (가령 18살에서 19살로), 마리화나의 처음경험은 24.7% 낮아지는 것으로 판단이 되었다.

IV increase not by one, but by many

  • 처음 알콜 경험이 3년 늦춰지게 되면 $24.7\% * 3$ 인가?
  • 그렇지 않고 처음 승비를 알려주는 b coefficient에서 (odds ratio = -0.2835)
  • 3을 곱해준 후, 해당 OR을 구한다. 즉
  • $e^{-0.2835*3}$
  • 아래처럼 약 42.71% 이므로
  • 3년 터울로 보면 약 (100-42.71% = 57.29%) 마리화나 처음경험의 odds를 갖는다고 하겠다
> #################################
> # 1년이 아니라 3년일 경우
> fit.ex6.3.centered

Call:  glm(formula = mj_lifetime ~ calc_agefirst, family = binomial, 
    data = nsduh)

Coefficients:
  (Intercept)  calc_agefirst  
       0.5207        -0.2835  

Degrees of Freedom: 842 Total (i.e. Null);  841 Residual
  (157 observations deleted due to missingness)
Null Deviance:	    1141 
Residual Deviance: 968.4 	AIC: 972.4
> coef(fit.ex6.3.centered)["calc_agefirst"]
calc_agefirst 
    -0.283531 
> coef(fit.ex6.3.centered)["calc_agefirst"]*3
calc_agefirst 
    -0.850593 
> exp(coef(fit.ex6.3.centered)["calc_agefirst"]*3)
calc_agefirst 
    0.4271616 
> 

CI는

> # CI 의 경우 아래와 같고
> confint(fit.ex6.3.centered)[2,]*3
Waiting for profiling to be done...
    2.5 %    97.5 % 
-1.012746 -0.698589 
> # 이에 해당하는 값은 
> exp(confint(fit.ex6.3.centered)[2,]*3)
Waiting for profiling to be done...
    2.5 %    97.5 % 
0.3632203 0.4972865 
> 

Multiple Regression

DV: lifetime marijuana use (mj_lifetime)
IVs:

  • age at first use of alcohol (alc_agefirst),
  • adjusted for age (demog_age_cat6),
  • sex (demog_sex), and
  • income (demog_income)
fit.ex6.3.adj <- glm(mj_lifetime ~ alc_agefirst + demog_age_cat6 + demog_sex +
                     demog_income, family = binomial, data = nsduh)
# Regression coefficient table
round(summary(fit.ex6.3.adj)$coef, 4)
> #################################
> #################################
> ## Multiple regression
> #################################
> fit.ex6.3.adj <- glm(mj_lifetime ~ alc_agefirst + 
+                        demog_age_cat6 + demog_sex +
+                        demog_income, 
+                      family = binomial, data = nsduh)
> # Regression coefficient table
> round(summary(fit.ex6.3.adj)$coef, 4)
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                     6.2542     0.5914 10.5759   0.0000
alc_agefirst                   -0.2754     0.0276 -9.9922   0.0000
demog_age_cat626-34            -0.2962     0.3286 -0.9012   0.3675
demog_age_cat635-49            -0.8043     0.2966 -2.7120   0.0067
demog_age_cat650-64            -0.6899     0.2985 -2.3109   0.0208
demog_age_cat665+              -1.2748     0.3043 -4.1893   0.0000
demog_sexMale                  -0.0609     0.1618 -0.3763   0.7067
demog_income$20,000 - $49,999  -0.5309     0.2664 -1.9927   0.0463
demog_income$50,000 - $74,999  -0.0793     0.3049 -0.2601   0.7948
demog_income$75,000 or more    -0.3612     0.2532 -1.4264   0.1538
> 
# Regression coefficient table
round(summary(fit.ex6.3.adj)$coef, 4)
coef(fit.ex6.3.adj)
exp(coef(fit.ex6.3.adj))
col1 <- exp(coef(fit.ex6.3.adj))
confint(fit.ex6.3.adj)
exp(confint(fit.ex6.3.adj))
col2 <- exp(confint(fit.ex6.3.adj))
cbind("AdjOR" = col1, col2)[-1,]
round(cbind("AdjOR" = col1, col2)[-1,],3)

# OR은 coefficient 값을 이야기하는 것을 다시 확인
# 또한 Wald significant test 도 실행 
car::Anova(fit.ex6.3.adj, type = 3, test.statistic = "Wald")
> # Regression coefficient table
> round(summary(fit.ex6.3.adj)$coef, 4)
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                     6.2542     0.5914 10.5759   0.0000
alc_agefirst                   -0.2754     0.0276 -9.9922   0.0000
demog_age_cat626-34            -0.2962     0.3286 -0.9012   0.3675
demog_age_cat635-49            -0.8043     0.2966 -2.7120   0.0067
demog_age_cat650-64            -0.6899     0.2985 -2.3109   0.0208
demog_age_cat665+              -1.2748     0.3043 -4.1893   0.0000
demog_sexMale                  -0.0609     0.1618 -0.3763   0.7067
demog_income$20,000 - $49,999  -0.5309     0.2664 -1.9927   0.0463
demog_income$50,000 - $74,999  -0.0793     0.3049 -0.2601   0.7948
demog_income$75,000 or more    -0.3612     0.2532 -1.4264   0.1538
> coef(fit.ex6.3.adj)
                  (Intercept)                  alc_agefirst 
                   6.25417324                   -0.27541454 
          demog_age_cat626-34           demog_age_cat635-49 
                  -0.29618703                   -0.80427437 
          demog_age_cat650-64             demog_age_cat665+ 
                  -0.68990572                   -1.27475385 
                demog_sexMale demog_income$20,000 - $49,999 
                  -0.06088993                   -0.53087558 
demog_income$50,000 - $74,999   demog_income$75,000 or more 
                  -0.07930897                   -0.36119745 
> exp(coef(fit.ex6.3.adj))
                  (Intercept)                  alc_agefirst 
                  520.1791313                     0.7592573 
          demog_age_cat626-34           demog_age_cat635-49 
                    0.7436483                     0.4474125 
          demog_age_cat650-64             demog_age_cat665+ 
                    0.5016234                     0.2794998 
                demog_sexMale demog_income$20,000 - $49,999 
                    0.9409268                     0.5880898 
demog_income$50,000 - $74,999   demog_income$75,000 or more 
                    0.9237545                     0.6968414 
> col1 <- exp(coef(fit.ex6.3.adj))
> confint(fit.ex6.3.adj)
Waiting for profiling to be done...
                                   2.5 %      97.5 %
(Intercept)                    5.1309585  7.45103372
alc_agefirst                  -0.3312435 -0.22314999
demog_age_cat626-34           -0.9490643  0.34307731
demog_age_cat635-49           -1.4002915 -0.23429671
demog_age_cat650-64           -1.2893673 -0.11559836
demog_age_cat665+             -1.8854986 -0.68944515
demog_sexMale                 -0.3790935  0.25566208
demog_income$20,000 - $49,999 -1.0591496 -0.01305382
demog_income$50,000 - $74,999 -0.6785210  0.51882750
demog_income$75,000 or more   -0.8643471  0.13016735
> exp(confint(fit.ex6.3.adj))
Waiting for profiling to be done...
                                    2.5 %       97.5 %
(Intercept)                   169.1792047 1721.6419228
alc_agefirst                    0.7180303    0.7999949
demog_age_cat626-34             0.3871031    1.4092777
demog_age_cat635-49             0.2465251    0.7911270
demog_age_cat650-64             0.2754450    0.8908329
demog_age_cat665+               0.1517534    0.5018544
demog_sexMale                   0.6844816    1.2913163
demog_income$20,000 - $49,999   0.3467506    0.9870310
demog_income$50,000 - $74,999   0.5073668    1.6800566
demog_income$75,000 or more     0.4213265    1.1390190
> col2 <- exp(confint(fit.ex6.3.adj))
Waiting for profiling to be done...
> cbind("AdjOR" = col1, col2)[-1,]
                                  AdjOR     2.5 %    97.5 %
alc_agefirst                  0.7592573 0.7180303 0.7999949
demog_age_cat626-34           0.7436483 0.3871031 1.4092777
demog_age_cat635-49           0.4474125 0.2465251 0.7911270
demog_age_cat650-64           0.5016234 0.2754450 0.8908329
demog_age_cat665+             0.2794998 0.1517534 0.5018544
demog_sexMale                 0.9409268 0.6844816 1.2913163
demog_income$20,000 - $49,999 0.5880898 0.3467506 0.9870310
demog_income$50,000 - $74,999 0.9237545 0.5073668 1.6800566
demog_income$75,000 or more   0.6968414 0.4213265 1.1390190
> round(cbind("AdjOR" = col1, col2)[-1,],3)
                              AdjOR 2.5 % 97.5 %
alc_agefirst                  0.759 0.718  0.800
demog_age_cat626-34           0.744 0.387  1.409
demog_age_cat635-49           0.447 0.247  0.791
demog_age_cat650-64           0.502 0.275  0.891
demog_age_cat665+             0.279 0.152  0.502
demog_sexMale                 0.941 0.684  1.291
demog_income$20,000 - $49,999 0.588 0.347  0.987
demog_income$50,000 - $74,999 0.924 0.507  1.680
demog_income$75,000 or more   0.697 0.421  1.139
> 
> # OR은 coefficient 값을 이야기하는 것을 다시 확인
> # 또한 Wald significant test 도 실행 
> 
> car::Anova(fit.ex6.3.adj, type = 3, test.statistic = "Wald")
Analysis of Deviance Table (Type III tests)

Response: mj_lifetime
               Df    Chisq Pr(>Chisq)    
(Intercept)     1 111.8504  < 2.2e-16 ***
alc_agefirst    1  99.8435  < 2.2e-16 ***
demog_age_cat6  4  23.0107   0.000126 ***
demog_sex       1   0.1416   0.706685    
demog_income    3   5.4449   0.141974    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation of the output:

  • The AOR for our primary predictor alc_agefirst is 0.759. This represents the OR for lifetime marijuana use comparing those with a one-year difference in age at first use of alcohol, adjusted for age, sex, and income.
  • The remaining AORs compare levels of categorical predictors to their reference level, adjusted for the other predictors in the model.
    • For example, comparing individuals with the same age of first alcohol use, sex, and income, 35-49 year-olds have 55.3% lower odds of lifetime marijuana use than 18-25 year-olds (OR = 0.447; 95% CI = 0.247, 0.791; p = .007).
    • The p-value for this specific comparison of ages comes from the coefficients table. An overall, 4 df p-value for age, can be read from the Type III Test table (0.00013).
    • The Type III tests output contains the multiple df Wald tests for categorical predictors with more than two levels. For continuous predictors, or for categorical predictors with exactly two levels, the Type III Wald test p-values are identical to those in the Coefficients table.

Conclusion:

  • After adjusting for age, sex, and income, age at first alcohol use is significantly negatively associated with lifetime marijuana use (AOR = 0.759; 95% CI = 0.718, 0.800; p < .001). Individuals who first used alcohol at a given age have 24.1% lower odds of having ever used marijuana than those who first used alcohol one year earlier.

e.g. 1

# install.packages("oddsratio")
# library(oddsratio)
fit_glm <- glm(admit ~ gre + gpa + rank, data = data_glm, family = "binomial") 

summary(fit_glm)



# Calculate OR for specific increment step of continuous variable
or_glm(data = data_glm, model = fit_glm, 
       incr = list(gre = 40, gpa = .1))

e.g. 2

https://stats.idre.ucla.edu/r/dae/logit-regression/

mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
## view the first few rows of the data
head(mydata)
  admit gre  gpa rank
1     0 380 3.61    3
2     1 660 3.67    3
3     1 800 4.00    1
4     1 640 3.19    4
5     0 520 2.93    4
6     1 760 3.00    2
summary(mydata)
     admit             gre             gpa             rank      
 Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000  
 Median :0.0000   Median :580.0   Median :3.395   Median :2.000  
 Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485  
 3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000  
 Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000  
sapply(mydata, mean)
sapply(mydata, sd)
> sapply(mydata, mean)
   admit      gre      gpa     rank 
  0.3175 587.7000   3.3899   2.4850 
> sapply(mydata, sd)
      admit         gre         gpa        rank 
  0.4660867 115.5165364   0.3805668   0.9444602 
> 
xtabs(~admit + rank, data = mydata)
> xtabs(~admit + rank, data = mydata)
     rank
admit  1  2  3  4
    0 28 97 93 55
    1 33 54 28 12
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(mylogit)
Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial", 
    data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6268  -0.8662  -0.6388   1.1490   2.0790  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *  
gpa          0.804038   0.331819   2.423 0.015388 *  
rank2       -0.675443   0.316490  -2.134 0.032829 *  
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 458.52  on 394  degrees of freedom
AIC: 470.52

Number of Fisher Scoring iterations: 4

> 
> confint(mylogit)
Waiting for profiling to be done...
                    2.5 %       97.5 %
(Intercept) -6.2716202334 -1.792547080
gre          0.0001375921  0.004435874
gpa          0.1602959439  1.464142727
rank2       -1.3008888002 -0.056745722
rank3       -2.0276713127 -0.670372346
rank4       -2.4000265384 -0.753542605
> 
> ## CIs using standard errors
> confint.default(mylogit)
                    2.5 %       97.5 %
(Intercept) -6.2242418514 -1.755716295
gre          0.0001202298  0.004408622
gpa          0.1536836760  1.454391423
rank2       -1.2957512650 -0.055134591
rank3       -2.0169920597 -0.663415773
rank4       -2.3703986294 -0.732528724
wald.test(b = coef(mylogit), Sigma = vcov(mylogit), Terms = 4:6)
l <- cbind(0, 0, 0, 1, -1, 0)
wald.test(b = coef(mylogit), Sigma = vcov(mylogit), L = l)

관련 동영상

Log의 성질

Logistic Regression Tutorial

\begin{align} y = b_{0} + b_{1}x \\ p = \frac{1} {1 + e^{-y}} \\ ln(\frac{p}{1-p}) = b_{0} + b_{1}x \\ \end{align}

Logistic Regression Details Pt1: Coefficients

Logistic Regression Details Pt 2: Maximum Likelihood

Logistic Regression Details Pt 3: R-squared and p-value

logistic_regression.1733809554.txt.gz · Last modified: 2024/12/10 14:45 by hkimscil

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki