logistic_regression

Differences

This shows you the differences between two versions of the page.

logistic_regression [2024/12/09 07:50] – old revision restored (2023/12/14 07:55) hkimscil logistic_regression [2024/12/11 11:57] (current) – [exercise: binary IV] hkimscil
Line 2: Line 2:
 https://www.bookdown.org/rwnahhas/RMPH/blr-orlr.html https://www.bookdown.org/rwnahhas/RMPH/blr-orlr.html
 data: https://www.bookdown.org/rwnahhas/RMPH/appendix-nsduh.html#appendix-nsduh data: https://www.bookdown.org/rwnahhas/RMPH/appendix-nsduh.html#appendix-nsduh
 +
 +[[:Logistic Regression/examples R]]
 ====== Data preparation ====== ====== Data preparation ======
   * Download the [[https://www.datafiles.samhsa.gov/sites/default/files/field-uploads-protected/studies/NSDUH-2019/NSDUH-2019-datasets/NSDUH-2019-DS0001/NSDUH-2019-DS0001-bundles-with-study-info/NSDUH-2019-DS0001-bndl-data-r.zip|NSDUH-2019-DS0001-bndl-data-r.zip file]]   * Download the [[https://www.datafiles.samhsa.gov/sites/default/files/field-uploads-protected/studies/NSDUH-2019/NSDUH-2019-datasets/NSDUH-2019-DS0001/NSDUH-2019-DS0001-bundles-with-study-info/NSDUH-2019-DS0001-bndl-data-r.zip|NSDUH-2019-DS0001-bndl-data-r.zip file]]
Line 73: Line 75:
   * wald test   * wald test
 <code> <code>
-n <- 350 +########## 
-p.cancer <- 0.08 +# see youtube  
-p.mutant <- 0.39+# https://youtu.be/8nm0G-1uJzA 
 +n.mut <- 23+117 
 +n.norm <- 6+210 
 +p.cancer.mut <- 23/(23+117) 
 +p.cancer.norm <- 6/(6+210)
  
-set.seed(101) +set.seed(1011)
-c <- runif(n, 0, 1) +c <- runif(n.mut, 0, 1) 
-canc <- ifelse(c>=p.cancer, "nocancer", "cancer") +# 0 = not cancer, 1 = cancer among mutant gene 
-c <- runif(n, 0, 1) +mutant <- ifelse(c>=p.cancer.mut, 0, 1)
-gene <- ifelse(c>=p.mutant, "norm", "mutated") +
  
-da <- data.frame(gene, canc) +c <- runif(n.norm, 0, 1) 
-da +# 0 = not cancer, 1 = cancer among normal gene 
-tab <- table(da)+normal <- ifelse(c>=p.cancer.norm, 0, 1) 
 + 
 +# 0 = mutant; 1 = normal 
 +gene <- c(rep(0, length(mutant)), rep(1, length(normal))) 
 +# 0 = not cancer; 1 = cancer 
 +cancer <- c(mutant, normal) 
 + 
 +df <- as.data.frame(cbind(gene, cancer)) 
 +df 
 +df$gene <- factor(df$gene, levels = c(0,1), labels = c("mutant", "norm")) 
 +df$cancer <- factor(df$cancer, levels = c(0,1), labels = c("nocancer", "cancer"))
 +df 
 +tab <- table(df)
 tab tab
 +tab[1,2]
 +tab[1,1]
  
 +# p.cancer.mutant: sample counterpart of p.cancer.mut above
 +p.cancer.mutant <- tab[1,2]/(tab[1,1]+tab[1,2])
 +p.nocancer.mutant <- tab[1,1]/(tab[1,1]+tab[1,2])
 +p.cancer.mutant
 +1-p.cancer.mutant
 +p.nocancer.mutant
 +
 +p.cancer.norm <-  tab[2,2]/(tab[2,1]+tab[2,2])
 +p.nocancer.norm <- 1-p.cancer.norm
 +p.cancer.norm
 +p.nocancer.norm
 +
 +odds(p.cancer.mutant)
 +odds(p.cancer.norm)
 +odds.ratio(p.cancer.mutant, p.cancer.norm)
 </code> </code>
 +
 <code> <code>
-> n <- 350 +> ########## 
-> p.cancer <- 0.08 +> # see youtube  
-> p.mutant <- 0.39+> # https://youtu.be/8nm0G-1uJzA 
 +> n.mut <- 23+117 
 +> n.norm <- 6+210 
 +> p.cancer.mut <- 23/(23+117) 
 +> p.cancer.norm <- 6/(6+210)
  
-> set.seed(101) +> set.seed(1011)
-> c <- runif(n, 0, 1) +> c <- runif(n.mut, 0, 1) 
-> canc <- ifelse(c>=p.cancer, "nocancer", "cancer") +> # 0 = not cancer, 1 = cancer among mutant gene 
-> c <- runif(n, 0, 1) +> mutant <- ifelse(c>=p.cancer.mut, 0, 1)
-> gene <- ifelse(c>=p.mutant, "norm", "mutated") +
  
-> da <- data.frame(gene, canc) +> c <- runif(n.norm, 0, 1) 
-> da +> # 0 = not cancer, 1 = cancer among normal gene 
-       gene     canc +> normal <- ifelse(c>=p.cancer.norm, 0, 1) 
-1      norm nocancer + 
-2   mutated   cancer +> # 0 = mutant; 1 = normal 
-3      norm nocancer +> gene <- c(rep(0, length(mutant)), rep(1, length(normal))) 
-4   mutated nocancer +> # 0 = not cancer; 1 = cancer 
-5      norm nocancer +> cancer <- c(mutant, normal) 
-6      norm nocancer + 
-7      norm nocancer +> df <- as.data.frame(cbind(gene, cancer)) 
-8      norm nocancer +> df 
-9   mutated nocancer +    gene cancer 
-10  mutated nocancer +1      0      0 
-11     norm nocancer +2      0      1 
-12  mutated nocancer +3      0      0 
-13  mutated nocancer +4      0      0 
-14  mutated nocancer +5      0      0 
-15     norm nocancer +6      0      0 
-16     norm nocancer +> 
-17     norm nocancer +> df$gene <- factor(df$gene, levels = c(0,1), labels = c("mutant", "norm")) 
-18     norm nocancer +> df$cancer <- factor(df$cancer, levels = c(0,1), labels = c("nocancer", "cancer")) 
-19     norm nocancer +> df 
-20     norm   cancer +      gene   cancer 
-21  mutated nocancer +1   mutant nocancer 
-22     norm nocancer +2   mutant   cancer 
-23     norm nocancer +3   mutant nocancer 
-24     norm nocancer +4   mutant nocancer 
-25     norm nocancer +5   mutant nocancer 
-26     norm nocancer +6   mutant nocancer 
-27     norm   cancer +> 
-28     norm nocancer +> tab <- table(df)
-29     norm nocancer +
-30  mutated nocancer +
-31     norm nocancer +
-32  mutated nocancer +
-33     norm nocancer +
-34  mutated nocancer +
-35     norm nocancer +
-36     norm nocancer +
-37     norm nocancer +
-38  mutated nocancer +
-39  mutated   cancer +
-40     norm nocancer +
-41     norm nocancer +
-42  mutated nocancer +
-43  mutated nocancer +
-44     norm nocancer +
-45     norm nocancer +
-46     norm nocancer +
-47  mutated   cancer +
-48  mutated nocancer +
-49     norm nocancer +
-50  mutated nocancer +
-51     norm   cancer +
-52     norm nocancer +
-53  mutated nocancer +
-54     norm nocancer +
-55     norm nocancer +
-56     norm nocancer +
-57  mutated nocancer +
-58     norm nocancer +
-59     norm nocancer +
-60  mutated nocancer +
-61     norm nocancer +
-62     norm nocancer +
-63     norm nocancer +
-64  mutated nocancer +
-65     norm nocancer +
-66     norm nocancer +
-67     norm   cancer +
-68  mutated nocancer +
-69     norm nocancer +
-70  mutated nocancer +
-71     norm nocancer +
-72     norm nocancer +
-73  mutated nocancer +
-74     norm nocancer +
-75     norm nocancer +
-76     norm nocancer +
-77     norm   cancer +
-78     norm nocancer +
-79     norm nocancer +
-80     norm nocancer +
-81     norm nocancer +
-82     norm nocancer +
-83     norm nocancer +
-84     norm nocancer +
-85     norm   cancer +
-86     norm nocancer +
-87     norm nocancer +
-88     norm nocancer +
-89     norm nocancer +
-90     norm   cancer +
-91     norm nocancer +
-92     norm nocancer +
-93     norm nocancer +
-94     norm nocancer +
-95     norm nocancer +
-96     norm nocancer +
-97     norm nocancer +
-98  mutated nocancer +
-99  mutated   cancer +
-100 mutated   cancer +
-101 mutated nocancer +
-102 mutated   cancer +
-103    norm nocancer +
-104    norm nocancer +
-105    norm nocancer +
-106 mutated nocancer +
-107    norm nocancer +
-108    norm   cancer +
-109 mutated nocancer +
-110    norm nocancer +
-111    norm nocancer +
-112    norm   cancer +
-113    norm nocancer +
-114 mutated nocancer +
-115 mutated nocancer +
-116    norm nocancer +
-117    norm nocancer +
-118    norm nocancer +
-119    norm nocancer +
-120 mutated nocancer +
-121 mutated nocancer +
-122 mutated nocancer +
-123    norm   cancer +
-124    norm nocancer +
-125 mutated nocancer +
-126    norm nocancer +
-127    norm nocancer +
-128    norm nocancer +
-129    norm nocancer +
-130 mutated nocancer +
-131    norm nocancer +
-132 mutated nocancer +
-133 mutated nocancer +
-134 mutated nocancer +
-135 mutated nocancer +
-136    norm nocancer +
-137    norm nocancer +
-138 mutated nocancer +
-139    norm nocancer +
-140    norm nocancer +
-141 mutated nocancer +
-142 mutated nocancer +
-143 mutated nocancer +
-144    norm nocancer +
-145    norm nocancer +
-146    norm nocancer +
-147    norm nocancer +
-148 mutated nocancer +
-149 mutated   cancer +
-150    norm nocancer +
-151    norm nocancer +
-152    norm nocancer +
-153 mutated nocancer +
-154 mutated nocancer +
-155    norm nocancer +
-156    norm nocancer +
-157 mutated nocancer +
-158    norm nocancer +
-159 mutated nocancer +
-160 mutated nocancer +
-161 mutated nocancer +
-162    norm nocancer +
-163    norm nocancer +
-164 mutated nocancer +
-165    norm nocancer +
-166    norm nocancer +
-167 mutated nocancer +
-168 mutated nocancer +
-169    norm   cancer +
-170    norm nocancer +
-171 mutated nocancer +
-172    norm nocancer +
-173 mutated nocancer +
-174 mutated nocancer +
-175    norm nocancer +
-176    norm nocancer +
-177 mutated nocancer +
-178    norm nocancer +
-179    norm nocancer +
-180    norm nocancer +
-181    norm nocancer +
-182    norm nocancer +
-183    norm nocancer +
-184    norm nocancer +
-185    norm nocancer +
-186 mutated   cancer +
-187    norm nocancer +
-188    norm nocancer +
-189 mutated nocancer +
-190 mutated nocancer +
-191    norm nocancer +
-192    norm   cancer +
-193    norm nocancer +
-194    norm nocancer +
-195 mutated nocancer +
-196    norm nocancer +
-197    norm nocancer +
-198    norm nocancer +
-199 mutated nocancer +
-200 mutated nocancer +
-201    norm nocancer +
-202    norm nocancer +
-203    norm nocancer +
-204 mutated nocancer +
-205 mutated nocancer +
-206    norm nocancer +
-207    norm nocancer +
-208    norm nocancer +
-209 mutated nocancer +
-210    norm nocancer +
-211 mutated nocancer +
-212    norm nocancer +
-213 mutated nocancer +
-214    norm nocancer +
-215    norm   cancer +
-216 mutated nocancer +
-217    norm nocancer +
-218 mutated nocancer +
-219    norm nocancer +
-220    norm   cancer +
-221 mutated nocancer +
-222    norm nocancer +
-223 mutated nocancer +
-224    norm nocancer +
-225    norm nocancer +
-226    norm nocancer +
-227 mutated nocancer +
-228 mutated nocancer +
-229 mutated nocancer +
-230 mutated nocancer +
-231 mutated nocancer +
-232    norm nocancer +
-233    norm nocancer +
-234 mutated nocancer +
-235    norm nocancer +
-236    norm nocancer +
-237    norm nocancer +
-238    norm nocancer +
-239    norm nocancer +
-240    norm nocancer +
-241    norm nocancer +
-242    norm nocancer +
-243 mutated nocancer +
-244    norm nocancer +
-245    norm   cancer +
-246 mutated nocancer +
-247 mutated nocancer +
-248    norm nocancer +
-249    norm nocancer +
-250 mutated nocancer +
-251 mutated nocancer +
-252    norm nocancer +
-253    norm nocancer +
-254    norm nocancer +
-255    norm nocancer +
-256 mutated nocancer +
-257    norm nocancer +
-258 mutated nocancer +
-259    norm nocancer +
-260 mutated nocancer +
-261 mutated nocancer +
-262    norm nocancer +
-263    norm nocancer +
-264 mutated nocancer +
-265 mutated nocancer +
-266 mutated nocancer +
-267    norm   cancer +
-268    norm nocancer +
-269 mutated nocancer +
-270    norm nocancer +
-271    norm   cancer +
-272 mutated nocancer +
-273 mutated nocancer +
-274    norm nocancer +
-275 mutated nocancer +
-276    norm nocancer +
-277    norm nocancer +
-278    norm nocancer +
-279    norm nocancer +
-280    norm nocancer +
-281 mutated nocancer +
-282 mutated nocancer +
-283    norm nocancer +
-284 mutated   cancer +
-285    norm   cancer +
-286 mutated nocancer +
-287 mutated nocancer +
-288 mutated nocancer +
-289    norm nocancer +
-290 mutated nocancer +
-291    norm nocancer +
-292    norm nocancer +
-293 mutated nocancer +
-294    norm nocancer +
-295 mutated nocancer +
-296 mutated nocancer +
-297    norm nocancer +
-298 mutated nocancer +
-299 mutated nocancer +
-300    norm nocancer +
-301 mutated nocancer +
-302    norm nocancer +
-303    norm nocancer +
-304 mutated nocancer +
-305    norm nocancer +
-306 mutated nocancer +
-307 mutated nocancer +
-308 mutated nocancer +
-309    norm nocancer +
-310    norm nocancer +
-311    norm   cancer +
-312    norm nocancer +
-313 mutated nocancer +
-314    norm nocancer +
-315    norm nocancer +
-316    norm nocancer +
-317 mutated nocancer +
-318    norm nocancer +
-319 mutated nocancer +
-320    norm nocancer +
-321    norm nocancer +
-322    norm nocancer +
-323    norm nocancer +
-324    norm nocancer +
-325    norm   cancer +
-326 mutated nocancer +
-327    norm   cancer +
-328    norm nocancer +
-329 mutated nocancer +
-330 mutated nocancer +
-331    norm nocancer +
-332    norm nocancer +
-333 mutated nocancer +
-334    norm nocancer +
-335 mutated nocancer +
-336    norm nocancer +
-337    norm nocancer +
-338    norm nocancer +
-339    norm   cancer +
-340 mutated   cancer +
-341    norm nocancer +
-342    norm nocancer +
-343    norm nocancer +
-344    norm   cancer +
-345 mutated nocancer +
-346    norm nocancer +
-347 mutated nocancer +
-348 mutated nocancer +
-349    norm nocancer +
-350 mutated nocancer +
-> tab <- table(da)+
 > tab > tab
-         canc +        cancer 
-gene      cancer nocancer +gene     nocancer cancer 
-  mutated     10      119 +  mutant      121     19 
-  norm        23      198 +  norm        210      6 
 +> tab[1,2] 
 +[1] 19 
 +> tab[1,1] 
 +[1] 121 
 +>  
 +> # p.cancer.mutant: sample counterpart of p.cancer.mut above 
 +> p.cancer.mutant <- tab[1,2]/(tab[1,1]+tab[1,2]) 
 +> p.nocancer.mutant <- tab[1,1]/(tab[1,1]+tab[1,2]) 
 +> p.cancer.mutant 
 +[1] 0.1357143 
 +> 1-p.cancer.mutant 
 +[1] 0.8642857 
 +> p.nocancer.mutant 
 +[1] 0.8642857 
 +>  
 +> p.cancer.norm <-  tab[2,2]/(tab[2,1]+tab[2,2]) 
 +> p.nocancer.norm <- 1-p.cancer.norm 
 +> p.cancer.norm 
 +[1] 0.02777778 
 +> p.nocancer.norm 
 +[1] 0.9722222 
 +>  
 +> odds(p.cancer.mutant) 
 +[1] 0.1570248 
 +> odds(p.cancer.norm) 
 +[1] 0.02857143 
 +> odds.ratio(p.cancer.mutant, p.cancer.norm) 
 +[1] 5.495868
  
 </code> </code>
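The transcript above calls `odds()` and `odds.ratio()`, whose definitions are not part of this comparison. The following is a minimal sketch, assuming they are the usual odds p/(1-p) and the ratio of two such odds. That is an assumption, although it agrees with the values printed above. The counts are the ones shown in `tab`.

```r
# Assumed definitions: not shown in this diff, but consistent with the output above.
odds <- function(p) p / (1 - p)
odds.ratio <- function(p1, p2) odds(p1) / odds(p2)

# Counts taken from tab: 19 of 140 mutant cases and 6 of 216 normal cases have cancer.
p.cancer.mutant <- 19 / (121 + 19)
p.cancer.norm <- 6 / (210 + 6)

odds(p.cancer.mutant)                       # equals 19/121
odds(p.cancer.norm)                         # equals 6/210
odds.ratio(p.cancer.mutant, p.cancer.norm)  # equals (19/121) / (6/210)
```

Because the odds within a group reduce to the count of cancer cases divided by the count of non-cancer cases, the odds ratio can also be read directly off the 2x2 table.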
Line 464: Line 207:
 where  where 
 \begin{align*} \begin{align*}
-y & = ln(x) \\ +ln(x) & = y  \\ 
-& = log_e {x} \\+log_e {x} & = y  \\
 x & = e^{y} \\ x & = e^{y} \\
 \end{align*} \end{align*}
Line 568: Line 311:
  
 </code> </code>
-====== Odds ratio in logistic ====== 
-\begin{align*} 
-ln(\frac{p}{1-p}) = & y \\ 
-\frac {p}{1-p} = & e^{y} \;\;\; \text{where } \;\; y = a + bX \\ 
-\text {odds} = & e^{y} = e^{a + bX} \\ 
-\text{then} \;\;\; \text{odds ratio} (y_{2}/y_{1}) = & \text {odds ratio between  } \\ 
-& \text{odds of y at one point, } y_1 \text { and } \\ 
-& \text{odds of y at another point, } y_2 \\ 
-\text{and  }  y_1 = & a + b (X) \\ 
-              y_2 = & a + b (X+1) \\ 
-\text{then  } & \;\; \\ 
-\text {odds of } y_1 = & e^{(a+b(X))} \\ 
-\text {odds of } y_2 = & e^{(a+b(X+1))} \\ 
-\text {odds ratio } (y_{2}/y_{1}) = & \frac {e^{(a+bX+b)} } {e^{(a+bX)}} \\ 
-= & \frac {e^{(a+bX)} * e^{b}} {e^{(a+bX)} } \\ 
-= & e^b 
-\end{align*} 
-  * The result $e^b$ above means the following: when $X$ increases by one unit, $y$ (the log odds) increases by $b$, and this $b$ 
-  * should be understood as the $\text{log of odds ratio}$ between $y_2$ and $y_1$. Therefore, 
-  * the $\text{odds ratio}$ between $y_2$ and $y_1$ is $e^b$. 
  
 ====== Logistic Regression Analysis ====== ====== Logistic Regression Analysis ======
Line 682: Line 405:
  
 </code> </code>
 +
 +===== Odds ratio in logistic =====
 +\begin{align*}
 +ln(\frac{p}{1-p}) = & y \\
 +\frac {p}{1-p} = & e^{y} \;\;\; \text{where } \;\; y = a + bX \\
 +\text {odds} = & e^{y} = e^{a + bX} \\
 +\text{then} \;\;\; \text{odds ratio} (y_{2}/y_{1}) = & \text {odds ratio between  } \\
 +& \text{odds of y at one point, } y_1 \text { and } \\
 +& \text{odds of y at another point, } y_2 \\
 +\text{and  }  y_1 = & a + b (X) \\
 +              y_2 = & a + b (X+1) \\
 +\text{then  } & \;\; \\
 +\text {odds of } y_1 = & e^{(a+b(X))} \\
 +\text {odds of } y_2 = & e^{(a+b(X+1))} \\
 +\text {odds ratio } (y_{2}/y_{1}) = & \frac {e^{(a+bX+b)} } {e^{(a+bX)}} \\
 += & \frac {e^{(a+bX)} * e^{b}} {e^{(a+bX)} } \\
 += & e^b
 +\end{align*}
 +  * The result $e^b$ above means the following: when $X$ increases by one unit, $y$ (the log odds) increases by $b$, and this $b$ 
 +  * should be understood as the $\text{log of odds ratio}$ between $y_2$ and $y_1$. Therefore, 
 +  * the $\text{odds ratio}$ between $y_2$ and $y_1$ is $e^b$.
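The derivation above can be checked numerically. This is only a sketch: the values of `a`, `b`, and `X` are hypothetical and chosen for illustration, and `odds.at` is a helper name introduced here.

```r
# Hypothetical coefficients and predictor value, chosen only for illustration.
a <- -2
b <- 0.5
X <- 3

# odds = e^(a + bX), as in the derivation
odds.at <- function(x) exp(a + b * x)

# odds ratio for a one-unit increase in X
or.step <- odds.at(X + 1) / odds.at(X)
or.step
exp(b)

# the ratio equals e^b and does not depend on a or on the starting value X
isTRUE(all.equal(or.step, exp(b)))
isTRUE(all.equal(odds.at(11) / odds.at(10), exp(b)))
```

Changing `a` or `X` leaves the ratio unchanged, which is why a single coefficient `b` summarizes the effect of `X` on the odds scale.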
  
 ===== Interpreting the coefficients ===== ===== Interpreting the coefficients =====
Line 702: Line 446:
     * That is, $log(om/of) = b$     * That is, $log(om/of) = b$
     * $log(1.444613) = b$     * $log(1.444613) = b$
 +    * $ 1.444613 = e^b$
 <code> <code>
 > log(1.444613) > log(1.444613)
Line 812: Line 557:
 </code> </code>
 Males were found to have greater odds of having used marijuana than females (Odds ratio (OR) = 1.44; 95% CI = 1.13, 1.86; p = .004). The odds of use for males were about 44% higher than for females (OR = 1.44).  Males were found to have greater odds of having used marijuana than females (Odds ratio (OR) = 1.44; 95% CI = 1.13, 1.86; p = .004). The odds of use for males were about 44% higher than for females (OR = 1.44). 
 +====== exercise: binary IV ======
  
 +<code>
 +########################################
 +# exercise
 +library(dplyr)  # provides %>% and mutate()
 +
 +head(df)
 +table(df)
 +# change the base (reference) level of gene
 +df.norm <- df %>% mutate(gene = relevel(gene, ref = "norm"))
 +df.mut <- df %>% mutate(gene = relevel(gene, ref = "mutant"))
 +
 +
 +# model 1: reference level = norm
 +logm.cancer.gene.1 <- glm(cancer ~ gene, family = binomial, data = df.norm)
 +summary(logm.cancer.gene.1)
 +a <- logm.cancer.gene.1$coefficients[1]
 +b <- logm.cancer.gene.1$coefficients[2]
 +a
 +b
 +a+b
 +# when the gene dummy is 0 (gene = norm),
 +# log(odds.norm) = a, so
 +# odds.norm = e^a
 +exp(a)
 +# check (p.cancer.norm and p.cancer.mutant are the proportions computed from tab)
 +odds(p.cancer.norm)
 +# odds.mut = e^(a+b)
 +exp(a+b)
 +odds(p.cancer.mutant)
 +# odds.ratio = e^(b)
 +exp(b)
 +odds.ratio(p.cancer.mutant, p.cancer.norm)
 +
 +
 +# model 2: reference level = mutant
 +logm.cancer.gene.2 <- glm(cancer ~ gene, family = binomial, data = df.mut)
 +summary(logm.cancer.gene.2)
 +a <- logm.cancer.gene.2$coefficients[1]
 +b <- logm.cancer.gene.2$coefficients[2]
 +a
 +b
 +a+b
 +# when the gene dummy is 0 (gene = mutant),
 +# log(odds.mut) = a, so
 +# odds.mut = e^a
 +exp(a)
 +# check
 +odds(p.cancer.mutant)
 +# odds.norm = e^(a+b)
 +exp(a+b)
 +odds(p.cancer.norm)
 +# odds.ratio = e^(b)
 +exp(b)
 +odds.ratio(p.cancer.norm, p.cancer.mutant)
 +
 +
 +</code>
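The exercise can also be reproduced without random numbers. This is a sketch under stated assumptions: the data frame is rebuilt from the counts printed in `tab` (121/19 for mutant, 210/6 for norm), the names `df.check`, `m.norm`, and `m.mut` are hypothetical, and base `relevel()` is used inside the formula in place of the `mutate()` step.

```r
# Rebuild the data from the counts in tab (no random numbers involved).
gene <- factor(rep(c("mutant", "norm"), times = c(140, 216)),
               levels = c("mutant", "norm"))
cancer <- factor(c(rep(c("nocancer", "cancer"), times = c(121, 19)),
                   rep(c("nocancer", "cancer"), times = c(210, 6))),
                 levels = c("nocancer", "cancer"))
df.check <- data.frame(gene, cancer)   # hypothetical name
table(df.check)

# Reference level = norm: exp(b) is the odds ratio of mutant versus norm.
m.norm <- glm(cancer ~ relevel(gene, ref = "norm"),
              family = binomial, data = df.check)
# Reference level = mutant: exp(b) is the odds ratio of norm versus mutant.
m.mut <- glm(cancer ~ gene, family = binomial, data = df.check)

or.mut.vs.norm <- unname(exp(coef(m.norm)[2]))
or.norm.vs.mut <- unname(exp(coef(m.mut)[2]))
or.mut.vs.norm
or.norm.vs.mut

# Switching the reference level inverts the odds ratio,
# so the product should be very close to 1.
or.mut.vs.norm * or.norm.vs.mut
```

With a single binary predictor the model is saturated, so the fitted odds ratio should match the one computed by hand from the table, (19/121) / (6/210), up to numerical tolerance.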
 ====== X: numeric variable ====== ====== X: numeric variable ======
 <code> <code>
logistic_regression.1733698214.txt.gz · Last modified: 2024/12/09 07:50 by hkimscil
