--- gradient_descent [2025/08/21 12:18] – [R output] hkimscil
+++ gradient_descent [2025/10/02 11:59] (current) – hkimscil
@@ Line 1: / Line 1: @@
 ====== Gradient Descent ======
-====== explanation ======
-====== Why normalize (scale or make z-score) xi ======
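-The unit in which x is measured determines the value of b, and that value can range very widely. For example, if X is monthly income, the b we have to estimate (track down) may be on the order of millions. Chasing such a value with the gradient can require a very large number of iterations. When the variable changes, the range over which b must be searched changes dramatically as well. If we use standardized x scores instead, a fixed learning rate and a fixed number of iterations are enough to find a and b accurately.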
-====== How to unnormalize (unscale) a and b ======
-\begin{eqnarray*}
-y & = & a + b * x \\
-& & \text{we use z instead of x} \\
-& & \text{and } \\
-& & z = \frac{(x - \mu)}{\sigma} \\
-& & \text{suppose the result of the calculation is } \\
-y & = & k + m * z \\
-& = & k + m * \frac{(x - \mu)}{\sigma} \\
-& = & k + \frac{m * x}{\sigma} - \frac{m * \mu}{\sigma}  \\
-& = & k - \frac{m * \mu}{\sigma} + \frac{m * x}{\sigma}  \\
-& = & k - \frac{\mu}{\sigma} * m + \frac{m}{\sigma} * x \\
-& & \text{therefore, the a and b that we are trying to get are } \\
-a & = & k - \frac{\mu}{\sigma} * m \\
-b & = & \frac{m}{\sigma} \\
-\end{eqnarray*}
 ====== R code: Idea ======
 <code>
+library(tidyverse)
+library(data.table)
 library(ggplot2)
 library(ggpmisc)
@@ Line 519: / Line 497: @@
 >
 </code>
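-Is there a way to find them other than this?
+Is there a way to find a and b at the same time? It is hard with the method above. Ordinarily we could work out what a and b should be by using derivatives. Rather than solving the derivative equations analytically in R, we write a program that moves step by step toward the solution and obtains approximate values of a and b. This is called gradient descent.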
-gradient descent
 ====== Gradient descent ======
@@ Line 563: / Line 540: @@
 & = & -2 \sum{X_i (Y_i - (a + bX_i))} \\
 & = & -2 * \sum{(X_i * \text{residual}_i)} \\
-\\
+& \Rightarrow & -2 * \frac{\sum{(X_i * \text{residual}_i)}}{n} \\
+& = & -2 * \overline{X_i * \text{residual}_i} \\
 \end{eqnarray*}
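-(Assuming you understand derivatives) the expression above tells us how the msr (mean square residual) changes as b changes. It is the sum of the residuals for that b multiplied by (-2/N)*X.
+The explanation above assumed that we differentiate the Sum of Squares, but it can equally be understood with the Mean Square (the Sum of Squares divided by N) in its place. The code below shows (assuming you understand derivatives) how the msr (mean square residual) changes as b and a change.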
 <code>
@@ Line 1011: / Line 990: @@
 >
 </code>
+{{:pasted:20250821-121910.png}}
+{{:pasted:20250821-121924.png}}
+{{:pasted:20250821-121943.png}}
+====== Why normalize (scale or make z-score) xi ======
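+The unit in which x is measured determines the value of b, and that value can range very widely. For example, if X is monthly income, the b we have to estimate (track down) may be on the order of millions. Chasing such a value with the gradient can require a very large number of iterations. When the variable changes, the range over which b must be searched changes dramatically as well. If we use standardized x scores instead, a fixed learning rate and a fixed number of iterations are enough to find a and b accurately.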
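The effect of the measurement unit can be sketched in R. This is only an illustration under assumptions: the data are simulated, the income scale, the learning rate of 0.1, and the helper name `gd` are all hypothetical and not taken from the page's own code. With the same learning rate and iteration count, the raw x makes the updates blow up, while the z-scored x settles on the least-squares answer.

```r
# Hypothetical data: x is a monthly income, so its values are in the millions.
set.seed(1)
n <- 200
x <- rnorm(n, mean = 3e6, sd = 8e5)
y <- 10 + 2e-6 * x + rnorm(n, sd = 1)

# Plain gradient descent on the mean squared residual of y = a + b * x.
# (gd is an illustrative helper, not a function from any package.)
gd <- function(x, y, lr, iters) {
  a <- 0
  b <- 0
  for (i in seq_len(iters)) {
    res <- y - (a + b * x)
    a <- a - lr * (-2 * mean(res))      # gradient with respect to a
    b <- b - lr * (-2 * mean(x * res))  # gradient with respect to b
  }
  c(a = a, b = b)
}

# Same learning rate and iterations, raw x versus standardized x.
raw <- gd(x, y, lr = 0.1, iters = 1000)
z   <- (x - mean(x)) / sd(x)
std <- gd(z, y, lr = 0.1, iters = 1000)

print(raw)               # not finite: the steps overshoot and diverge
print(std)               # intercept and slope on the z scale
print(coef(lm(y ~ z)))   # reference values from lm()
```

To make the raw version converge, the learning rate would have to be shrunk by many orders of magnitude, which in turn makes the intercept move extremely slowly; standardizing x avoids that trade-off.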
+====== How to unnormalize (unscale) a and b ======
+\begin{eqnarray*}
+y & = & a + b * x \\
+& & \text{we use z instead of x} \\
+& & \text{and } \\
+& & z = \frac{(x - \mu)}{\sigma} \\
+& & \text{suppose the result of the calculation is } \\
+y & = & k + m * z \\
+& = & k + m * \frac{(x - \mu)}{\sigma} \\
+& = & k + \frac{m * x}{\sigma} - \frac{m * \mu}{\sigma}  \\
+& = & k - \frac{m * \mu}{\sigma} + \frac{m * x}{\sigma}  \\
+& = & \underbrace{k - \frac{\mu}{\sigma} * m}_{\text{1}} + \underbrace{\frac{m}{\sigma}}_{\text{2}} * x \\
+& & \text{therefore, the a and b that we are trying to get are } \\
+a & = & k - \frac{\mu}{\sigma} * m \\
+b & = & \frac{m}{\sigma} \\
+\end{eqnarray*}
-{{:pasted:20250801-185727.png}}
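The unscaling formulas above can be checked numerically. The following is a minimal sketch, assuming simulated data and a hypothetical learning rate and iteration count; `k` and `m` follow the notation of the derivation, and `mu` and `sigma` are simply the sample mean and standard deviation used to build z.

```r
# Hypothetical data on the original scale of x.
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
y <- 3 + 0.5 * x + rnorm(100)

# Standardize x; the same mu and sigma are reused for unscaling.
mu    <- mean(x)
sigma <- sd(x)
z     <- (x - mu) / sigma

# Gradient descent for y = k + m * z on the mean squared residual.
k  <- 0
m  <- 0
lr <- 0.1
for (i in 1:1000) {
  res <- y - (k + m * z)
  k <- k + lr * 2 * mean(res)      # minus lr times gradient (-2 * mean(res))
  m <- m + lr * 2 * mean(z * res)  # minus lr times gradient (-2 * mean(z * res))
}

# Unscale: a = k - (mu / sigma) * m, b = m / sigma.
a <- k - (mu / sigma) * m
b <- m / sigma

print(c(a = a, b = b))
print(coef(lm(y ~ x)))  # should agree closely with a and b
```

Any choice of mu and sigma works as long as the same pair is used both to build z and to convert k and m back.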