It’s a minimization problem. The typical calculus approach is to find where the derivative is zero and then argue that this point is a global minimum rather than a maximum, a saddle point, or merely a local minimum.

In a nice situation like linear regression with squared-error loss (as in ordinary least squares), the loss, as a function of the estimated parameters, is quadratic and opens upward (it is convex). Thus, when we find a point where the derivative is zero, it is guaranteed to be a global minimum.

Therefore, start by taking the partial derivatives and finding where they equal zero.

EXAMPLE (SIMPLE LINEAR REGRESSION)

$$ \hat y=\hat\beta_0+\hat\beta_1x\\ L(y,\hat\beta_0,\hat\beta_1)=\sum_{i=1}^N\bigg( y_i - \hat\beta_0-\hat\beta_1x_i \bigg)^2 $$

Now solve this as a system of equations for the optimal $\hat\beta_0$ and $\hat\beta_1$.

$$ \dfrac{\partial L}{\partial \hat\beta_0}=0\\ \dfrac{\partial L}{\partial \hat\beta_1}=0 $$
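Working out those two derivatives explicitly (this is just the standard OLS algebra, sketched here for completeness) gives the normal equations

$$ \dfrac{\partial L}{\partial \hat\beta_0}=-2\sum_{i=1}^N\big(y_i-\hat\beta_0-\hat\beta_1x_i\big)=0,\qquad \dfrac{\partial L}{\partial \hat\beta_1}=-2\sum_{i=1}^N x_i\big(y_i-\hat\beta_0-\hat\beta_1x_i\big)=0 $$

which solve to the familiar closed-form estimates

$$ \hat\beta_1=\dfrac{\sum_{i=1}^N(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^N(x_i-\bar x)^2},\qquad \hat\beta_0=\bar y-\hat\beta_1\bar x. $$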

There is a geometric argument for why the solution is a global minimum, but it might be worth working through the entire second-derivative test from multivariable calculus once, just to see how it all works.
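If it helps to see all of this numerically, here is a minimal Python sketch (the data are simulated purely for illustration) that computes the closed-form estimates above, checks them against NumPy’s own least-squares fit, and confirms that the Hessian of the loss has nonnegative eigenvalues, which is exactly what the second-derivative test requires of a global minimum for a quadratic loss:

```python
import numpy as np

# Simulated toy data, purely for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

# Closed-form solution of the two normal equations
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Sanity check against NumPy's least-squares polynomial fit
beta1_np, beta0_np = np.polyfit(x, y, deg=1)
print(beta0_hat, beta1_hat)   # roughly 2 and 3
print(beta0_np, beta1_np)     # should agree with the closed form

# Second-derivative test: the Hessian of the squared-error loss is constant,
# 2 * [[N, sum(x)], [sum(x), sum(x^2)]], and positive (semi)definite,
# so the critical point is a global minimum.
H = 2 * np.array([[len(x), x.sum()], [x.sum(), (x ** 2).sum()]])
print(np.linalg.eigvalsh(H))  # both eigenvalues are nonnegative
```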
